I need to store the contents of a protobuf message in MongoDB. To do this, I convert the proto to a JSON string and then parse that string into a Document object.
Sometimes the proto holds up to 14 MB of data, and the conversion then takes around 5 seconds, which adds too much latency to the code. Is there a better way to handle this?
The payload fits within MongoDB's 16 MB BSON document limit, but the conversion itself is slow.
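    import com.google.protobuf.InvalidProtocolBufferException;
    import com.google.protobuf.util.JsonFormat;
    import org.bson.Document;

    private static Document serialize(AggregationProto aggregationProto) {
        try {
            // proto -> JSON text -> BSON Document (the slow path for large messages)
            String json = JsonFormat.printer().print(aggregationProto);
            return Document.parse(json);
        } catch (InvalidProtocolBufferException e) {
            throw new IllegalStateException("Failed to print aggregationProto as JSON", e);
        }
    }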
Above is the method used to convert from proto to MongoDB Document.
Overall I am looking for something like below:
(proto <=> bson instead of proto <=> json <=> bson)
Reference: https://stackoverflow.com/a/52586997/3301316
Has anything changed since this issue: https://github.com/protocolbuffers/protobuf/issues/2601
2 Answers
The proto → JSON → BSON conversion was the bottleneck, and Google's protobuf library offers no new way to bypass the JSON conversion.
MongoDB supports Binary Data as a BSON type, so the proto can be stored directly in MongoDB as a byte[].
Utilities from Apache Commons can be used for the proto <-> byte[] conversion.
This avoids the conversion cost.
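A minimal sketch of the binary approach, assuming the MongoDB Java driver's org.bson classes and protobuf-java are on the classpath. It uses toByteArray() and a Parser from protobuf-java itself rather than an extra utility library. The class name ProtoBinaryCodec and the field name "payload" are hypothetical, chosen only for illustration.

```java
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Message;
import com.google.protobuf.Parser;
import org.bson.Document;
import org.bson.types.Binary;

// Hypothetical helper: stores a protobuf message as one opaque BSON binary field.
final class ProtoBinaryCodec {
    // Assumed field name; pick whatever fits your schema.
    static final String PAYLOAD_FIELD = "payload";

    // proto -> byte[] -> BSON Binary, with no JSON step in between.
    static Document toDocument(Message message) {
        return new Document(PAYLOAD_FIELD, new Binary(message.toByteArray()));
    }

    // BSON Binary -> byte[] -> proto, using the parser of the expected message type.
    static <T extends Message> T fromDocument(Document doc, Parser<T> parser) {
        Binary binary = doc.get(PAYLOAD_FIELD, Binary.class);
        try {
            return parser.parseFrom(binary.getData());
        } catch (InvalidProtocolBufferException e) {
            throw new IllegalStateException("Stored payload is not a valid message", e);
        }
    }
}
```

With the class from the question, reading back would look like ProtoBinaryCodec.fromDocument(doc, AggregationProto.parser()). The trade-off is that MongoDB sees the payload as opaque bytes, so individual proto fields cannot be queried or indexed, and the 16 MB document limit still applies to the serialized size.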
Converting between two binary formats by going through a text format is always going to be relatively slow. Google stated that they won’t support BSON directly, so you’d need to go through the reflection API (similar to JsonFormat, but for BSON) or do code generation via a custom plugin.
Reflection API
The reflection API lets you access field values and descriptors of arbitrary messages. It’s slower than code generation, but it’s usually easier to implement.
JsonFormat is implemented this way to convert between Protobuf-Java and JSON.
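As a rough illustration of the reflection approach, the sketch below walks a message with getAllFields() and builds an org.bson.Document directly, skipping the JSON text. The class name ProtoToBson is hypothetical, and this is not a drop-in replacement for JsonFormat: it assumes that map fields may be stored as lists of key/value entries, that unsigned integers may be stored as their signed Java values, and that well-known types such as Timestamp need no special formatting.

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.Descriptors.EnumValueDescriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;
import org.bson.Document;
import org.bson.types.Binary;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical converter: proto -> BSON Document via the protobuf reflection API.
final class ProtoToBson {

    static Document toDocument(Message message) {
        Document doc = new Document();
        // getAllFields() only returns fields that are set (or non-empty repeated fields).
        for (Map.Entry<FieldDescriptor, Object> entry : message.getAllFields().entrySet()) {
            FieldDescriptor field = entry.getKey();
            Object value = entry.getValue();
            if (field.isRepeated()) {
                // Repeated fields (and map fields, as entry messages) arrive as a List.
                List<Object> items = new ArrayList<>();
                for (Object item : (List<?>) value) {
                    items.add(convert(field, item));
                }
                doc.put(field.getJsonName(), items);
            } else {
                doc.put(field.getJsonName(), convert(field, value));
            }
        }
        return doc;
    }

    // Maps a single protobuf value to a type the BSON Document can hold.
    private static Object convert(FieldDescriptor field, Object value) {
        switch (field.getJavaType()) {
            case MESSAGE:
                return toDocument((Message) value);           // nested document
            case ENUM:
                return ((EnumValueDescriptor) value).getName(); // enum as its name
            case BYTE_STRING:
                return new Binary(((ByteString) value).toByteArray());
            default:
                return value; // Integer, Long, Float, Double, Boolean, String
        }
    }
}
```

getJsonName() is used so that keys follow the same lowerCamelCase convention JsonFormat uses by default; getName() would give the names as written in the .proto file instead. The reverse direction (BSON to proto) would need a similar walk that fills a Message.Builder through setField.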
Protoc plugin
The protobuf compiler protoc has a plugin mechanism for arbitrary code generation. It is typically used for 3rd party protobuf implementations, but it could also be used to generate conversion code to go from Protobuf-Java objects to BSON.
There are various tutorials explaining how to implement protobuf plugins as well as sample projects like python-protoc-plugin that provide a barebones skeleton to get started.
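For orientation, a protoc plugin is just an executable that reads a CodeGeneratorRequest from stdin and writes a CodeGeneratorResponse to stdout. The skeleton below shows that loop in Java using the PluginProtos classes shipped with protobuf-java. The class name BsonPluginSketch, the output file suffix, and the placeholder content are hypothetical; a real plugin would emit actual BSON writer code for each message.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorRequest;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorResponse;

import java.io.IOException;

// Hypothetical plugin skeleton: emits one placeholder file per requested .proto file.
final class BsonPluginSketch {

    static CodeGeneratorResponse generate(CodeGeneratorRequest request) {
        CodeGeneratorResponse.Builder response = CodeGeneratorResponse.newBuilder();
        for (FileDescriptorProto file : request.getProtoFileList()) {
            // proto_file also contains imports; only generate for explicitly requested files.
            if (!request.getFileToGenerateList().contains(file.getName())) {
                continue;
            }
            StringBuilder content = new StringBuilder();
            for (DescriptorProto message : file.getMessageTypeList()) {
                // Placeholder: real generated conversion code would go here.
                content.append("// TODO: BSON writer for ")
                       .append(message.getName())
                       .append('\n');
            }
            response.addFileBuilder()
                    .setName(file.getName() + ".bson.txt") // assumed output name
                    .setContent(content.toString());
        }
        return response.build();
    }

    public static void main(String[] args) throws IOException {
        // protoc sends the serialized request on stdin and expects the response on stdout.
        CodeGeneratorRequest request = CodeGeneratorRequest.parseFrom(System.in);
        generate(request).writeTo(System.out);
        System.out.flush();
    }
}
```

protoc locates plugins through the --plugin=protoc-gen-NAME=path flag and invokes them when --NAME_out is given, so a Java plugin like this one needs an executable wrapper script (or a native image) at that path.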
Setting up a Java plugin is somewhat painful because you need wrapper scripts or a native image, but if you prefer Java you can take a look at QuickBuffers and the generated code for writeTo(JsonSink). (Disclaimer: I’m the author. You could also just try using the library and check whether the faster JSON serialization provides enough of a speedup for your use case.)