
I have a requirement to store the content of a proto in MongoDB, for which I am converting the proto to a JSON string and then converting that to a Document object.

Sometimes the proto has a lot of data, up to 14 MB, which makes the above conversion take around 5 seconds and adds too much latency to the code. Is there a better way to handle this?

MongoDB's BSON has a 16 MB document limit, but this conversion comes with latency.

  private static Document serialize(AggregationProto aggregationProto) {
    try {
      String json = JsonFormat.printer().print(aggregationProto);
      return Document.parse(json);
    } catch (InvalidProtocolBufferException e) {
      throw new IllegalStateException("Failed to print aggregationProto as JSON", e);
    }
  }

Above is the method used to convert from proto to MongoDB Document.

Overall I am looking for something like below:

(proto <=> bson instead of proto <=> json <=> bson)

Reference: https://stackoverflow.com/a/52586997/3301316

Anything updated after this: https://github.com/protocolbuffers/protobuf/issues/2601

2 Answers


  1. Chosen as BEST ANSWER

The proto→json→bson conversion was the bottleneck, and no new way of bypassing the JSON conversion was available from google-proto.

MongoDB supports binary data as a BSON type, so the proto can be stored directly in Mongo as a byte[].

Protobuf's generated toByteArray() and parseFrom() methods handle the proto <-> byte[] conversion directly. (Apache Commons SerializationUtils also works, since generated messages are Serializable, but it goes through Java serialization rather than the compact proto wire format.)

      private static Bson serialize(AggregationSkdPnr aggregationSkdPnr) {
        // Protobuf's compact wire format; no intermediate JSON step
        byte[] bytes = aggregationSkdPnr.toByteArray();
        return Updates.combine(
                set("value", bytes)
        );
      }

      private static AggregationSkdPnr deserialize(Document document)
          throws InvalidProtocolBufferException {
        // The Java driver surfaces stored byte[] values as org.bson.types.Binary
        Binary binary = document.get("value", Binary.class);
        return AggregationSkdPnr.parseFrom(binary.getData());
      }
    

This avoids the proto→json→bson conversion cost entirely.
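For illustration, the round trip above can be sketched without a running MongoDB instance. This is only a sketch: it uses the bundled well-known type Timestamp as a stand-in for AggregationSkdPnr, since the storage pattern is the same for any message type.

```java
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Timestamp;
import org.bson.Document;
import org.bson.types.Binary;

public class ProtoBinaryRoundTrip {
    public static void main(String[] args) throws InvalidProtocolBufferException {
        Timestamp original = Timestamp.newBuilder().setSeconds(1234).setNanos(56).build();

        // Store the proto's wire-format bytes under "value"
        Document doc = new Document("value", original.toByteArray());

        // A Document read back from MongoDB surfaces byte[] as org.bson.types.Binary,
        // while an in-memory Document keeps the raw byte[]; handle both cases
        Object stored = doc.get("value");
        byte[] bytes = stored instanceof Binary
                ? ((Binary) stored).getData()
                : (byte[]) stored;

        Timestamp roundTripped = Timestamp.parseFrom(bytes);
        System.out.println(roundTripped.getSeconds()); // 1234
    }
}
```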


  2. Converting between two binary formats by going through a text format is always going to be relatively slow. Google stated that they won’t support BSON directly, so you’d need to go through the reflection API (similar to JsonFormat, but for BSON) or do code generation via a custom plugin.

    Reflection API

    The reflection API lets you access field values and descriptors of arbitrary messages. It’s slower than code generation, but it’s usually easier to implement, e.g.,

    msg.getAllFields().forEach((field, value) -> {
        System.out.println(field.getJsonName() + " = " + value);
    });
    

    The JsonFormat is implemented this way to convert between Protobuf-Java and JSON.
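A minimal reflection-based converter could look like the sketch below. This is an illustration, not a complete implementation: it assumes scalar field values map directly to BSON-compatible Java types, and it leaves out map fields and other edge cases.

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.Descriptors.EnumValueDescriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class ProtoToBson {
    // Walk the set fields of an arbitrary message and build a BSON Document
    public static Document toBson(Message msg) {
        Document doc = new Document();
        msg.getAllFields().forEach((field, value) ->
                doc.put(field.getJsonName(), convert(field, value)));
        return doc;
    }

    private static Object convert(FieldDescriptor field, Object value) {
        if (field.isRepeated()) {
            List<Object> items = new ArrayList<>();
            for (Object item : (List<?>) value) {
                items.add(convertSingle(field, item));
            }
            return items;
        }
        return convertSingle(field, value);
    }

    private static Object convertSingle(FieldDescriptor field, Object value) {
        switch (field.getJavaType()) {
            case MESSAGE:
                return toBson((Message) value);            // recurse into nested messages
            case ENUM:
                return ((EnumValueDescriptor) value).getName();
            case BYTE_STRING:
                return ((ByteString) value).toByteArray(); // stored as BSON binary
            default:
                return value;                              // int/long/double/boolean/String
        }
    }
}
```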

    Protoc plugin

    The protobuf compiler protoc has a plugin mechanism for arbitrary code generation. It is typically used for 3rd party protobuf implementations, but it could also be used to generate conversion code to go from Protobuf-Java objects to BSON, e.g.,

    static void convertProtoToBson(ProtoMsg proto, BsonOutput bson) {
        if (proto.hasSomeInt()) {
            bson.writeInt32(proto.getSomeInt());
        }
    }
    

There are various tutorials explaining how to implement protobuf plugins, as well as sample projects like python-protoc-plugin that provide a bare-bones skeleton to get started.

Setting up a Java plugin is somewhat painful because you need wrapper scripts or a native image, but if you prefer Java you can take a look at QuickBuffers and the generated code for writeTo(JsonSink). (Disclaimer: I'm the author. You could also just try the library and check whether its faster JSON serialization provides enough of a speedup for your use case.)
