
I have the following struct that I need to encode into JSON and output to an io.Writer object:

type Data struct {
    FieldA string    `json:"field_a"`
    FieldB int       `json:"field_b"`
    Rows   io.Reader `json:"rows"`
}

Rows is an io.Reader that will return a potentially very large JSON array as raw bytes. I want to avoid loading the whole result from the reader into memory, since that adds latency and high memory overhead. Rows is guaranteed to be valid JSON, so it doesn't need to be decoded and re-encoded; it can just be passed through as-is.

My problem is that the json package from the standard library doesn't support a streaming-friendly implementation of MarshalJSON; it expects you to write your result into a []byte buffer and return it.
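
For reference, this is the interface encoding/json checks for. The entire encoded value has to be returned as a single []byte, so there is no way to stream it:

type Marshaler interface {
    MarshalJSON() ([]byte, error)
}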

Using json.NewEncoder(writer).Encode(data) is pretty much what I need, but I cannot define custom behavior for the reader field, and the encoder just emits {} for it.
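
For example, with the struct above (field values made up for illustration; needs the "os" and "strings" imports):

d := Data{FieldA: "a", FieldB: 1, Rows: strings.NewReader(`[1,2,3]`)}
// Prints {"field_a":"a","field_b":1,"rows":{}} because *strings.Reader
// has no exported fields, so the encoder emits an empty object for it.
json.NewEncoder(os.Stdout).Encode(d)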

Is there a way to achieve this without a completely custom implementation of the json encoding process?

2 Answers


  1. Chosen as BEST ANSWER

    I settled on a solution similar to the other answer, but one that doesn't require manually serializing every other field.

    I defined the Data struct like this:

    type Data struct {
        FieldA string    `json:"field_a"`
        FieldB int       `json:"field_b"`
        Rows   io.Reader `json:"-"`
    }
    

    The - in the json tag for the Rows field tells Go to always skip encoding that field into the JSON output. The encode function then looks like this:

    func encode(d Data, writer io.Writer) error {
        buf, err := json.Marshal(d)
        if err != nil {
            return err
        }
    
        _, err = writer.Write(buf[:len(buf)-1])
        if err != nil {
            return err
        }
        _, err = writer.Write([]byte(`,"rows":`))
        if err != nil {
            return err
        }
        _, err = io.Copy(writer, d.Rows)
        if err != nil {
            return err
        }
        _, err = writer.Write([]byte("}"))
        if err != nil {
            return err
        }
    
        return nil
    }
    
    1. I serialize the object as-is into a buffer (but without the rows)
    2. Write to the output everything except the final closing brace
    3. Write to the output ,"rows": to indicate a new field
    4. Copy the rows reader to the output writer
    5. Write a final } to the output
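
    A quick usage sketch of the above (hypothetical values, writing to stdout; needs the "log", "os", and "strings" imports):

    d := Data{FieldA: "a", FieldB: 1, Rows: strings.NewReader(`[1,2,3]`)}
    if err := encode(d, os.Stdout); err != nil {
        log.Fatal(err)
    }
    // Output: {"field_a":"a","field_b":1,"rows":[1,2,3]}

    Note that this relies on at least one other field being encoded: if every other field were skipped, json.Marshal would return {}, and stripping the closing brace before writing a leading comma would produce invalid JSON.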

    It works pretty well. An end-to-end PoC with a Gin web server uses about 35MB of memory to fetch a reader for an S3 object holding the rows, decompress it with zstd, and serialize it straight into the Gin response writer. It's also much faster than doing the whole thing in memory, since it can start returning data immediately instead of waiting for the entire payload to be decoded into memory and then re-encoded.
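
    The handler wiring looks roughly like this. This is a sketch rather than the exact PoC code: fetchRows is a hypothetical helper standing in for the S3 GetObject call, and it assumes github.com/klauspost/compress/zstd for decompression:

    func rowsHandler(c *gin.Context) {
        // fetchRows is a hypothetical helper that returns the compressed
        // S3 object body as an io.ReadCloser.
        body, err := fetchRows(c.Request.Context())
        if err != nil {
            c.AbortWithStatus(http.StatusInternalServerError)
            return
        }
        defer body.Close()

        // Decompress on the fly; the zstd decoder is itself an io.Reader.
        dec, err := zstd.NewReader(body)
        if err != nil {
            c.AbortWithStatus(http.StatusInternalServerError)
            return
        }
        defer dec.Close()

        c.Header("Content-Type", "application/json")
        c.Status(http.StatusOK)
        if err := encode(Data{FieldA: "a", FieldB: 1, Rows: dec}, c.Writer); err != nil {
            // The response has already started streaming, so just log.
            log.Println("encode:", err)
        }
    }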

    The full PoC can be found here if anybody's interested: https://github.com/TheEdgeOfRage/streaming-poc


  2. There is no way to achieve this using the standard encoding/json package alone, but a custom JSON encoder for Data is not onerous:

    func encode(d *Data, w io.Writer) error {
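        // field writes a `"name": value, ` pair; the Marshal errors are
        // ignored because strings and ints cannot fail to encode.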
        field := func(name string, value any) {
            b, _ := json.Marshal(name)
            w.Write(b)
            io.WriteString(w, ": ")
            b, _ = json.Marshal(value)
            w.Write(b)
            io.WriteString(w, ", ")
        }
    
        io.WriteString(w, "{ ")
        field("field_a", d.FieldA)
        field("field_b", d.FieldB)
        io.WriteString(w, `"rows": `)
        _, err := io.Copy(w, d.Rows)
        if err != nil {
            return err
        }
        io.WriteString(w, "}n")
        return nil
    }
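
    A quick usage sketch with made-up values (needs the "log", "os", and "strings" imports):

    d := &Data{FieldA: "a", FieldB: 1, Rows: strings.NewReader(`[1,2,3]`)}
    if err := encode(d, os.Stdout); err != nil {
        log.Fatal(err)
    }
    // Output: { "field_a": "a", "field_b": 1, "rows": [1,2,3]}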
    