When parsing a bulk operation JSONL file with nested items line by line from top to bottom, does reaching a new top-level parent object mean I’ve gone through all children of the previous parent?

Context

When processing a bulk operation JSONL file, I do some processing that requires having a parent and all of its children. I’d like to keep my memory requirements as small as possible, so I need to know when I’m done processing an object and all of its children.

Example for clarification

Using the documentation page’s JSONL example:

{"id":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986123","title":"52","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986040","title":"70","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/ProductVariant/19435459018808","title":"34","__parentId":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/ProductVariant/19435459051576","title":"Default Title","__parentId":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/ProductVariant/19435459084344","title":"36","__parentId":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/Product/1921569357880"}
{"id":"gid://shopify/ProductVariant/19435459117112","title":"47","__parentId":"gid://shopify/Product/1921569357880"}

If I’m reading the file line by line from top to bottom and I hit the Product with id gid://shopify/Product/1921569259576 on line 4, does this mean that I’ve already seen every product variant of the previous product (gid://shopify/Product/1921569226808) that the JSONL file contains?

Answers


  1. A lot of people shell out and use tac to reverse the file. When you then parse the reversed file, you process the children first, so when you hit the parent you know you have everything and can move on.

    Obviously this is nicer than getting the parent first and then the children, and wondering whether you have hit all the children or whether there are more.

    Try it! It works!

    My pseudocode (which you can convert to whatever scripting language you want) looks like this:

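    require "json"
    require "open-uri"
    require "shellwords"
    require "tempfile"

    inventory_file = Tempfile.new
    inventory_file.binmode
    uri = URI(result.data.node.url)
    IO.copy_stream(uri.open, inventory_file) # store large amount of JSON Lines data in a tempfile
    inventory_file.flush # make sure everything is on disk before tac reads the file
    inventory_file.rewind # move from EOF to beginning of file
    reversed_path = "#{Rails.root}/tmp/#{shop_domain}.reversed.jsonl"
    # shellescape both paths so an odd shop domain cannot break the command
    system("tac #{inventory_file.path.shellescape} > #{reversed_path.shellescape}", exception: true)
    puts "Completed reversal of file using tac, so now we can quickly iterate the inventory"
    variants = Hash.new { |hash, key| hash[key] = [] } # children seen so far, keyed by __parentId
    File.foreach(reversed_path) do |line|
      data = JSON.parse(line)
      if data["__parentId"]
        variants[data["__parentId"]] << data # child: buffer it until its parent shows up
      else
        children = variants.delete(data["id"]) || [] # parent: its children were read earlier
        # play with my data
      end
    end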
  2. You can safely assume that the children of a parent will be directly under it. The documentation is indeed vague on this:

    “The GraphQL Admin API doesn’t serially process the contents of the JSONL file. Avoid relying on a particular sequence of lines and object order to achieve a desired result.”

    But that refers to the fact that parents can appear in a different order and that the children can have an inconsistent order. For example, if you have 10 variants, they can appear in a different order but strictly under their parent.

    If the Bulk API did not give any guarantees regarding the children, it would be impractical to read large files, because a child that could appear anywhere in the file would force you to keep all parsed objects in memory until you read the last line.

    Hope that makes sense.
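
A minimal Ruby sketch of the top-to-bottom approach, under the assumption made in this answer (not a documented guarantee) that every child line sits directly under its parent. It also assumes only one level of nesting. The method name each_parent_with_children is hypothetical, and the guard raises if a child shows up under the wrong parent, so a violated assumption fails loudly instead of silently dropping data.

```ruby
require "json"

# Yields each top-level object together with its children, holding only one
# group in memory at a time. Assumes children directly follow their parent.
def each_parent_with_children(lines)
  return enum_for(:each_parent_with_children, lines) unless block_given?

  parent = nil
  children = []
  lines.each do |line|
    next if line.strip.empty?

    record = JSON.parse(line)
    if record.key?("__parentId")
      # Guard: the assumption is broken if this child belongs to another parent.
      unless parent && record["__parentId"] == parent["id"]
        raise ArgumentError, "child #{record['id']} does not follow its parent"
      end
      children << record
    else
      yield parent, children if parent # previous group is complete
      parent = record
      children = []
    end
  end
  yield parent, children if parent # flush the last group
end

# Usage with the first lines of the example from the question.
sample_lines = [
  '{"id":"gid://shopify/Product/1921569226808"}',
  '{"id":"gid://shopify/ProductVariant/19435458986123","title":"52","__parentId":"gid://shopify/Product/1921569226808"}',
  '{"id":"gid://shopify/ProductVariant/19435458986040","title":"70","__parentId":"gid://shopify/Product/1921569226808"}',
  '{"id":"gid://shopify/Product/1921569259576"}',
  '{"id":"gid://shopify/ProductVariant/19435459018808","title":"34","__parentId":"gid://shopify/Product/1921569259576"}'
]
each_parent_with_children(sample_lines) do |product, variants|
  puts "#{product['id']}: #{variants.size} variant(s)"
end
```

For a real download, File.foreach(path) can be passed in place of the array, since it returns an enumerator that reads one line at a time.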