When parsing a bulk operation JSONL file with nested items from top to bottom line by line, when I reach a new top level parent object, does that mean I’ve gone through all children of the previous parent?
Context
When processing a bulk operation JSONL file, I do some processing that requires having a parent and all of their children. I’d like to keep my memory requirements as small as possible, so I need to know when I’m done processing an object and all of its children.
Example for clarification
Using the documentation page’s JSONL example:
{"id":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986123","title":"52","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986040","title":"70","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/ProductVariant/19435459018808","title":"34","__parentId":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/ProductVariant/19435459051576","title":"Default Title","__parentId":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/ProductVariant/19435459084344","title":"36","__parentId":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/Product/1921569357880"}
{"id":"gid://shopify/ProductVariant/19435459117112","title":"47","__parentId":"gid://shopify/Product/1921569357880"}
If I’m reading the file line by line from top to bottom and I hit the Product with id gid://shopify/Product/1921569259576 on line 4, does this mean that I’ve already seen all of the previous product’s (gid://shopify/Product/1921569226808) product variants that the JSONL file contains?
Answers
What a lot of people do is shell out and use tac to reverse the file. Then, when you parse the file, you process the children first, and when you hit the parent you know you have everything and can move on.
Obviously this is nicer than getting the parent first, then the children, and then wondering whether you have hit all the children or whether there are more.
Try it! It works!
My pseudocode (which you can convert to whatever scripting language you want) looks like this:
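A minimal Python sketch of this reverse-order approach (not the answerer's original pseudocode) might look like the following. It assumes a single level of nesting, that every child carries `__parentId`, and that the lines arrive already reversed, for example from `tac`. The function name `group_reversed` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def group_reversed(lines: Iterable[str]) -> Iterator[tuple[dict, list[dict]]]:
    """Yield (parent, children) pairs from JSONL lines given in REVERSE order.

    Assumption: one level of nesting, so any object without "__parentId"
    is a top-level parent and closes the group collected so far.
    """
    children: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if "__parentId" in obj:
            # In a reversed file the children arrive before their parent.
            children.append(obj)
        else:
            # Parent reached: everything buffered so far belongs to it.
            # Reverse the buffer to restore the original file order.
            yield obj, children[::-1]
            children = []
```

One way to use it would be to pipe `tac` output into a script that passes `sys.stdin` to `group_reversed`, so only one parent's children are held in memory at a time.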
You can safely assume that the children of a parent will be directly under it. The documentation is indeed vague on this:
But that refers to the fact that parents can appear in a different order and that the order of the children among themselves is not guaranteed. For example, if a product has 10 variants, they can appear in a different order, but always directly under their parent.
If the Bulk API gave no guarantees about where children appear, it would be impractical to read large files: a child could appear anywhere in the file, so you would need to keep every parsed object in memory until you had read the last line.
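Under that assumption (children sit directly beneath their parent), a forward pass only needs to hold one parent and its children at a time. The sketch below is illustrative, not an official client: `group_forward` is a hypothetical name, it handles a single level of nesting, and it raises an error if a child ever shows up under the wrong parent, so a violated assumption fails loudly instead of silently corrupting results.

```python
import json
from typing import Iterable, Iterator


def group_forward(lines: Iterable[str]) -> Iterator[tuple[dict, list[dict]]]:
    """Yield (parent, children) pairs while reading JSONL top to bottom.

    Assumption: each child line directly follows its parent, and objects
    without "__parentId" are top-level parents.
    """
    parent = None
    children: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        parent_id = obj.get("__parentId")
        if parent_id is None:
            # A new top-level object closes the previous group.
            if parent is not None:
                yield parent, children
            parent, children = obj, []
        elif parent is None or parent_id != parent["id"]:
            # The ordering assumption does not hold for this file.
            raise ValueError(f"child {obj.get('id')} is not under its parent")
        else:
            children.append(obj)
    # The last group is only complete at end of file.
    if parent is not None:
        yield parent, children
```

Passing an open file object works, since iterating over it yields one line at a time; memory use is then bounded by the largest single parent group.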
Hope that makes sense.