
The Shopify API gives me a link to a large JSONL file. In Node.js, I need to read this data line by line, because loading it all at once would use a lot of memory. When I open the JSONL URL in a web browser, it automatically downloads the file to my Downloads folder.

Example of JSONL:

{"id":"gid://shopify/Customer/6478758936817","firstName":"Joe"}
{"id":"gid://shopify/Order/5044232028401","name":"#1001","createdAt":"2022-09-16T16:30:50Z","__parentId":"gid://shopify/Customer/6478758936817"}
{"id":"gid://shopify/Order/5044244480241","name":"#1003","createdAt":"2022-09-16T16:37:27Z","__parentId":"gid://shopify/Customer/6478758936817"}
{"id":"gid://shopify/Order/5057425703153","name":"#1006","createdAt":"2022-09-27T17:24:39Z","__parentId":"gid://shopify/Customer/6478758936817"}
{"id":"gid://shopify/Customer/6478771093745","firstName":"John"}
{"id":"gid://shopify/Customer/6478771126513","firstName":"Jane"}
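Each line in this sample is a standalone JSON object, and nesting is expressed through the __parentId field: the Order lines point back to the Customer line above them. The sketch below shows one way to rebuild that structure while still reading one line at a time. It assumes, based only on this sample, that every child line directly follows its parent's line and that there is a single level of nesting, so only one parent is held in memory at a time; the function name groupByParent and the children property it adds are hypothetical.

```javascript
const { createInterface } = require('node:readline');

// Hypothetical helper: reads JSONL from any Node.js readable stream and
// yields each top-level object with its child objects in a `children` array.
// ASSUMPTION (from the sample data only): each child line comes right after
// its parent's line, and children never have children of their own.
async function* groupByParent(input) {
  // crlfDelay: Infinity treats \r\n as a single line break
  const rl = createInterface({ input, crlfDelay: Infinity });
  let current = null;

  for await (const line of rl) {
    if (!line.trim()) continue; // skip blank lines, which JSON.parse rejects
    const obj = JSON.parse(line);

    if (obj.__parentId === undefined) {
      // A new top-level object starts, so the previous one is complete.
      if (current) yield current;
      current = { ...obj, children: [] };
    } else if (current && obj.__parentId === current.id) {
      current.children.push(obj);
    } else {
      // The ordering assumption does not hold for this input.
      throw new Error(`Child ${obj.id} does not follow its parent ${obj.__parentId}`);
    }
  }

  if (current) yield current; // emit the final parent
}
```

Any Node.js readable stream works as input, for example fs.createReadStream on a downloaded file or the response.data stream from the axios answer below. Throwing on an out-of-place child means a file ordered differently fails loudly instead of being grouped incorrectly.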

I'm unsure how to process this data in Node.js. Do I need to request the URL, download all of the data into a temporary file, and then process that file line by line? Or can I read the response line by line as it arrives (via some sort of stream?) and process it without storing a temporary file on the server?

(The JSONL comes from https://storage.googleapis.com/ if that helps.)

Thanks.


Answers


  1. Using axios, you can set the response type to a stream and then use Node's built-in readline module to process your data line by line.

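    // Run as an ES module (e.g. a .mjs file) so top-level await is allowed
    import axios from 'axios'
    import { createInterface } from 'node:readline'
    
    // Ask axios for a Node.js stream instead of buffering the whole body
    const response = await axios.get('https://raw.githubusercontent.com/zaibacu/thesaurus/master/en_thesaurus.jsonl', {
      responseType: 'stream'
    })
    
    // readline splits the stream into lines as data arrives;
    // crlfDelay: Infinity treats \r\n as a single line break
    const rl = createInterface({
      input: response.data,
      crlfDelay: Infinity
    })
    
    for await (const line of rl) {
      if (!line.trim()) continue // skip blank lines, which JSON.parse would reject
      // do something with the current line
      const { word, synonyms } = JSON.parse(line)
      console.log('word, synonyms: ', word, synonyms)
    }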
    

    In my testing, memory usage stayed very low with this approach.

  2. Alternatively, you can use jq, a command-line JSON processor.

    Because it is not tied to any particular runtime or application code, you can run it wherever you need to parse JSONL.

       jq -cs '.' doodoo.myshopify.com.export.jsonl > out.json
    

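    This takes the bulk-export file I had just downloaded from a query and turns it into a single pure JSON array that you can work with or save. Note that -s (slurp) reads the whole file into memory to build that array, so this suits files that fit in memory rather than the line-by-line processing the question asks about.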

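The first answer uses axios only to obtain a stream. On Node.js 18 or later, the built-in fetch can do the same: its response.body is a web ReadableStream, and Readable.fromWeb from node:stream converts it into a Node.js stream that readline accepts, so nothing is written to a temporary file. The following is a minimal sketch under that assumption (Node.js 18+); the helper names parseJsonl and readJsonlFromUrl are hypothetical.

```javascript
const { Readable } = require('node:stream');
const { createInterface } = require('node:readline');

// Hypothetical helper: yields one parsed object per JSONL line from any
// Node.js readable stream (a file stream, an HTTP body, etc.).
async function* parseJsonl(input) {
  // crlfDelay: Infinity treats \r\n as a single line break
  const rl = createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim()) yield JSON.parse(line); // skip blank lines
  }
}

// Hypothetical helper: streams a remote JSONL file using the global fetch
// (Node.js 18+) instead of axios; no temporary file is created.
async function* readJsonlFromUrl(url) {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error(`Download failed: HTTP ${response.status}`);
  }
  // Convert the web ReadableStream into a Node.js Readable for readline.
  yield* parseJsonl(Readable.fromWeb(response.body));
}
```

With url set to the link Shopify returns, the records would be consumed with for await (const record of readJsonlFromUrl(url)) { ... }. Keeping parseJsonl separate means the line parsing does not care where the bytes come from, so the same function also works on a file that was already downloaded.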