I am trying to split 1 NDJSON file into multiple NDJSON files. I am able to consume and split the file, but the problem is the resulting files are in JSON format. Is it possible to output to NDJSON format or do I have to do some string manipulation?

My input file test.json:

{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38116"}

My PowerShell script so far:

$json = (Get-Content C:\Temp\test.json) | ConvertFrom-Json

$jnl_list = $json.JRNAL_NO | select -Unique

ForEach ($jnl in $jnl_list) {
    $Array = $json | Where-Object {$_.JRNAL_NO -eq $jnl}
    $res = ($Array | ConvertTo-Json)
    $res | Out-File -FilePath .\JNL\$($jnl).json
}

My current output. Here’s the 38115.json file:

[
    {
        "PERIOD":  "2024004",
        "JRNAL_NO":  "38115"
    },
    {
        "PERIOD":  "2024004",
        "JRNAL_NO":  "38115"
    },
    {
        "PERIOD":  "2024004",
        "JRNAL_NO":  "38115"
    },
    {
        "PERIOD":  "2024004",
        "JRNAL_NO":  "38115"
    }
]

I need the output file to be NDJSON, basically the same format as the input file.
38115.json should be:

{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}

38116.json should be:

{"PERIOD":"2024004","JRNAL_NO":"38116"}

2 Answers


  1. If I understand correctly what you're looking for, the code may be greatly simplified by using Group-Object.

    To get NDJSON output, enumerate each object in a group (its .Group property), pass each one individually to ConvertTo-Json -Compress, then send that output to your file.

    $json = Get-Content C:\Temp\test.json | ConvertFrom-Json
    foreach ($group in $json | Group-Object JRNAL_NO) {
        $group.Group |
            ForEach-Object { $_ | ConvertTo-Json -Compress } |
            Set-Content ".\JNL\$($group.Name).json"
    }
    

    With the sample data, the code above creates 2 files:

    • 38115.json
    {"PERIOD":"2024004","JRNAL_NO":"38115"}
    {"PERIOD":"2024004","JRNAL_NO":"38115"}
    {"PERIOD":"2024004","JRNAL_NO":"38115"}
    {"PERIOD":"2024004","JRNAL_NO":"38115"}
    
    • 38116.json
    {"PERIOD":"2024004","JRNAL_NO":"38116"}
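
    The per-object ForEach-Object step matters because ConvertTo-Json collects all of its pipeline input into a single JSON array. The following is only a minimal sketch of that difference; the two sample objects and the variable names are made up for illustration and are not part of the original script:

    ```powershell
    # Two sample objects standing in for one group (hypothetical data modeled on the question).
    $rows = @(
        [pscustomobject]@{ PERIOD = '2024004'; JRNAL_NO = '38115' }
        [pscustomobject]@{ PERIOD = '2024004'; JRNAL_NO = '38115' }
    )

    # Piping the whole collection: ConvertTo-Json gathers the input into ONE JSON array.
    $asArray = $rows | ConvertTo-Json -Compress
    # [{"PERIOD":"2024004","JRNAL_NO":"38115"},{"PERIOD":"2024004","JRNAL_NO":"38115"}]

    # Converting each object separately: one compact JSON document per object (NDJSON lines).
    $asLines = $rows | ForEach-Object { $_ | ConvertTo-Json -Compress }
    # {"PERIOD":"2024004","JRNAL_NO":"38115"}
    # {"PERIOD":"2024004","JRNAL_NO":"38115"}
    ```

    Writing $asLines with Set-Content puts each string on its own line, which is what gives the NDJSON layout.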
    
  2. To complement the helpful answer from Santiago Squarzon with a presumably more performant solution:
    You could also distribute the original lines (JSON expressions) one by one, on the fly, using a steppable pipeline, as in this example, where the ConvertFrom-Json result is used only to determine the "JRNAL_NO" value:

    $Pipeline = @{}
    Get-Content .\test.json |
        ForEach-Object -Process {
            $Jnl = ($_ | ConvertFrom-Json).JRNAL_NO
            if (!$Pipeline.Contains($Jnl)) {
                $Pipeline[$Jnl] = { Set-Content .\$Jnl.json }.GetSteppablePipeline()
                $Pipeline[$Jnl].Begin($True)
            }
            $Pipeline[$Jnl].Process($_)
        } -End {
            foreach ($Key in $Pipeline.Keys) { $Pipeline[$Key].End() }
        }
    

    For more details and background, see: Mastering the (steppable) pipeline
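
    As a rough sketch of the mechanism in isolation, a single steppable pipeline can be driven by hand with Begin, Process and End. The temp-file path and the variable names here are hypothetical and only serve the illustration:

    ```powershell
    # Hypothetical target file in the temp directory, for illustration only.
    $path = Join-Path ([System.IO.Path]::GetTempPath()) 'steppable-demo.txt'

    # Wrap one command in a script block and obtain a steppable pipeline for it.
    $step = { Set-Content -LiteralPath $path }.GetSteppablePipeline()

    $step.Begin($true)        # $true: the pipeline will receive input through Process()
    $step.Process('first')    # each call hands one item to Set-Content
    $step.Process('second')
    $step.End()               # completes the command so the file is closed

    # Read the file back to see what was written.
    $lines = Get-Content -LiteralPath $path
    ```

    The example above keeps one such pipeline per JRNAL_NO value in a hashtable, so each output file is opened once and every input line is passed through unchanged.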
