
I need a JOLT spec to perform the transformation below in an Apache NiFi JOLT processor. Some of the inner arrays contain multiple elements, and those arrays are all the same length (3 in this case). I want to expand the input into 3 records, with the single-element (non-array) values repeated in each record. Every value in the input has the same key, "text", and I want to keep it that way after the transform, as shown in the expected output.

Input JSON:

[
  [
    {
      "text": "AAA"
    }
  ],
  [
    {
      "text": "BBB"
    }
  ],
  [
    {
      "text": "11"
    },
    {
      "text": "12"
    },
    {
      "text": "13"
    }
  ],
  [
    {
      "text": "A1"
    },
    {
      "text": "B2"
    },
    {
      "text": "C3"
    }
  ],
  [
    {
      "text": "Z"
    }
  ]
]

Expected output:

[
    [
        [
            {
                "text": "AAA"
            }
        ],
        [
            {
                "text": "BBB"
            }
        ],
        [
            {
                "text": "11"
            }
        ],
        [
            {
                "text": "A1"
            }
        ],
        [
            {
                "text": "Z"
            }
        ]
    ],
    [
        [
            {
                "text": "AAA"
            }
        ],
        [
            {
                "text": "BBB"
            }
        ],
        [
            {
                "text": "12"
            }
        ],
        [
            {
                "text": "B2"
            }
        ],
        [
            {
                "text": "Z"
            }
        ]
    ],
    [
        [
            {
                "text": "AAA"
            }
        ],
        [
            {
                "text": "BBB"
            }
        ],
        [
            {
                "text": "13"
            }
        ],
        [
            {
                "text": "C3"
            }
        ],
        [
            {
                "text": "Z"
            }
        ]
    ]
]

2 Answers


  1. Chosen as BEST ANSWER

    I posed this question as JOLT-specific, but I should have phrased it more generically: I needed the JSON transformed somehow using NiFi.

    I was pointed to JSLT, another NiFi transform processor that I had been unaware of, but I was not successful with that either. The main problem in both attempts was that all of the JSON key/value pairs have the same key name, "text", and I could not get the transform definitions to handle that. This is how my source data arrives, so I needed a way to handle it.

    In the end, I used a Groovy script processor that uses regex matching loops to identify the size common to all of the arrays, extract the array values while repeating the non-array values, and generate the needed records from that. Below is the script:

    import org.apache.nifi.processor.io.StreamCallback
    import org.apache.commons.io.IOUtils
    import java.nio.charset.*
    
    def flowFile = session.get()
    if(!flowFile) return
    
    flowFile = session.write(flowFile, {inputStream, outputStream ->
    
    def ffcontent = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
       
    def rcrd = ""
    
    //First find if there are multi-element arrays
    //(the patterns assume compact JSON with no whitespace between tokens)
    def p = /\[\{"text":[^\[]*\},\{[^\[]*\}\]/
    def m = ffcontent =~ p
    
    if(m.count>0) //m.count is number of arrays - if it is equal to 0, flowfile passes through unchanged
    {    
        //next, find array size (arrays are all the same size,
        //so only need to check first one)    
        def ap = /\{"text":(?:(?!\},\{|\}\],\[\{).)*\}/
        def am = m[0] =~ ap
        def arraySize = am.count
    
        rcrd = rcrd + "["
        //then loop number of array size times:    
        for (int i = 0; i < arraySize; i++) 
        {        
            //loop through every item (both single and arrays)
            rcrd = rcrd + "["
            def ip = /\[\{"text":[^\[]*[^\[]*\}\]/
            def im = ffcontent =~ ip
            for (int j = 0; j < im.count; j++) 
            {
                //now check if single or array item
                def ai = im[j] =~ ap
                if(ai.count > 1) //array item
                {
                    //array, so pick the ith value                
                    rcrd = rcrd + "[" + ai[i] + "]";
                }
                else //single item
                {
                    rcrd = rcrd + im[j];
                }
                if(j < im.count - 1)
                {
                    rcrd = rcrd + ","
                }
            }
            rcrd = rcrd + "]"
            if(i < arraySize-1)
            {
                rcrd = rcrd + ","
            }
        }
        rcrd = rcrd + "]"
        ffcontent = rcrd
    }
            
        outputStream.write(ffcontent.getBytes(StandardCharsets.UTF_8));
    } as StreamCallback)
    
    session.transfer(flowFile, REL_SUCCESS)
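
    For comparison, the same expansion can be sketched by parsing the JSON instead of matching it with regex, which removes the dependence on compact, whitespace-free input. This is only a sketch, not what I actually ran: `expandRecords` is a hypothetical helper name, and it assumes the content is a list of lists in which every multi-element list has the same length.

    ```groovy
    import groovy.json.JsonOutput
    import groovy.json.JsonSlurper

    // Hypothetical helper (not part of the script above): builds one record per
    // index of the multi-element arrays, repeating the single-element arrays.
    String expandRecords(String json) {
        def groups = new JsonSlurper().parseText(json)   // list of lists of maps

        // Assumption: all multi-element arrays share one length, so the largest
        // size found is the number of records to generate.
        int size = groups.collect { it.size() }.max() ?: 1
        if (size <= 1) return json                       // nothing to expand, pass through unchanged

        def records = (0..<size).collect { i ->
            groups.collect { g ->
                // Multi-element arrays contribute their i-th item;
                // single-element arrays repeat their only item.
                [g.size() > 1 ? g[i] : g[0]]
            }
        }
        return JsonOutput.toJson(records)
    }
    ```

    Inside the `session.write` callback, the regex section could then be replaced with something like `ffcontent = expandRecords(ffcontent)`. Because the key name is never inspected, having "text" as the only key is not a problem here.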
    

  2. Have you tried JSLT instead of JOLT? It may be easier for this task.

    https://github.com/schibsted/jslt/tree/master

    You can use it with JSLTTransformJSON.
