
This is my input file. It is just a sample; the real file will have more than 20k rows.

{"A":12345,"B":"2023-12-31 00:00:00","C":5510,"D":20231231,"E":20240103,"F":20231231,"G":1577158,"H":63217,"I":null,"J":"Company123","K":"SupplierXYZ","L":"987","M":"Vendor ABC","N":"Invoice789","O":"2023-12-31T00:00:00","P":"2024-01-03T00:00:00","Q":"2023033344","R":"7559","S":"Dept123","T":"ModelXYZ","U":7179.0,"V":3,"W":"3769686.0","X":"TypeJ","Y":7179.0,"Z":null,"AA":"NOK","AB":"202312.0","AC":"EXTERNAL","AD":null,"AE":"Counterparty999","AF":"2023-12-31T00:00:00","AG":null,"AH":null,"AI":"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0","AJ":"2024-02-13T12:58:02.597","AK":"2024-01-03T08:31:16.425292"}
{"A":54321,"B":"2024-01-15 08:00:00","C":7890,"D":20240115,"E":20240120,"F":20240115,"G":2468135,"H":951753,"I":null,"J":"Company456","K":"Supplier789","L":"123456789","M":"Vendor XYZ","N":"Invoice123","O":"2024-01-15T08:00:00","P":"2024-01-20T08:00:00","Q":"2024011515","R":"1234","S":"Dept456","T":"ModelABC","U":9876.0,"V":6,"W":"9876543.0","X":"TypeK","Y":9876.0,"Z":null,"AA":"USD","AB":"202401.0","AC":"INTERNAL","AD":null,"AE":"Counterparty888","AF":"2024-01-15T08:00:00","AG":null,"AH":null,"AI":"a1b2c3d4e5f6g7h8i9","AJ":"2024-02-15T08:00:00.123","AK":"2024-01-20T08:00:00.987654"}

I want the output to be:

{
"dataRepositoryId": "fixedvalue",
"importName": "uuid,
"data": [
{"A":12345,"B":"2023-12-31 00:00:00","C":5510,"D":20231231,"E":20240103,"F":20231231,"G":1577158,"H":63217,"I":null,"J":"Company123","K":"SupplierXYZ","L":"987","M":"Vendor ABC","N":"Invoice789","O":"2023-12-31T00:00:00","P":"2024-01-03T00:00:00","Q":"2023033344","R":"7559","S":"Dept123","T":"ModelXYZ","U":7179.0,"V":3,"W":"3769686.0","X":"TypeJ","Y":7179.0,"Z":null,"AA":"NOK","AB":"202312.0","AC":"EXTERNAL","AD":null,"AE":"Counterparty999","AF":"2023-12-31T00:00:00","AG":null,"AH":null,"AI":"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0","AJ":"2024-02-13T12:58:02.597","AK":"2024-01-03T08:31:16.425292"},
{"A":54321,"B":"2024-01-15 08:00:00","C":7890,"D":20240115,"E":20240120,"F":20240115,"G":2468135,"H":951753,"I":null,"J":"Company456","K":"Supplier789","L":"123456789","M":"Vendor XYZ","N":"Invoice123","O":"2024-01-15T08:00:00","P":"2024-01-20T08:00:00","Q":"2024011515","R":"1234","S":"Dept456","T":"ModelABC","U":9876.0,"V":6,"W":"9876543.0","X":"TypeK","Y":9876.0,"Z":null,"AA":"USD","AB":"202401.0","AC":"INTERNAL","AD":null,"AE":"Counterparty888","AF":"2024-01-15T08:00:00","AG":null,"AH":null,"AI":"a1b2c3d4e5f6g7h8i9","AJ":"2024-02-15T08:00:00.123","AK":"2024-01-20T08:00:00.987654"}
]}

The input file should be encapsulated in the data column.
I read the input file through a CSV dataset with the delimiter set to ^ so that each row is read as one column, then used a derived column with data as [Inputcolumn], but my final output looks like this:

{"dataRepositoryId": "fixed vale","importName": "uuid","data":"["{\"A\":12345,\"B\":\"2023-12-31 00:00:00\",\"C\":5510,\"D\":20231231,\"E\":20240103,\"F\":20231231,\"G\":1577158,\"H\":63217,\"I\":null,\"J\":\"Company123\",\"K\":\"SupplierXYZ\",\"L\":\"987\",\"M\":\"Vendor ABC\",\"N\":\"Invoice789\",\"O\":\"2023-12-31T00:00:00\",\"P\":\"2024-01-03T00:00:00\",\"Q\":\"2023033344\",\"R\":\"7559\",\"S\":\"Dept123\",\"T\":\"ModelXYZ\",\"U\":7179.0,\"V\":3,\"W\":\"3769686.0\",\"X\":\"TypeJ\",\"Y\":7179.0,\"Z\":null,\"AA\":\"NOK\",\"AB\":\"202312.0\",\"AC\":\"EXTERNAL\",\"AD\":null,\"AE\":\"Counterparty999\",\"AF\":\"2023-12-31T00:00:00\",\"AG\":null,\"AH\":null,\"AI\":\"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0\",\"AJ\":\"2024-02-13T12:58:02.597\",\"AK\":\"2024-01-03T08:31:16.425292\"}\n", "{\"A\":54321,\"B\":\"2024-01-15 08:00:00\",\"C\":7890,\"D\":20240115,\"E\":20240120,\"F\":20240115,\"G\":2468135,\"H\":951753,\"I\":null,\"J\":\"Company456\",\"K\":\"Supplier789\",\"L\":\"123456789\",\"M\":\"Vendor XYZ\",\"N\":\"Invoice123\",\"O\":\"2024-01-15T08:00:00\",\"P\":\"2024-01-20T08:00:00\",\"Q\":\"2024011515\",\"R\":\"1234\",\"S\":\"Dept456\",\"T\":\"ModelABC\",\"U\":9876.0,\"V\":6,\"W\":\"9876543.0\",\"X\":\"TypeK\",\"Y\":9876.0,\"Z\":null,\"AA\":\"USD\",\"AB\":\"202401.0\",\"AC\":\"INTERNAL\",\"AD\":null,\"AE\":\"Counterparty888\",\"AF\":\"2024-01-15T08:00:00\",\"AG\":null,\"AH\":null,\"AI\":\"a1b2c3d4e5f6g7h8i9\",\"AJ\":\"2024-02-15T08:00:00.123\",\"AK\":\"2024-01-20T08:00:00.987654\"}\n"]"}

2 Answers

  1. Try using a derived column with json(Inputcolumn); that should help.

    The json() function converts the string into JSON output.

    You can also use replace('\',''); it works and generates output like this:

    
    ["{"A":12345,"B":"2023-12-31 00:00:00","C":5510,"D":20231231,"E":20240103,"F":20231231,"G":1577158,"H":63217,"I":null,"J":"Company123","K":"SupplierXYZ","L":"987","M":"Vendor ABC","N":"Invoice789","O":"2023-12-31T00:00:00","P":"2024-01-03T00:00:00","Q":"2023033344","R":"7559","S":"Dept123","T":"ModelXYZ","U":7179.0,"V":3,"W":"3769686.0","X":"TypeJ","Y":7179.0,"Z":null,"AA":"NOK","AB":"202312.0","AC":"EXTERNAL","AD":null,"AE":"Counterparty999","AF":"2023-12-31T00:00:00","AG":null,"AH":null,"AI":"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0","AJ":"2024-02-13T12:58:02.597","AK":"2024-01-03T08:31:16.425292"}n", "{"A":54321,"B":"2024-01-15 08:00:00","C":7890,"D":20240115,"E":20240120,"F":20240115,"G":2468135,"H":951753,"I":null,"J":"Company456","K":"Supplier789","L":"123456789","M":"Vendor XYZ","N":"Invoice123","O":"2024-01-15T08:00:00","P":"2024-01-20T08:00:00","Q":"2024011515","R":"1234","S":"Dept456","T":"ModelABC","U":9876.0,"V":6,"W":"9876543.0","X":"TypeK","Y":9876.0,"Z":null,"AA":"USD","AB":"202401.0","AC":"INTERNAL","AD":null,"AE":"Counterparty888","AF":"2024-01-15T08:00:00","AG":null,"AH":null,"AI":"a1b2c3d4e5f6g7h8i9","AJ":"2024-02-15T08:00:00.123","AK":"2024-01-20T08:00:00.987654"}"]
    
    
  2. How to convert the JSON string to an array and remove the escape characters:

    As you are reading JSON rows from a CSV file, the data flow treats the whole row as a string when pushing it to the JSON file, which is why you see backslashes as escape characters.

    • My data sample: the JSON rows from the question, read in as a single string column.

    • As you know the schema, parse the string column as JSON with a Parse transformation, using the output column type expression below:

    (A as string, B as date, C as integer, D as integer, E as integer, F as integer, G as integer, H as integer, I as integer, J as string, K as string, L as string, M as string, N as string, O as string, P as string, Q as string, R as string, S as string, T as string, U as double, V as integer, W as string, X as string, Y as double, Z as string, AA as string, AB as string, AC as string, AD as string, AE as string, AF as string, AG as string, AH as string, AI as string, AJ as string, AK as string)

    • Then take a derived column transformation and create a dummy column with the value 1 (it is only there so the next aggregate groups every row together).
    • Then take an aggregate transformation, grouped on the dummy column, to collect the rows into the data array.
    • Then take another derived column transformation to create the remaining columns (dataRepositoryId and importName).
    • Then take a select transformation and select only the required columns.
    • Then add a sink with the JSON dataset where you need the output; the whole sequence is sketched in Python below for reference.
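
    A minimal Python sketch of the same sequence (the file names, the fixed dataRepositoryId value, and the uuid for importName are assumptions, not part of the original answer):

    import json
    import uuid

    # Parse step: read the JSON-lines input as real objects.
    with open("input.jsonl") as f:                      # assumed input path
        rows = [json.loads(line) for line in f if line.strip()]

    # Aggregate step: collect every row into one data array
    # (the dummy column in the data flow only exists so the aggregate
    # groups all rows together).
    document = {"data": rows}

    # Derived-column step: add the constant and generated columns.
    document["dataRepositoryId"] = "fixedvalue"         # assumed fixed value
    document["importName"] = str(uuid.uuid4())

    # Select + sink steps: keep only the required keys and write the file.
    output = {k: document[k] for k in ("dataRepositoryId", "importName", "data")}
    with open("output.json", "w") as f:                 # assumed output path
        json.dump(output, f)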

    OUTPUT: the sink produces the nested JSON document with dataRepositoryId, importName, and the data array.
