This is my input file – it is just a sample; the real file will have more than 20k rows:
{"A":12345,"B":"2023-12-31 00:00:00","C":5510,"D":20231231,"E":20240103,"F":20231231,"G":1577158,"H":63217,"I":null,"J":"Company123","K":"SupplierXYZ","L":"987","M":"Vendor ABC","N":"Invoice789","O":"2023-12-31T00:00:00","P":"2024-01-03T00:00:00","Q":"2023033344","R":"7559","S":"Dept123","T":"ModelXYZ","U":7179.0,"V":3,"W":"3769686.0","X":"TypeJ","Y":7179.0,"Z":null,"AA":"NOK","AB":"202312.0","AC":"EXTERNAL","AD":null,"AE":"Counterparty999","AF":"2023-12-31T00:00:00","AG":null,"AH":null,"AI":"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0","AJ":"2024-02-13T12:58:02.597","AK":"2024-01-03T08:31:16.425292"}
{"A":54321,"B":"2024-01-15 08:00:00","C":7890,"D":20240115,"E":20240120,"F":20240115,"G":2468135,"H":951753,"I":null,"J":"Company456","K":"Supplier789","L":"123456789","M":"Vendor XYZ","N":"Invoice123","O":"2024-01-15T08:00:00","P":"2024-01-20T08:00:00","Q":"2024011515","R":"1234","S":"Dept456","T":"ModelABC","U":9876.0,"V":6,"W":"9876543.0","X":"TypeK","Y":9876.0,"Z":null,"AA":"USD","AB":"202401.0","AC":"INTERNAL","AD":null,"AE":"Counterparty888","AF":"2024-01-15T08:00:00","AG":null,"AH":null,"AI":"a1b2c3d4e5f6g7h8i9","AJ":"2024-02-15T08:00:00.123","AK":"2024-01-20T08:00:00.987654"}
I want the output to be:
{
"dataRepositoryId": "fixedvalue",
"importName": "uuid",
"data": [
{"A":12345,"B":"2023-12-31 00:00:00","C":5510,"D":20231231,"E":20240103,"F":20231231,"G":1577158,"H":63217,"I":null,"J":"Company123","K":"SupplierXYZ","L":"987","M":"Vendor ABC","N":"Invoice789","O":"2023-12-31T00:00:00","P":"2024-01-03T00:00:00","Q":"2023033344","R":"7559","S":"Dept123","T":"ModelXYZ","U":7179.0,"V":3,"W":"3769686.0","X":"TypeJ","Y":7179.0,"Z":null,"AA":"NOK","AB":"202312.0","AC":"EXTERNAL","AD":null,"AE":"Counterparty999","AF":"2023-12-31T00:00:00","AG":null,"AH":null,"AI":"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0","AJ":"2024-02-13T12:58:02.597","AK":"2024-01-03T08:31:16.425292"},
{"A":54321,"B":"2024-01-15 08:00:00","C":7890,"D":20240115,"E":20240120,"F":20240115,"G":2468135,"H":951753,"I":null,"J":"Company456","K":"Supplier789","L":"123456789","M":"Vendor XYZ","N":"Invoice123","O":"2024-01-15T08:00:00","P":"2024-01-20T08:00:00","Q":"2024011515","R":"1234","S":"Dept456","T":"ModelABC","U":9876.0,"V":6,"W":"9876543.0","X":"TypeK","Y":9876.0,"Z":null,"AA":"USD","AB":"202401.0","AC":"INTERNAL","AD":null,"AE":"Counterparty888","AF":"2024-01-15T08:00:00","AG":null,"AH":null,"AI":"a1b2c3d4e5f6g7h8i9","AJ":"2024-02-15T08:00:00.123","AK":"2024-01-20T08:00:00.987654"}
]}
The rows of the input file should be encapsulated in the data field.
I read the input file through a CSV dataset with the delimiter set to ^ so that each row is read as a single column, then used a derived column with data as [Inputcolumn], but my final output looks like this:
{"dataRepositoryId": "fixed vale","importName": "uuid","data":"["{\"A\":12345,\"B\":\"2023-12-31 00:00:00\",\"C\":5510,\"D\":20231231,\"E\":20240103,\"F\":20231231,\"G\":1577158,\"H\":63217,\"I\":null,\"J\":\"Company123\",\"K\":\"SupplierXYZ\",\"L\":\"987\",\"M\":\"Vendor ABC\",\"N\":\"Invoice789\",\"O\":\"2023-12-31T00:00:00\",\"P\":\"2024-01-03T00:00:00\",\"Q\":\"2023033344\",\"R\":\"7559\",\"S\":\"Dept123\",\"T\":\"ModelXYZ\",\"U\":7179.0,\"V\":3,\"W\":\"3769686.0\",\"X\":\"TypeJ\",\"Y\":7179.0,\"Z\":null,\"AA\":\"NOK\",\"AB\":\"202312.0\",\"AC\":\"EXTERNAL\",\"AD\":null,\"AE\":\"Counterparty999\",\"AF\":\"2023-12-31T00:00:00\",\"AG\":null,\"AH\":null,\"AI\":\"66c6c4cc31107992a4079f3653a8a86930a2675d8663c5ab2ec4df23b0f6f3b0\",\"AJ\":\"2024-02-13T12:58:02.597\",\"AK\":\"2024-01-03T08:31:16.425292\"}\n", "{\"A\":54321,\"B\":\"2024-01-15 08:00:00\",\"C\":7890,\"D\":20240115,\"E\":20240120,\"F\":20240115,\"G\":2468135,\"H\":951753,\"I\":null,\"J\":\"Company456\",\"K\":\"Supplier789\",\"L\":\"123456789\",\"M\":\"Vendor XYZ\",\"N\":\"Invoice123\",\"O\":\"2024-01-15T08:00:00\",\"P\":\"2024-01-20T08:00:00\",\"Q\":\"2024011515\",\"R\":\"1234\",\"S\":\"Dept456\",\"T\":\"ModelABC\",\"U\":9876.0,\"V\":6,\"W\":\"9876543.0\",\"X\":\"TypeK\",\"Y\":9876.0,\"Z\":null,\"AA\":\"USD\",\"AB\":\"202401.0\",\"AC\":\"INTERNAL\",\"AD\":null,\"AE\":\"Counterparty888\",\"AF\":\"2024-01-15T08:00:00\",\"AG\":null,\"AH\":null,\"AI\":\"a1b2c3d4e5f6g7h8i9\",\"AJ\":\"2024-02-15T08:00:00.123\",\"AK\":\"2024-01-20T08:00:00.987654\"}\n"]"}
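For reference, the intended transformation can be sketched in plain Python (a minimal illustration of the target shape, not the data-flow fix itself; the `dataRepositoryId` value and the generated UUID are placeholders):

```python
import json
import uuid

def wrap_json_lines(lines, data_repository_id="fixedvalue"):
    """Parse each JSON line and wrap the records in the target envelope."""
    records = [json.loads(line) for line in lines if line.strip()]
    return {
        "dataRepositoryId": data_repository_id,
        "importName": str(uuid.uuid4()),
        # Parsed objects nest cleanly in "data"; no escaped quotes appear.
        "data": records,
    }

sample_lines = [
    '{"A":12345,"AA":"NOK"}',
    '{"A":54321,"AA":"USD"}',
]
print(json.dumps(wrap_json_lines(sample_lines)))
```

The key point is that each row is parsed into an object before it is placed in the array; serializing the rows as raw strings is what produces the backslash-escaped output shown above.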
2 Answers
Try using a derived column with json(Inputcolumn). The json() function converts the string into a JSON object.
Alternatively, you can use replace('\', '') to strip the escape characters and generate output like this:
Because you are reading JSON rows from a CSV file, the data flow treats each whole row as a string when pushing it to the JSON sink, which is why the quotes come out escaped with backslashes.
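The difference can be reproduced with the plain Python `json` module (an illustration of string vs. parsed JSON, not ADF expression syntax):

```python
import json

line = '{"A":12345,"AA":"NOK"}'

# Treating the row as a string: every inner quote gets escaped on output.
as_string = json.dumps({"data": [line]})

# Parsing the row first: the object nests cleanly, with no backslashes.
as_object = json.dumps({"data": [json.loads(line)]})

print(as_string)   # contains \" escape sequences
print(as_object)   # {"data": [{"A": 12345, "AA": "NOK"}]}
```

This is exactly the behavior described above: the sink serializes whatever type it receives, so the fix is to parse the string into JSON before writing it out.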
My data sample:
Since you know the schema, you need to parse it as JSON format using the expression below:
OUTPUT: