
I have a model with a jsonb field where responses from a third-party service are stored. The response changes periodically; some fields are added, some disappear. I want to generate a CSV from a sample of records. To do this, I need to gather all possible fields from the sample and turn them into columns in the CSV file.

2 Answers


  1. To achieve this, you’ll need to iterate over your sample records, extract all unique keys from the JSONB field, and then use those keys to create columns in your CSV file. Here’s a step-by-step approach you can follow in Python:

        import csv
    
        # Sample records with JSONB field
        sample_records = [
            {"id": 1, "data": {"key1": "value1", "key2": "value2"}},
            {"id": 2, "data": {"key2": "value2", "key3": "value3"}},
            {"id": 3, "data": {"key1": "value1", "key4": "value4"}}
        ]
    
        # Extract all unique keys from the JSONB field
        all_keys = set()
        for record in sample_records:
            data = record.get("data", {})
            all_keys.update(data.keys())
    
        # Create CSV file and write header
        with open('output.csv', 'w', newline='') as csvfile:
            fieldnames = ['id'] + sorted(all_keys)  # 'id' column plus every unique key, in a stable order
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval='')  # missing keys become empty cells
            writer.writeheader()
    
            # Write data rows
            for record in sample_records:
                row = {'id': record['id']}
                data = record.get("data", {})
                row.update(data)  # Add data fields
                writer.writerow(row)
    
        print("CSV file generated successfully.")
    

    This script iterates over each record in the sample, extracts the keys from the JSONB field, and collects them in a set of all unique keys. It then creates a CSV file and writes a header consisting of the ‘id’ column plus the unique keys, sorted so the column order is stable. Finally, it iterates over the sample records again and writes one row per record, leaving a cell empty wherever a key is absent.

    You can replace sample_records with your actual sample data retrieved from the database. Additionally, ensure that you handle any potential errors or edge cases based on your specific requirements and the nature of your data.
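
    For context, here is a minimal sketch of how the sample could be pulled straight from PostgreSQL instead of the hard-coded list. The connection string and the `responses` table with its `id` and `data` columns are assumptions; adapt them to your schema:

        import psycopg2

        # Hypothetical table: responses(id integer, data jsonb)
        conn = psycopg2.connect("dbname=mydb")  # adjust to your connection details
        with conn, conn.cursor() as cur:
            # ORDER BY random() is an easy (if slow) way to take a sample
            cur.execute("SELECT id, data FROM responses ORDER BY random() LIMIT 100")
            # psycopg2 maps jsonb columns to Python dicts automatically
            sample_records = [{"id": rid, "data": data or {}} for rid, data in cur.fetchall()]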

  2. Pivoting rows to columns is painful because SQL needs to know the schema of a SELECT before it runs. Since a CSV file is just text, though, you can assemble each line as a string; I have done something along these lines before:

        -- sample input: the VALUES list stands in for a real table
        with invals (id, jdata) as (
          values
          (1, '{"a": 11, "b": 12, "c": 12}'::jsonb),
          (2, '{"d": 24, "e": 25, "f": 26}'::jsonb),
          (3, '{"a": 31, "c": 33, "e": 35}'::jsonb),
          (4, '{"a": 41, "b": 42, "c": 43, "d": 44, "f": 46}'::jsonb)
        ), allkeys as (
          -- every distinct key that appears in any row
          select distinct k
            from invals
                 cross join lateral jsonb_each(jdata) as e(k,v)
        )
        -- header row: the -999 sentinel sorts before the real ids
        select -999 as id, string_agg(k, ',' order by k) as d
          from allkeys
        union all
        -- one row per record, with '' where a key is missing
        select id, string_agg(coalesce(i.jdata->>k, ''), ',' order by k)
          from invals i
               cross join allkeys a
         group by id
         order by id
        ;
    

    Working fiddle
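
    If you want the file without leaving Python, here is a sketch along the same lines as the first answer, assuming psycopg2 and the query above saved verbatim as pivot.sql (both are my additions, not part of the original answer). It runs the query and writes its two columns out as CSV lines, relabeling the -999 sentinel row as the header:

        import psycopg2

        conn = psycopg2.connect("dbname=mydb")  # hypothetical connection details
        with conn, conn.cursor() as cur:
            cur.execute(open("pivot.sql").read())  # the pivot query shown above
            with open("output.csv", "w") as f:
                for rid, line in cur.fetchall():
                    # the -999 sentinel marks the header row; label it 'id' instead
                    f.write(("id" if rid == -999 else str(rid)) + "," + line + "\n")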
