I’m trying to delete entire objects in a JSON file if they do not include ALL of the keys "transaction_date", "asset_description", "asset_type", "type" and "amount".

Below is my JSON file (it’s been cut for this example):

{
    "first_name": {
        "0": "Thomas",
        "1": "John"
    },
    "transactions": {
       "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
          ],
       "1": [
            {
                "scanned_pdf": true,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013"
            }
          ]
     }
}

I need to delete the entire "1" entry from both transactions and first_name. The original file has more than these two entries, so the code needs to work for any number of them rather than hard-coding [0], [1], etc. My code below tries to keep only the data in "transactions" that is not "scanned_pdf", "ptr_link" or "date_recieved", and then saves the JSON with just that updated data (my method is kind of inverted: instead of deleting objects that don’t include x, it picks up the data that isn’t y and updates the JSON):

import json

with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)

to_delete = {"scanned_pdf", "ptr_link", "date_recieved"}

for k in data["transactions"]:
    data["transactions"][k] = [
        {kk: vv for kk, vv in d.items() if kk not in to_delete}
        for d in data["transactions"][k]]


with open("xxxtester.json", "w") as f_out:
    json.dump(data, f_out, indent=4)

However, my output still shows the "1" key, but with empty data ("{}"). Should I use a different approach? Or is it possible to add code to the existing script to make it work?

Below is my desired output:

{
    "first_name": {
        "0": "Thomas"
    },
    "transactions": {
       "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
          ]
      }
}

2 Answers


  1. This code deletes the whole "1" entry from "transactions". Note that the key is hard-coded and the matching "first_name" entry is left in place.

    import json
    
    with open("xxxtester.json", "r") as f_in:
        data = json.load(f_in)
    
    
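    # Remove the entry before opening the file for writing, so a missing
    # key cannot leave the file truncated.
    del data["transactions"]["1"]
    
    with open("xxxtester.json", "w") as f:
        json.dump(data, f)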
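
The question asks for something that works without hard-coding "1". One way to get there, as a rough sketch only: first collect the keys whose transaction lists contain no complete transaction, then delete those keys from both sections. The helper name `incomplete_keys` and the inline sample data are made up for illustration; the required key names come from the question.

```python
# Keys every transaction must contain (taken from the question).
REQUIRED = {"transaction_date", "asset_description", "asset_type", "type", "amount"}


def incomplete_keys(transactions):
    """Return the keys whose list holds no transaction with every required key.

    Hypothetical helper, not part of the original answer.
    """
    return [
        key
        for key, items in transactions.items()
        if not any(REQUIRED.issubset(item) for item in items)
    ]


# Small stand-in for the data loaded from the JSON file.
data = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000",
            }
        ],
        "1": [{"scanned_pdf": True, "date_recieved": "01/30/2013"}],
    },
}

# The list of keys is built before the loop, so deleting inside it is safe.
for key in incomplete_keys(data["transactions"]):
    del data["transactions"][key]
    # pop() with a default tolerates a key that is missing from first_name.
    data["first_name"].pop(key, None)

print(data)
```

Unlike the hard-coded `del`, this removes a key only when none of its transactions is complete; incomplete transactions that share a list with a complete one are left in place.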
    
  2. If we reverse your logic (so we’re selecting items we want to keep, rather than the other way around) and add a second comprehension to filter out empty values, we end up with this:

    import json
    
    with open("xxxtester.json", "r") as f_in:
        data = json.load(f_in)
    
    required = set(
        ("transaction_date", "asset_description", "asset_type", "type", "amount")
    )
    
    data["transactions"] = {
        k: [transaction for transaction in v if all(key in transaction for key in required)]
        for k, v in data['transactions'].items()
    }
    
    data["transactions"] = {
        k: v for k, v in data['transactions'].items() if v
    }
    
    # Update data["first_name"] so that it only contains keys that also exist
    # in data["transactions"].
    data["first_name"] = {k: v for k, v in data["first_name"].items() if k in data["transactions"]}
    
    print(json.dumps(data, indent=4))
    

    Given input like this:

    {
        "first_name": {
            "0": "Thomas",
            "1": "John"
        },
        "transactions": {
           "0": [
                {
                    "transaction_date": "11/29/2022",
                    "asset_description": "FireEye, Inc.",
                    "asset_type": "Stock",
                    "type": "Sale (Partial)",
                    "amount": "$1,001 - $15,000"
                },
                {
                    "scanned_pdf": true,
                    "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                    "date_recieved": "01/30/2013"
                }
              ],
           "1": [
                {
                    "scanned_pdf": true,
                    "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                    "date_recieved": "01/30/2013"
                }
              ]
         }
    }
    

    The above code produces:

    {
        "first_name": {
            "0": "Thomas"
        },
        "transactions": {
            "0": [
                {
                    "transaction_date": "11/29/2022",
                    "asset_description": "FireEye, Inc.",
                    "asset_type": "Stock",
                    "type": "Sale (Partial)",
                    "amount": "$1,001 - $15,000"
                }
            ]
        }
    }
    

    The first dictionary comprehension…

    data["transactions"] = {
        k: [transaction for transaction in v if all(key in transaction for key in required)]
        for k, v in data['transactions'].items()
    }
    

    …produces:

    ...
        "transactions": {
            "0": [
                {
                    "transaction_date": "11/29/2022",
                    "asset_description": "FireEye, Inc.",
                    "asset_type": "Stock",
                    "type": "Sale (Partial)",
                    "amount": "$1,001 - $15,000"
                }
            ],
            "1": []
        }
    ...
    

    The second comprehension filters out keys that have empty lists as values.

    The third comprehension removes items from data["first_name"] that don’t exist in data["transactions"].
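
If the real file has more sibling sections than "first_name", the three steps can be folded into one function that prunes every section keyed by the same indices. This is a minimal sketch under that assumption: the function name `keep_complete`, its `list_column` parameter, and the shortened sample data are hypothetical, and it assumes every other top-level value is a dict keyed like "transactions".

```python
def keep_complete(data, required, list_column="transactions"):
    """Keep only complete transactions, then drop emptied keys everywhere.

    Hypothetical helper; assumes each top-level value other than
    `list_column` is a dict that uses the same keys as `list_column`.
    """
    required = set(required)

    # Steps 1 and 2: filter each list, and skip keys whose list ends up empty.
    kept = {}
    for key, items in data[list_column].items():
        complete = [item for item in items if required.issubset(item)]
        if complete:
            kept[key] = complete

    # Step 3: prune every other section down to the surviving keys.
    result = {}
    for column, values in data.items():
        if column == list_column:
            result[column] = kept
        else:
            result[column] = {k: v for k, v in values.items() if k in kept}
    return result


REQUIRED = ("transaction_date", "asset_description", "asset_type", "type", "amount")

complete_item = {
    "transaction_date": "11/29/2022",
    "asset_description": "FireEye, Inc.",
    "asset_type": "Stock",
    "type": "Sale (Partial)",
    "amount": "$1,001 - $15,000",
}
scanned_item = {"scanned_pdf": True, "date_recieved": "01/30/2013"}

sample = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {"0": [complete_item, scanned_item], "1": [scanned_item]},
}

cleaned = keep_complete(sample, REQUIRED)
print(cleaned)
```

The function returns a new dictionary rather than modifying its input, so the loaded data stays available for comparison; the result can then be written back with `json.dump` as in the earlier snippets.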
