
I am working on code that extracts data from a JSON file. Here is the JSON file: Google CDN

And here is a sample of the JSON:

{
  "syncToken": "1677578581095",
  "creationTime": "2023-02-28T02:03:01.095938",
  "prefixes": [{
    "ipv4Prefix": "34.80.0.0/15",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv4Prefix": "34.137.0.0/16",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv4Prefix": "35.185.128.0/19",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv6Prefix": "2600:1900:40a0::/44",
    "service": "Google Cloud",
    "scope": "asia-south1"
  },

I know where the problem is, but I cannot fix it with the solutions on this website, and I get a different error every time.

This is my code

import json
f = open('cloud.json')
data = json.load(f)
array = []

for i in data['prefixes']:
    array = [i['prefix'] for i in data['ipv4Prefix']]
f_path = (r"ip.txt")
with open (f_path ,'w') as d:
       for lang in array:
        d.write("{}\n".format(lang))
f.close()

Basically, I want to extract only the IPv4 addresses, but some IPv6 entries appear at random in the block, and they cause this error: KeyError: 'ipv4Prefix'

I know why I am getting this error, so I tried deleting every entry with ipv6Prefix by adding this part to my code:

    if data[i]["prefixes"] == "ipv6Prefix":
        data.pop(i)

For this one I get TypeError: unhashable type: 'dict', which is new to me. I also tried this, as someone pointed out in another question, but it didn't work:

del data[ipv6Prefix]

Now my final code looks like this, and I get this error: TypeError: list indices must be integers or slices, not str, which is understandable.

import json
f = open('cloud.json')
data = json.load(f)
array = []
for i in data['prefixes']:
    if [i]["prefixes"] == ['ipv6Prefix']:
        data.pop(i)
    array = [i['prefix'] for i in data['ipv4Prefix']]
f_path = (r"ip.txt")
with open (f_path ,'w') as d:
       for lang in array:
        d.write("{}\n".format(lang))
f.close()

So how can I delete entries with 'ipv6Prefix' or, better, ignore them in my for loop?

I found this question, but the answer does not fit my code at all.

What's the problem with my code?

I tried several methods like del and dict.pop(), but I still get errors.

2 Answers


  1. You have two choices: Look Before You Leap or Easier to Ask Forgiveness than Permission. In short:

    • LBYL: Do an if check to make sure ipv4Prefix exists
    • EAFP: Assume that ipv4Prefix exists but catch the exception (a KeyError in this case)

    Here is some code that demonstrates both approaches. It does not include writing out the results.

    import json
    
    
    def lbyl(data: dict):
        """Look before you leap"""
        ipv4s = []
    
        for prefix in data["prefixes"]:
            # Ensure that "ipv4Prefix" exists
            if "ipv4Prefix" in prefix:
                ipv4s.append(prefix["ipv4Prefix"])
        return ipv4s
    
    
    def eafp(data: dict):
        """Easier to Ask Forgiveness than Permission"""
        ipv4s = []
    
        for prefix in data["prefixes"]:
            try:
                ipv4s.append(prefix["ipv4Prefix"])
            except KeyError:
                # This happens when "ipv4Prefix" is not in prefix
                pass
    
        return ipv4s
    
    
    def get_data(path) -> dict:
        with open(path) as f:
            return json.load(f)
    
    
    if __name__ == "__main__":
        data = get_data("cloud.json")
        print(lbyl(data))
        print(eafp(data))
    

    Which style to use is subjective. Python has a reputation for preferring EAFP, but I prefer to use LBYL if errors are expected as part of normal operation. In your case you know that some objects will not have ipv4Prefix, so I contend that LBYL is more suitable here.
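
Since the demonstration above stops short of writing out the results, here is a minimal sketch of that last step, assuming one prefix per line as in the question's ip.txt. The helper name write_ipv4_prefixes and the trimmed sample data are made up for illustration; they are not part of the original answer.

```python
import os
import tempfile


def write_ipv4_prefixes(data: dict, path: str) -> int:
    """Write every "ipv4Prefix" value in data["prefixes"] to path, one per line.

    Hypothetical helper for illustration; returns the number of prefixes written.
    """
    # LBYL-style filter: keep only the entries that actually have the key
    ipv4s = [p["ipv4Prefix"] for p in data["prefixes"] if "ipv4Prefix" in p]
    with open(path, "w") as out:
        for prefix in ipv4s:
            out.write(f"{prefix}\n")
    return len(ipv4s)


# Trimmed sample in the shape of the question's cloud.json
sample = {
    "prefixes": [
        {"ipv4Prefix": "34.80.0.0/15", "service": "Google Cloud", "scope": "asia-east1"},
        {"ipv6Prefix": "2600:1900:40a0::/44", "service": "Google Cloud", "scope": "asia-south1"},
        {"ipv4Prefix": "35.185.128.0/19", "service": "Google Cloud", "scope": "asia-east1"},
    ]
}

if __name__ == "__main__":
    # Written to the temp directory here so the sketch does not clobber a real ip.txt
    out_path = os.path.join(tempfile.gettempdir(), "ip.txt")
    print(write_ipv4_prefixes(sample, out_path), "prefixes written to", out_path)
```

With real data, the sample dictionary would be replaced by the result of get_data("cloud.json") from the code above.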

  2. So how can I delete entries with ‘ipv6Prefix’ or better to say, ignore them in my for loop?

    You can skip/ignore prefixes containing ipv6Prefix with if...continue:

    # import json
    # with open('cloud.json') as f: data = json.load(f) ## safer than f=open...
    
    with open("ip.txt", 'w') as d:
        for prefix_i in data['prefixes']:
            # if 'ipv6Prefix' not in prefix_i: d.write(f"{prefix_i}\n") ## OR
            if 'ipv6Prefix' in prefix_i: continue
            d.write("{}\n".format(prefix_i))
        ## generator expression INSTEAD OF for-loop:
        # d.write('\n'.join(str(p) for p in data['prefixes'] if 'ipv6Prefix' not in p))
    

    You can write only prefixes containing ipv4Prefix with if 'ipv4Prefix' in...

    with open("ip.txt", 'w') as d:
        for prefix_i in data['prefixes']:
            if 'ipv4Prefix' in prefix_i: d.write("{}\n".format(prefix_i))
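
Note that prefix_i is the whole entry (a dictionary), so the loops above write its dict representation. If only the address strings are wanted, as the question describes, one option is to write the value stored under 'ipv4Prefix' instead. A small sketch, with an in-memory io.StringIO standing in for ip.txt and a trimmed, illustrative copy of the data:

```python
import io

# Two entries in the shape of the question's JSON (trimmed)
data = {
    "prefixes": [
        {"ipv4Prefix": "34.137.0.0/16", "service": "Google Cloud", "scope": "asia-east1"},
        {"ipv6Prefix": "2600:1900:40a0::/44", "service": "Google Cloud", "scope": "asia-south1"},
    ]
}

d = io.StringIO()  # stands in for open("ip.txt", "w")
for prefix_i in data["prefixes"]:
    if "ipv4Prefix" in prefix_i:
        # write the value stored under the key, not the dictionary itself
        d.write("{}\n".format(prefix_i["ipv4Prefix"]))

written = d.getvalue()
print(written, end="")
```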
    

    You can alter data itself to omit prefixes containing ipv6Prefix with list comprehension:

    data['prefixes'] = [p for p in data['prefixes'] if 'ipv6Prefix' not in p]
    

    You can save a list of prefixes containing ipv4Prefix as JSON with json.dump:

    ## to just save the list as a variable:
    # ipv4Prefixes = [p for p in data['prefixes'] if 'ipv4Prefix' in p]
    
    with open('ipv4Prefixes.json', 'w') as f:
        json.dump([p for p in data['prefixes'] if 'ipv4Prefix' in p], f)
    


    getting this error: TypeError: list indices must be integers or slices, not str

    That’s probably due to the if [i]["prefixes"] == ['ipv6Prefix']: line; [i] is a list with just a single item [i, which is a dictionary], and a list cannot be indexed with a string, so [i]["prefixes"] raises exactly this TypeError. You can use if 'ipv6Prefix' in i instead [i is already one of the entries in data['prefixes'], so it has no "prefixes" key of its own], but there are more issues with what you’re trying to accomplish in that block [I’ll explain in the next section].


    # for i in data['prefixes']...
            data.pop(i)
    

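    data is a dictionary, so data.pop(i) would try to use i as a key, but i is itself a dictionary [one of the items inside data['prefixes']] and cannot be a key. Calling data['prefixes'].pop(i) would not work either, because a list’s .pop method only takes an integer [which has to be the index of the item you want to remove from that list]. Either way, .pop(i) would raise an error if there’s an attempt to execute it.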

    You could loop through enumerate(data['prefixes']) (instead of just data['prefixes']) to keep track of the index associated with i, but keep in mind that looping through a list to pop multiple items [from that same list] is NOT advisable at all. For example, if you pop the second item from the list [index=1], then the indices of all items after it will decrease by one; so if you next need to pop what was originally the 5th item in the list, enumerate will tell you that its index is 4, but it actually became 3 after executing .pop(1).
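
The index shift described above can be reproduced with a short sketch. The list of letters and the helper name pop_with_stale_indices are invented for illustration only:

```python
def pop_with_stale_indices(items, indices):
    """Pop each index in ascending order WITHOUT adjusting for earlier pops."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for idx in indices:
        items.pop(idx)
    return items


letters = ["a", "b", "c", "d", "e", "f"]

# Intention: remove "b" (index 1) and "e" (originally index 4)
stale = pop_with_stale_indices(letters, [1, 4])

# What was actually wanted, built with a list comprehension instead
intended = [x for x in letters if x not in ("b", "e")]

print(stale)     # "e" survives and "f" is gone, because "e" moved to index 3
print(intended)
```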

    You could loop through the list in reverse as below (but isn’t the list-comprehension approach I suggested before simpler?)

    # pi counts from 1-len(...) up to 0, so -pi is each item's real index
    for pi, p in enumerate(reversed(data['prefixes']), 1-len(data['prefixes'])):
        if 'ipv6Prefix' in p: data['prefixes'].pop(-pi)
    

    Btw, instead of applying reversed, you can also use slicing like data['prefixes'][::-1]. I just thought using the function is better for readability because it makes it very obvious what it’s doing.


        if data[i]["prefixes"] == "ipv6Prefix":
    

    for this one I get TypeError: unhashable type: 'dict' which is new to me

    i is a dictionary (which is unhashable, as the error message said), and therefore cannot be used as a key the way data[i] is trying to.
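
A minimal reproduction of that error, using a one-entry stand-in for the real data (the variable names are illustrative):

```python
data = {"prefixes": [{"ipv6Prefix": "2600:1900:40a0::/44"}]}
i = data["prefixes"][0]  # what the loop variable holds: a dictionary

try:
    data[i]  # a dictionary used as a dictionary key
except TypeError as err:
    message = str(err)

# The exact wording can vary between Python versions,
# but it mentions: unhashable type: 'dict'
print(message)
```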


    so I get key error like this: KeyError: 'ipv4Prefix'

    That is probably from the data['ipv4Prefix'] bit in array = [i['prefix'] for i in data['ipv4Prefix']], because data does not have a key ipv4Prefix; some of the items i in for i in data['prefixes'] might. [The entries have no 'prefix' key either, so i['prefix'] would fail as well.] There is also no point in using something like if 'ipv6Prefix' in i: del i, because del i only unbinds the loop variable i; the item itself stays in the list being looped through.

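    You can try using .remove like data['prefixes'].remove(i) [instead of del i], but removing items from a list while looping over it can make the loop skip items, and every .remove call has to search the list, so I don’t think that would be very efficient either. List comprehension is definitely my preferred method in this case [and also probably considered the most "pythonic" approach here].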
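
To illustrate the point about del i, here is a small sketch with two made-up entries. It only shows that the list is unchanged after del i, and that building a new list is what actually drops the entry:

```python
prefixes = [
    {"ipv4Prefix": "34.80.0.0/15"},
    {"ipv6Prefix": "2600:1900:40a0::/44"},
]

for i in prefixes:
    if "ipv6Prefix" in i:
        del i  # unbinds the name i only; the list still holds the dictionary

length_after_del = len(prefixes)  # still 2

# A list comprehension builds a new list without the IPv6 entry
filtered = [p for p in prefixes if "ipv6Prefix" not in p]

print(length_after_del, filtered)
```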
