I’m trying to extract the list of all unique domains, across all categories, from the Disconnect services.json file. Here’s the link to the JSON file I’m using: https://github.com/disconnectme/disconnect-tracking-protection/blob/master/services.json

And here’s my Python code for it:

import json

def get_disconnect_domains(disconnect_json_file):
    disconnect_domains = set()
    with open(disconnect_json_file, 'r') as f:
        disconnect_json = json.load(f)
        for category in disconnect_json['categories']:
            for tracker in category['trackers']:
                disconnect_domains.add(tracker['domain'])

    return disconnect_domains

disconnect_domains = get_disconnect_domains('disconnect.json')

(I have the file saved as disconnect.json) But I keep running into this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-1e2723fa469d> in <cell line: 14>()
     12 
     13 # Get the list of all unique domains from the disconnect.json file
---> 14 disconnect_domains = get_disconnect_domains('services.json')
     15 
     16 # Print the list of all unique domains

<ipython-input-24-1e2723fa469d> in get_disconnect_domains(disconnect_json_file)
      6         disconnect_json = json.load(f)
      7         for category in disconnect_json['categories']:
----> 8             for tracker in category['trackers']:
      9                 disconnect_domains.add(tracker['domain'])
     10 

TypeError: string indices must be integers

I don’t usually work in Python, so I’m not sure what exactly is going wrong. Can someone help, please?

2 Answers


  1. The error message "TypeError: string indices must be integers" means that you are indexing a string with a string key, as if it were a dictionary, which Python does not allow.

    The cause is the structure of the JSON file. In services.json, ‘categories’ is an object (a Python dict after json.load) that maps each category name to a list of entries. Iterating over a dict yields its keys, so in your loop category is a string (the category name), and category['trackers'] fails.

    To fix this, iterate over the values instead of the keys and follow the nesting down to the lists of domains. The domains are stored in lists nested under each entry rather than under ‘trackers’ or ‘domain’ keys.

    Here’s an updated version of your code that navigates the JSON structure this way:

    import json
    
    def get_disconnect_domains(disconnect_json_file):
        disconnect_domains = set()
        with open(disconnect_json_file, 'r', encoding='utf-8') as f:
            disconnect_json = json.load(f)
    
        # 'categories' maps each category name to a list of entries
        for entries in disconnect_json['categories'].values():
            for entry in entries:
                # each entry maps a service name to a dict of its details
                for service in entry.values():
                    if not isinstance(service, dict):
                        continue
                    for value in service.values():
                        # the domains are in the list values; skip anything else
                        if isinstance(value, list):
                            disconnect_domains.update(value)
    
        return disconnect_domains
    
    disconnect_domains = get_disconnect_domains('services.json')
    
    # Print the list of all unique domains
    for domain in disconnect_domains:
        print(domain)
    

    This code assumes that ‘categories’ maps each category name to a list of entries, that each entry maps a service name to an object, and that the list values inside that object hold the domains. It walks the structure accordingly and collects the unique domains in a set.

    Make sure that the structure of your ‘services.json’ file matches these assumptions. If it differs, you’ll need to adjust the loops accordingly.

  2. The domains are fairly deeply nested in your JSON structure. You can access them like this:

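    import json
    
    with open('services.json', encoding='utf-8') as raw:
        data = json.load(raw)
    
    output = set()
    # data['categories'] maps each category name to a list of entries
    for entries in data['categories'].values():
        for entry in entries:
            # each entry maps a service name to a dict of its details
            for details in entry.values():
                for value in details.values():
                    # the domains are in the list values; skip anything else
                    if isinstance(value, list):
                        output.update(value)
    print(output)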
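
To see both the cause of the error and the fix without downloading the full file, here is a minimal sketch on a small inline sample. The sample data and the names `SAMPLE` and `extract_domains` are made up for illustration. The sample only imitates the nesting that the loops above walk through (category name, list of entries, service name, URL, list of domains), including one non-list value of the kind the `isinstance` check is there to skip. The real services.json may contain more variations.

```python
import json

# Hypothetical sample imitating the nesting walked through above;
# the service names and domains are invented for illustration only.
SAMPLE = json.loads("""
{
  "license": "example",
  "categories": {
    "Advertising": [
      {"ExampleAds": {"http://example-ads.test/": ["example-ads.test", "cdn.example-ads.test"]}},
      {"OtherAds": {"http://other-ads.test/": ["other-ads.test"], "performance": "true"}}
    ],
    "Analytics": [
      {"ExampleStats": {"http://example-stats.test/": ["example-stats.test", "example-ads.test"]}}
    ]
  }
}
""")


def extract_domains(data):
    """Collect every domain found in list values under data['categories']."""
    domains = set()
    for entries in data["categories"].values():   # one list of entries per category
        for entry in entries:                      # {service name: {...}}
            for service in entry.values():
                if not isinstance(service, dict):
                    continue
                for value in service.values():
                    if isinstance(value, list):    # skip flags stored as strings
                        domains.update(value)
    return domains


# Iterating over a dict yields its keys, so each item here is a plain string.
# Indexing such a string with another string is what raises the TypeError.
for category in SAMPLE["categories"]:
    try:
        category["trackers"]
    except TypeError as exc:
        print(f"{category!r} is a {type(category).__name__}: {exc}")

print(sorted(extract_domains(SAMPLE)))
```

Because the result is a set, a domain that appears under more than one category (here `example-ads.test`) is only collected once.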
    