
I have a custom Python module, call it mymodule, with a function that loads some data from a JSON file according to a key:

import json
import os

def fetch_data(k):
    # Resolve data.json relative to this module's directory
    fname = os.path.join(
        os.path.abspath(os.path.dirname(__file__)),
        "data.json"
    )
    with open(fname, "r") as f:
        data = json.load(f)
    return data[k]

...

I then use it in another script, e.g.

from mymodule import fetch_data

for k in "ABCD":
    print(fetch_data(k))

As implemented, the problem is that the function opens and parses the file on every call, which can slow things down considerably if the JSON is large. I thought about loading the JSON outside the function as a module-level global and then using it inside the function, but since the module does many other things that do not require that data, it also seems wasteful to read the JSON every time the module is imported, again particularly if the JSON is big.

What is the best practice in Python to

  1. Load the external data only when needed
  2. Keep the data in memory after it has been loaded once if it is needed later (as in the example above)

I thought about a class that gets initialised only when first called with a getter function for the key, but I’m not exactly sure how to implement it properly.
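
Roughly, what I have in mind is something like the sketch below (placeholder names, and I'm not sure it's the right approach):

import json
import os

class LazyData:
    """Loads data.json on first access and keeps it in memory afterwards."""

    def __init__(self):
        self._data = None  # nothing loaded yet

    def get(self, k):
        # Read and parse the JSON only the first time a key is requested
        if self._data is None:
            fname = os.path.join(
                os.path.abspath(os.path.dirname(__file__)),
                "data.json"
            )
            with open(fname, "r") as f:
                self._data = json.load(f)
        return self._data[k]

# In the other script:
# from mymodule import LazyData
store = LazyData()
for k in "ABCD":
    print(store.get(k))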

2 Answers


  1. You are not reading the file into memory and reusing it; on each iteration of your loop you re-read and re-parse the file.

    I would suggest something along the lines of changing your fetch_data() function to return the whole JSON dictionary.

    import json
    import os
    
    def fetch_data():
        # Locate data.json relative to this module and parse it once
        fname = os.path.join(
            os.path.abspath(os.path.dirname(__file__)),
            "data.json"
        )
        with open(fname, "r") as f:
            data = json.load(f)
        return data
    
    

    Then you can run your loop like so:

    data = fetch_data()
    for k in "ABCD":
        print(data.get(k))
    

    This reads the file into a variable once, above the loop, and the parsed dictionary is then used to look up keys on each iteration; the repeated reads of the file are gone.

  2. This is less a question of "best practice" and more a question of what works best given your memory and performance constraints.

    Given your current access pattern – only iterating over the top-level keys of a very large dictionary, and not needing them all in memory at once – you may want to consider switching to a streaming JSON parsing library such as ijson. If your example access pattern is reflective of what you’re actually doing, this may be more efficient – it will only occupy as much memory as you really need at any given time.
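
    For example, a minimal sketch with ijson might look like the following (assuming the top level of data.json is a single JSON object; ijson.kvitems streams the top-level key/value pairs instead of loading the whole document into memory):

    import ijson
    
    wanted = set("ABCD")
    # Stream (key, value) pairs from the top-level object, keeping only
    # the current pair in memory at any given time
    with open("data.json", "rb") as f:
        for key, value in ijson.kvitems(f, ""):
            if key in wanted:
                print(value)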
