I have a custom Python module, call it mymodule, with a function that loads some data from a JSON file according to a key:
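import json
import os

def fetch_data(k):
    # data.json is expected to sit next to this module
    fname = os.path.join(
        os.path.abspath(os.path.dirname(__file__)),
        "data.json"
    )
    # The file is opened and parsed on every call
    with open(fname, "r") as f:
        data = json.load(f)
    return data[k]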
...
I then use it in another script, e.g.
from mymodule import fetch_data
for k in "ABCD":
    print(fetch_data(k))
As implemented, the problem is that the function opens the file on every call, which can slow things down considerably if the JSON is large. I thought about loading the JSON outside the function as a module-level global variable and then using it inside the function, but since the module does many other things that do not require that data, it also seems silly to load the JSON every time the module is imported, again particularly if the JSON is big.
What is the best practice in Python to
- Load the external data only when needed
- Keep the data in memory after it has been loaded once if it is needed later (as in the example above)
I thought about a class that gets initialised only when first called with a getter function for the key, but I’m not exactly sure how to implement it properly.
2 Answers
You are not reading the file into memory once and reusing it; on each iteration of the loop you are reading and parsing the file again.
I would suggest changing your fetch_data() function to return the whole JSON dictionary. Then you can call it once before your loop and look up each key in the returned dictionary.
This reads the file into a variable above the loop, and each iteration looks up its key in the parsed dictionary, so the repeated reads of the file are gone.
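A minimal sketch of that suggestion. It assumes fetch_data() is changed to take the file path as an argument (done here only so the example runs on its own; the original builds the path from __file__), and it writes a small sample data.json with made-up values to stand in for the real file:

```python
import json
import os
import tempfile


def fetch_data(fname):
    """Read the JSON file once and return the whole dictionary."""
    with open(fname, "r") as f:
        return json.load(f)


# Stand-in for the real data.json (hypothetical sample values).
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, "data.json")
with open(fname, "w") as f:
    json.dump({"A": 1, "B": 2, "C": 3, "D": 4}, f)

data = fetch_data(fname)  # single read, above the loop
results = []
for k in "ABCD":
    results.append(data[k])  # plain dictionary lookup, no file access
    print(data[k])
```

If you would rather keep fetch_data(k) as the public interface, one option is to move the file read into a helper decorated with functools.lru_cache, so the file is read on the first call only and the parsed dictionary is reused afterwards.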
This is less a question of "best practice" and more a question of what works best given your memory and performance constraints.
Given your current access pattern (only iterating over the top-level keys of a very large dictionary, and not needing them all in memory at once), you may want to consider a different JSON parsing library such as ijson. If your example access pattern is reflective of what you're actually doing, then this may be more efficient: it will only occupy as much memory as you really need at any given time.
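A rough sketch of what that could look like, assuming the file's top level is a single JSON object and using ijson.kvitems with an empty prefix to stream its key/value pairs. fetch_selected and the sample file are hypothetical names and values introduced for illustration. ijson is a third-party package, so the sketch falls back to the standard json module if it is not installed:

```python
import json
import os
import tempfile

try:
    import ijson  # third-party package: pip install ijson
except ImportError:
    ijson = None


def fetch_selected(fname, wanted):
    """Return {key: value} for the top-level keys listed in `wanted`."""
    wanted = set(wanted)
    if ijson is None:
        # Fallback: parse the whole file at once (no streaming).
        with open(fname, "r") as f:
            data = json.load(f)
        return {k: data[k] for k in wanted if k in data}
    found = {}
    with open(fname, "rb") as f:
        # Stream top-level pairs one at a time instead of loading everything.
        for key, value in ijson.kvitems(f, ""):
            if key in wanted:
                found[key] = value
                if len(found) == len(wanted):
                    break  # stop parsing once every wanted key was seen
    return found


# Stand-in for the real data.json (hypothetical sample values).
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, "data.json")
with open(fname, "w") as f:
    json.dump({"A": 1, "B": 2, "C": 3, "D": 4}, f)

selected = fetch_selected(fname, ["A", "C"])
print(selected)
```

The trade-off is that the file is still scanned on each call, so this helps with memory rather than with repeated reads. Whether it is a net win depends on how large the values are and how often you need them.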