I have a not-so-massive table of about 500k records.
It comes as a JSON file.
I have to load it, parse it, and flatten each row (each row is a dict).
This takes a fair amount of time, but it's manageable (around 4-5 minutes).
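For context, a minimal sketch of the load/parse step, assuming the file holds a single JSON array of objects and the built-in `.j.k` parser is used (the file name here is hypothetical):

```q
/ hypothetical file name; assumes the file is one JSON array of objects
rows:.j.k raze read0 `:data.json
/ .j.k returns one dictionary per JSON object
```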
However, at some point I need to take all these rows and uj them:
(uj/) enlist each row  / this takes quite a while
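To make the pattern concrete, here is a small self-contained sketch with synthetic rows standing in for the parsed JSON (the real data has 103 columns):

```q
/ synthetic stand-in for the parsed rows: a list of dictionaries
rows:{`a`b`c!(x;x+1;x+2)} each til 1000
/ the pattern in question: enlist each dict into a 1-row table, then uj-reduce
tab:(uj/) enlist each rows
```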
I only have 2 cores to use in dev, and maybe 4 in prod/DR.
Using 2 cores saved some time, but not enough to justify updating our infrastructure.
Is there anything I am doing wrong?
Anything I am missing?
I know you could use some data, but creating a synthetic table won't be much help.
Should I maybe consider starting 10 secondary processes, splitting the ~500k rows into 50k sets, and passing a set to each process?
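Roughly what I have in mind for the split, sketched here with peach instead of separate processes (this assumes q was started with secondary threads, e.g. -s 2; the chunk size and synthetic rows are arbitrary):

```q
/ synthetic stand-in rows
rows:{`a`b!(x;x+1)} each til 1000
/ reduce each chunk to a table in parallel, then reduce the chunk tables once more
chunks:100 cut rows
tab:(uj/) {(uj/) enlist each x} peach chunks
```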
The table has 492,000 records and 103 columns.
2 Answers
A few questions: How exactly are you parsing the JSON file? Are you using the built-in JSON parser (https://code.kx.com/q/ref/dotj/)? Are the dictionaries conforming, i.e. do they have the same keys? If so, you don't have to enlist them: a list of conforming dictionaries is already a table. You can read more about dictionaries and tables here: https://www.defconq.tech/docs/concepts/dictionariesTables
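A small illustration of the point about conforming dictionaries, with synthetic data:

```q
/ two dictionaries with the same keys in the same order
rows:(`a`b!1 2;`a`b!3 4)
type rows   / 98h: a list of conforming dictionaries is already a table
/ no (uj/) enlist each needed; rows can be used directly as a table
select a from rows
```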