I am currently working on a Django-based service that will gather users’ data from their Shopify shops and provide some insights. I use PostgreSQL as my DB server. The problem I am facing is whether it is more efficient to create a separate table for each user and store their data there, or to keep everything in one big table. Approximate data volume: about 100k–1M rows per user. The data is unrelated among users. Furthermore, I will need to store the user-specific results of my analysis. Any thoughts?
2 Answers
I suggest you have one big table (`Table1`) for the data of all users, with one column indicating which user each row belongs to. For example, to store all the pages a user visited on one of your sites, you could use something like this:
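A minimal sketch of such a model (the `PageVisit` name and its fields are illustrative assumptions, not a fixed schema):

```python
from django.conf import settings
from django.db import models


class PageVisit(models.Model):
    # The user FK is the single column that says which user each
    # row belongs to -- all users share this one table.
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    url = models.URLField()
    visited_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        # Most queries filter by user and date range, so index both.
        indexes = [models.Index(fields=["user", "visited_at"])]
```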
If `Table1` gets too big (as you indicated may happen), you could create periodic backups with date ranges (maybe one backup file with a single month of `Table1` rows) and later delete the older data from `Table1`.
Furthermore, you could have a second table (`Table2`) that stores the results of analysing the data from `Table1`. `Table2` would also have a column indicating which user it belongs to, and maybe even another column indicating which date range it covers. Continuing with the example, you could create a ranking to see the most visited pages for each user in a single month:
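A sketch with the Django ORM (the `user` object, year and month are placeholders):

```python
from django.db.models import Count

# Pages ranked by visit count for one user in one month.
ranking = (
    PageVisit.objects
    .filter(user=user, visited_at__year=2024, visited_at__month=1)
    .values("url")
    .annotate(visits=Count("id"))
    .order_by("-visits")
)
```

The top row of `ranking` is that user’s most visited page for the month; the results could then be stored in `Table2`.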
After calculating the results for `Table2`, you could remove the corresponding rows from `Table1` to save some space. This may be a silly/simple example, but I think it could be a valid approach for large amounts of data. I hope it gives you some pointers to think about.
@Gleb, you commented on my other answer, so here goes a more concrete example.
For starters, you have your `User` model (maybe even Django’s default user table) where you store attributes like names and emails. Second, you have transactions (I made up a few basic fields):
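Something along these lines (the field names are my own, as said):

```python
from django.conf import settings
from django.db import models


class Transaction(models.Model):
    # Raw per-event data: the table that can grow to millions
    # of rows per user.
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=12, decimal_places=2)
    direction = models.CharField(
        max_length=8,
        choices=[("received", "Received"), ("sent", "Sent")],
    )
    created_at = models.DateTimeField(auto_now_add=True)
```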
And third, you could have a monthly summary of transactions to sum up the received and sent amounts for each user:
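For example (again, the exact fields are just a sketch):

```python
from django.conf import settings
from django.db import models


class Summary(models.Model):
    # One row per user per month, so this table stays small no
    # matter how large Transaction grows.
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    year = models.PositiveSmallIntegerField()
    month = models.PositiveSmallIntegerField()
    total_received = models.DecimalField(max_digits=14,
                                         decimal_places=2)
    total_sent = models.DecimalField(max_digits=14,
                                     decimal_places=2)

    class Meta:
        # At most one summary per user per month.
        unique_together = [("user", "year", "month")]
```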
Now, the `Transaction` model will have a lot of records (you say it could have millions per user), but the `Summary` model will only have one record per month per user, which is quite manageable. Still, I don’t recommend using a separate model/table of transactions for each user.
Once a month you could run a process that calculates the summaries for each user (or any other data analysis you wish to run), if you don’t wish to update the `Summary` instance each time a new transaction is made. For example:
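A sketch of such a job (the function name is mine, and `Sum(..., filter=...)` needs Django 2.0+):

```python
from django.db.models import Q, Sum

def build_monthly_summaries(year, month):
    # Aggregate each user's transactions for the month into a
    # single Summary row.
    totals = (
        Transaction.objects
        .filter(created_at__year=year, created_at__month=month)
        .values("user")
        .annotate(
            received=Sum("amount", filter=Q(direction="received")),
            sent=Sum("amount", filter=Q(direction="sent")),
        )
    )
    for row in totals:
        Summary.objects.update_or_create(
            user_id=row["user"], year=year, month=month,
            defaults={
                "total_received": row["received"] or 0,
                "total_sent": row["sent"] or 0,
            },
        )
```

You could run this from a cron job or a custom management command.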
Afterwards you are free to archive the old `Transaction` rows (maybe everything older than 3 months or a year). Once a month you could run a process to export a given set of `Transaction` instances to a file (CSV, JSON or an SQL dump) and remove them from the `Transaction` table to keep its size from expanding too much. You could keep one backup file per month for easier lookup if you ever need the `Transaction` data again:
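A sketch of such an export, using CSV (the file naming is just one option):

```python
import csv

def archive_month(year, month):
    qs = Transaction.objects.filter(created_at__year=year,
                                    created_at__month=month)
    # One backup file per month makes later lookups easy.
    with open(f"transactions-{year}-{month:02d}.csv",
              "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "amount", "direction",
                         "created_at"])
        for t in qs.iterator():
            writer.writerow([t.user_id, t.amount, t.direction,
                             t.created_at])
    # Once exported, drop the rows so the table stays small.
    qs.delete()
```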
I hope this gives you a few ideas to think about.