I am designing the database for a project that I am working on, basically the project is a question bank.
Each question has a flag(mark), where the user can set it as true so that they mark it to get back to it later.
The number of questions is (10,000) for now, and it will increase of course. So we can say that each user can mark (10,000) questions for now.
If I create a huge table where I save the questions id and user id and the flag, the table will be huge because if we have (100,000) users, and each user can flag (10,000) questions, the the table size will be (1,000,000,000) rows, and the number will increase if we got more users or more questions.
also we have a mark filter, when the user wants to create a quiz they can pick avoid (marked questions), or they can search for only marked questions, in other words we will use the mark filter a lot.
Plus, I am sure that I don’t need to have the marks for all users in one table, as it is only important for the user himself, and I don’t care at all to see them as admin or even for Statical analysis across all users, so having all of them in one table is not going helpful.
Now after knowing that, is it wise to create a table for each user to save his own marks? Or is there a way to save the marks where it would not be slow to search for it?
I read that making a table for each user is not a good approach, but also I read that larg tables is a huge problem. And I need suggestions here
2
Answers
It is definitely a bad idea to create a table for each user. You can classify the questions and create a table by type. For example, there is a definition that puts type A questions into table A, type B questions into table A, and type C. Use table B for questions
The answer to your question is it depends on your case
Here’s how I would approach solution with certain questions and assumptions
I would create another table for now for flags and add total_flag_count to the questions table. This is for optimization when we want to see total flags count (purely hypothetical you might not need this). Every time a user flags question we are going to increment total_flag_count and if a user removes flag we will decrement it.
Your 1 billion row is a worst case estimate and even in this case you are probably fine with this as you can always partition it. Large tables by themselves are not a problem. Depending upon use cases and queries they might be fine.
There are other solutions. You can use array data structure in columns and so on. You might be able to avoid using many rows using this approach but I would say that it’s overkill at this point.
First lunch your app analyze how users are using it and then find a solution for your use case. You can always use caching to make it faster