I have the following situation:
A user can have maximum number of partnerships. For example – 40.000
Question:
In case user wants to add a new partnership, how it will be faster to check the current number of partnerships ?
Solution 1:
Using a count(*) statement ?
Solution 2:
Storing the value into a separate column of user. And always when a new partnership needs to be added, to get it and then to increment that column ?
Personal remarks:
Are there any better solution to check the total number of rows ?
Does anyone have a statistic of how performance is affected during time ? I suppose that solution 1 is faster when there are a limited number of rows. But in case there are multiple rows, then it makes more sense to use solution 2. For example, after what period of time (amount of rows) solution 2 becomes better than 1 ?
I would prefer of course solution 1, because I get more control. Bugs might happen and the column from solution 2 to not be incremented. And in such cases, the number will not be correct.
2
Answers
Solution 2 is an example of denormalization, storing an aggregated value instead of relying on the base data. Querying this denormalized value is practically guaranteed to be faster than counting the base data, even for small numbers of rows.
But it comes at a cost for maintaining the stored value. You have to account for errors, which were discussed in the comments above. How will you know when there’s an error? Answer: you have to run the count query and compare that to the value stored in the denormalized column.
How frequently do you need to verify the counts? Perhaps after every update? In that case, it’s just as costly to verify the stored count as to calculate the real count from base data. In fact more costly, because you have to count and also update the user row.
Then it becomes a balance between how frequently you need to recalculate the counts versus how frequently you only query the stored count value. Every time you query between updates, you benefit from some cost savings, and if queries are a lot more frequent than updates, then you get a lot of savings. But if you update as frequently as you query, then you get no savings.
I’ll vote for Solution 2 (keep an exact count elsewhere).
This will be much faster than
COUNT(*)
, but there are things that can go wrong. Adding/deleting a partnership implies incrementing/decrementing the counter. And is there some case that is not exactly an INSERT/DELETE?The count should be done in a transaction. For "adding":
The overhead that is involved might be as much as 10ms. Counting to 40K would be much slower.