I have a case where users may have access to some content with subscriptions, it’s not video streaming but I will write an example with that for easier understanding:
- users in "basic" subscription have access to some basic videos (ID 0)
- users in "movies" subscription have access to content in "basic" plus some movies (ID 1)
- users in "sports" subscription have access to package "basic" plus some sports content (ID 2)
- users in "full" subscription have access to all (ID 3)
This is a simplified version, there are a few more susbcriptions, but the thing it’s there is not a progression in subscriptions, where each subscription contains the previous one plus more content. So when searching for content for a user I cannot use just less than to check whether is available for that user or not, I cannot do:
SELECT * from content where content_susbcription<=$userSubscriptionID
What I can do is:
a) Use another 1-n table where each content is matched with each subscription, that is, if contentID 8983 can appear in subscription 3, there would be just a row 8983/3, but if it can appear in 0, 1 and 3, then there would be three rows (8983/0, 8983/1, 8983/3). I can then
SELECT * FROM content c, content_subscriptions s where c.contentID=s.contentID and s.subscription = $userSubscriptionID
b) in the content table, add a new field, numeric, with a binary representation of the subscription where that content is available, where each bit means it’s available for that subscription. If ID 8983 appears in suscription ID 3 only, then the value of that field would be 8 (2^3, binary 00001000), and if it can appear in suscriptions 0, 1 and 3, then the value would be 11 (2^3 + 2^1 + 2^0, 00001011)
Then I can search like this:
SELECT * FROM content where BIT_AND(content_susbcription, $userSubscriptionID)<>0
The question is: which performance can I expect from binary operations compared to auxiliar table, assuming that in both cases I have an index on the search field. Would the index help at all with BIT_AND?
Consider the content table would have a large number of records (~100.000). I’m looking for the faster solution, as concurrent users may affect performance if I choose the wrong approach.
2
Answers
If you use the bitwise operation, it will force the query to do a table-scan. That is, an index won’t help optimize the query.
If you join to the
content_subscription
table, you may create an index in that table on(subscriptionID, contentID)
.I also recommend to use modern
JOIN
syntax instead of the old comma-style joins. It has no effect on performance, but the modern syntax allows for outer joins, so it’s a good habit to use it consistently even for inner joins like yours.If your query is fetching more than about 20% of the table, then any solution will probably ignore indexes and simply do a full scan. In this case, the bin-and solution is probably best since it is the most compact.
However, since 20% of 100K is 20K rows, I suspect you have left out something — maybe there are thousands of users in this 100K-row tab;e? And the filtering is more like
WHERE user_id = 123 AND ((test for subscriptions))
? In this case,INDEX(user_id)
would be the most helpful. (And ignore the performance of the rest.)If you are really doing something like
to find all users subscribed to something, there may be a better solution.
Update your Question to include the real query and some statistics on how many rows are likely to be returned.