skip to Main Content

I have a case where users may have access to some content with subscriptions, it’s not video streaming but I will write an example with that for easier understanding:

  • users in "basic" subscription have access to some basic videos (ID 0)
  • users in "movies" subscription have access to content in "basic" plus some movies (ID 1)
  • users in "sports" subscription have access to package "basic" plus some sports content (ID 2)
  • users in "full" subscription have access to all (ID 3)

This is a simplified version, there are a few more susbcriptions, but the thing it’s there is not a progression in subscriptions, where each subscription contains the previous one plus more content. So when searching for content for a user I cannot use just less than to check whether is available for that user or not, I cannot do:

SELECT * from content where content_susbcription<=$userSubscriptionID

What I can do is:

a) Use another 1-n table where each content is matched with each subscription, that is, if contentID 8983 can appear in subscription 3, there would be just a row 8983/3, but if it can appear in 0, 1 and 3, then there would be three rows (8983/0, 8983/1, 8983/3). I can then

SELECT * FROM content c, content_subscriptions s where c.contentID=s.contentID and s.subscription = $userSubscriptionID

b) in the content table, add a new field, numeric, with a binary representation of the subscription where that content is available, where each bit means it’s available for that subscription. If ID 8983 appears in suscription ID 3 only, then the value of that field would be 8 (2^3, binary 00001000), and if it can appear in suscriptions 0, 1 and 3, then the value would be 11 (2^3 + 2^1 + 2^0, 00001011)

Then I can search like this:

SELECT * FROM content where BIT_AND(content_susbcription, $userSubscriptionID)<>0

The question is: which performance can I expect from binary operations compared to auxiliar table, assuming that in both cases I have an index on the search field. Would the index help at all with BIT_AND?

Consider the content table would have a large number of records (~100.000). I’m looking for the faster solution, as concurrent users may affect performance if I choose the wrong approach.

2

Answers


  1. If you use the bitwise operation, it will force the query to do a table-scan. That is, an index won’t help optimize the query.

    If you join to the content_subscription table, you may create an index in that table on (subscriptionID, contentID).

    I also recommend to use modern JOIN syntax instead of the old comma-style joins. It has no effect on performance, but the modern syntax allows for outer joins, so it’s a good habit to use it consistently even for inner joins like yours.

    SELECT * 
    FROM content c JOIN content_subscriptions s 
      ON c.contentID=s.contentID AND s.subscriptionID = $userSubscriptionID
    
    Login or Signup to reply.
  2. If your query is fetching more than about 20% of the table, then any solution will probably ignore indexes and simply do a full scan. In this case, the bin-and solution is probably best since it is the most compact.

    However, since 20% of 100K is 20K rows, I suspect you have left out something — maybe there are thousands of users in this 100K-row tab;e? And the filtering is more like WHERE user_id = 123 AND ((test for subscriptions))? In this case, INDEX(user_id) would be the most helpful. (And ignore the performance of the rest.)

    If you are really doing something like

    SELECT user_id 
        FROM ...
        WHERE ((test for subscriptions))
    

    to find all users subscribed to something, there may be a better solution.

    Update your Question to include the real query and some statistics on how many rows are likely to be returned.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search