I run into this situation from time to time, where I want to take a list of things and see if any of them exist in another list.
If you have a single number, it’s simple enough to use a where clause to say something like:
where topic_id IN (select topic_id from articles where article_id = 101)
But what if, say, I have a user who’s selected a list of categories they’re interested in (1,2,7,25,66), and I want to pull the most recent 15 articles that fall under any of those topics, and a given article can belong to 1 or more topics? It doesn’t work to say:
where (select topic_id from articles) IN (1,2,7,25,66)
order by topic_date desc
It’s possible to loop over the right list in a scripting language, doing a separate query for 1,2,7,25, and 66, but that could get very inefficient if it were a longer list. Is there a relatively efficient way to do something like this in a single SQL query?
sample data:
articles
article_id Title Text
1 Today's IC flavor Today's IC will be strawberry.
2 Looking Sunny Today's going to be hot!
3 Good day to fly No flights have been delayed today, because the weather is great!
4 Game on! There will be an epic baseball game today
5 Hockey too? There will be a hockey game today, indoors.
6 Special Aftergame Event In Tokyo, there will be an ice cream party after the baseball game. Woot!
topics
topic_id topic_name
1 Food
2 Travel
3 Weather
4 sports
link_article_topic
article_id topic_id
1 1
2 3
3 2
3 3
4 4
5 4
6 1
6 2
6 4
Notice that some articles have records for more than 1 topic in link_article_topic. This is simplified – my real world case has a much more extensive list of topics and articles, and my real world user has a much longer list of preferred topics, but it’s the same concept.
So say my user has preset their desired topics to (2,4), or Travel and Sports. Say also that the articles table has a date field for when the article was published. I want to get the 3 most recent articles, in descending date order, that belong to the Travel and Sports topics.
In my real-world case, I tried a query using EXISTS similar to below, but it timed out. Please correct me if I have the syntax wrong, but I believe the "a." is necessary for the WHERE in that EXISTS statement:
SELECT a.article_id, a.Title, a.Text
FROM Articles a
WHERE a.extraColumn1 IN (list)
AND a.extraColumn2 IN (anotherList)
AND EXISTS (SELECT article_id from link_article_topic
WHERE article_id = a.article_ID
AND topic_ID IN (2,4))
ORDER BY DATE DESC
2
Answers
Since you’re using a mapping table to associate topics with articles (an excellent choice) you can use an INNER JOIN and a DISTINCT phase to remove duplicates
EDIT NOTE: This has been re-written to target MySQL at OP’s request, here is a link to an online MySQL interpreter with the below code snippet
https://extendsclass.com/mysql/825408f
BTW, the output of this is as follows
This reads like relational division.
As a starter, the following query gives you the ids of all articles that have both topics:
With this information at hand, we can bring the corresponding articles and sort them (assuming a column like
article_date
, which is not showing in your sample data):You did not tag your database; this syntax is quite portable should run on most databases – apart from
limit
, which is somehow vendor-specific, and that you might need to adapt.