skip to Main Content

I run into this situation from time to time, where I want to take a list of things and see if any of them exist in another list.

If you have a single number, it’s simple enough to use a where clause to say something like:

where topic_id IN (select topic_id from articles where article_id = 101)

But what if, say, I have a user who’s selected a list of categories they’re interested in (1,2,7,25,66), and I want to pull the most recent 15 articles that fall under any of those topics, and a given article can belong to 1 or more topics? It doesn’t work to say:

where (select topic_id from articles) IN (1,2,7,25,66)
order by topic_date desc

It’s possible to loop over the right list in a scripting language, doing a separate query for 1,2,7,25, and 66, but that could get very inefficient if it were a longer list. Is there a relatively efficient way to do something like this in a single SQL query?

sample data:

articles

article_id     Title                         Text
1              Today's IC flavor             Today's IC will be strawberry.    
2              Looking Sunny                 Today's going to be hot!
3              Good day to fly               No flights have been delayed today, because the weather is great!
4              Game on!                      There will be an epic baseball game today
5              Hockey too?                   There will be a hockey game today, indoors.
6              Special Aftergame Event       In Tokyo, there will be an ice cream party after the baseball game. Woot!

topics

topic_id       topic_name
1              Food
2              Travel
3              Weather
4              sports

link_article_topic

article_id        topic_id
1                 1
2                 3
3                 2
3                 3
4                 4
5                 4
6                 1
6                 2
6                 4

Notice that some articles have records for more than 1 topic in link_article_topic. This is simplified – my real world case has a much more extensive list of topics and articles, and my real world user has a much longer list of preferred topics, but it’s the same concept.

So say my user has preset their desired topics to (2,4), or Travel and Sports. Say also that the articles table has a date field for when the article was published. I want to get the 3 most recent articles, in descending date order, that belong to the Travel and Sports topics.

In my real-world case, I tried a query using EXISTS similar to below, but it timed out. Please correct me if I have the syntax wrong, but I believe the "a." is necessary for the WHERE in that EXISTS statement:

SELECT a.article_id, a.Title, a.Text
FROM Articles a
WHERE a.extraColumn1 IN (list)
    AND a.extraColumn2 IN (anotherList)
    AND EXISTS (SELECT article_id from link_article_topic
                 WHERE article_id = a.article_ID
                   AND topic_ID IN (2,4))
    ORDER BY DATE DESC

2

Answers


  1. Since you’re using a mapping table to associate topics with articles (an excellent choice) you can use an INNER JOIN and a DISTINCT phase to remove duplicates

    EDIT NOTE: This has been re-written to target MySQL at OP’s request, here is a link to an online MySQL interpreter with the below code snippet
    https://extendsclass.com/mysql/825408f

    /*BEGIN sample data*/
    with cteTopics as (
            SELECT * 
            FROM (VALUES ROW(1, 'Art'), ROW(2, 'Music'), ROW(3, 'Poetry')
                , ROW(4, 'Literature'), ROW(5, 'Math'), ROW(6, 'Physics')
                , ROW(7, 'Photography'), ROW(8, 'News'), ROW(9, 'Crafts')
                , ROW(10, 'Sports'), ROW(11, 'History'), ROW(12, 'Tech')
            ) as tpc(TopicID, TopicName)
        ), cteArticles as (
            SELECT * 
            FROM (VALUES ROW(1, 'A brief history of golf', '2020-01-14')
                , ROW(2, 'Stop action pickleball photos', '2023-06-10')
                , ROW(3, 'The physics of artistic photographs', '2022-07-15')
                , ROW(4, 'Songs of War and Peace', '2022-09-21')
                , ROW(5, 'Writing tech reviews', '2020-05-05')
            ) as art(ArticleID, ArticleTitle, PublishDate)
        ), cteArticleTags as (
            SELECT * 
            FROM (VALUES ROW(1, 11), ROW(1, 10)
                , ROW(2, 7), ROW(2, 10)
                , ROW(3, 6), ROW(3, 1), ROW(3, 7)
                , ROW(4, 2), ROW(4, 4)
                , ROW(5, 12), ROW(5, 7), ROW(5, 4)
            ) as attr(ArticleID, TopicID)
        ), cteUserInterests as (
            SELECT * 
            FROM (VALUES ROW(1, 2), ROW(1, 7), ROW(1, 10)
                , ROW(2, 5), ROW(2, 10)
                , ROW(3, 5), ROW(3, 6), ROW(3, 12)
            ) as attr(UserID, TopicID)
        )/*END of sample data
        Now build solution, look for the set of articles that match for each user*/
        , cteMatch as (
            SELECT DISTINCT U.UserID, A.ArticleID 
            FROM cteUserInterests as U 
                INNER JOIN cteArticleTags as T on U.TopicID = T.TopicID 
                INNER JOIN cteArticles as A on A.ArticleID = T.ArticleID 
        )
        SELECT M.UserID, A.*
        FROM cteMatch as M INNER JOIN cteArticles as A on A.ArticleID = M.ArticleID 
        ORDER BY M.UserID, A.PublishDate DESC
    

    BTW, the output of this is as follows

    UserID ArticleID ArticleTitle PublishDate
    1 2 Stop action pickleball photos 2023-06-10
    1 4 Songs of War and Peace 2022-09-21
    1 3 The physics of artistic photographs 2022-07-15
    1 5 Writing tech reviews 2020-05-05
    1 1 A brief history of golf 2020-01-14
    2 2 Stop action pickleball photos 2023-06-10
    2 1 A brief history of golf 2020-01-14
    3 3 The physics of artistic photographs 2022-07-15
    3 5 Writing tech reviews 2020-05-05
    Login or Signup to reply.
  2. I want to get the 3 most recent articles, in descending date order, that belong to the Travel and Sports topics

    This reads like relational division.

    As a starter, the following query gives you the ids of all articles that have both topics:

    select l.article_id
    from link_articles_topics l
    inner join topics t on t.topic_id = l.topic_id
    where t.topic_name in ('Travel', 'Sports')
    group by l.article_id
    having count(*) = 2
    

    With this information at hand, we can bring the corresponding articles and sort them (assuming a column like article_date, which is not showing in your sample data):

    select a.*
    from articles a
    inner join (
        select l.article_id
        from link_articles_topics l
        inner join topics t on t.topic_id = l.topic_id
        where t.topic_name in ('Travel', 'Sports')
        group by l.article_id
        having count(*) = 2
    ) l on l.article_id = a.article_id
    order by a.article_date desc
    limit 3
    

    You did not tag your database; this syntax is quite portable should run on most databases – apart from limit, which is somehow vendor-specific, and that you might need to adapt.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search