skip to Main Content

I have two tables keywords and posts in my PieCloudDB Database.

Each topic can be expressed by one or more keywords. If a keyword of a certain topic exists in the content of a post (case insensitive) then the post has this topic.

For example:

topic_id keyword
1 basketball
2 music
3 food
4 war
post_id content
1 A typhoon warning has been issued in southern Japan
2 We are going to play neither basketball nor volleyball
3 I am indulging in both the delightful music and delectable food
4 That basketball player fouled again

Now I want to find the topics of each post according to the following rules:

  • If the post does not have keywords from any topic, its topic should be "Vague!".

  • If the post has at least one keyword of any topic, its topic should be a string of the IDs of its topics sorted in ascending order and separated by commas ‘,’.

For the above example data, the results should be:

post_id topics
1 Vague!
2 1
3 2,3
4 1
SELECT post_id, COALESCE(array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ','), 'Vague!') AS topic
FROM (
    SELECT p.post_id, k.topic_id
    FROM Posts p 
    LEFT JOIN Keywords k 
    ON LOWER(content) LIKE '% ' || keyword || ' %' OR content LIKE keyword || ' %' OR content LIKE '% ' || keyword
  ) a
GROUP BY post_id
ORDER BY post_id

I tried this query but the results I got were not exactly correct. I don’t know why the output of post 1 is null:

post_id topics
1
2 1
3 2,3
4 1

Can anyone give me a correct answer?(If you don’t know the database I use, you can use PostgreSQL instead, thanks)

3

Answers


  1. You’re not distinguishing between NULL and the empty string ('') – this first query shows you where the problem is, the second shows a solution (all of the code below is available on the fiddle here):

    SELECT 
      post_id, 
      
      COALESCE
      (
        array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ','), 
        'Vague!'
      ) AS topic,
    
      LENGTH(array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ',')),
    
      LENGTH(NULL) AS "Length of NULL",  -- <<== LENGTH(NULL) is always NULL, and NOT zero 
    
      PG_TYPEOF(NULL) AS "Type of NULL",
      
      PG_TYPEOF
      (
        array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ',')
      )
    FROM 
    (
      SELECT p.post_id, k.topic_id
      FROM post p 
      LEFT JOIN keyword k 
        ON LOWER(content) LIKE '% ' || topic || ' %' 
        OR content LIKE topic || ' %' OR content LIKE '% ' || topic
    ) a
    GROUP BY post_id
    ORDER BY post_id;
    

    Note the use of the PG_TYPEOF() function (manual) – can be handy for debugging!

    Result:

    post_id topic length Length of NULL Type of NULL pg_typeof
          1     0                  null      unknown      text
          2     1      1           null      unknown      text
          3   2,3      3           null      unknown      text
          4     1      1           null      unknown      text
    

    So, the correct approach is:

    SELECT 
      post_id, 
      CASE
        WHEN
          (array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ',') IS NULL) 
            THEN 'Vague!'
        WHEN
          (array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ',') = '') 
            THEN 'Vague!'      
        ELSE
          array_to_string(array_agg(DISTINCT topic_id ORDER BY topic_id), ',')
      END AS topic
    FROM 
    (
      SELECT p.post_id, k.topic_id
      FROM post p 
      LEFT JOIN keyword k 
        ON LOWER(content) LIKE '% ' || topic || ' %' 
        OR content LIKE topic || ' %' OR content LIKE '% ' || topic
    ) a
    GROUP BY post_id
    ORDER BY post_id;
    

    Result:

    post_id    topic
          1   Vague!
          2        1
          3      2,3
          4        1
    
    Login or Signup to reply.
  2. You can simplify your query and just use STRING_AGG, so you get rid of your subquery. By the way, please take care to always put the table alias before the column name, not only sometimes.

    SELECT 
      p.post_id, 
      COALESCE(STRING_AGG(k.topic_id::varchar,',' 
                 ORDER BY k.topic_id), 
               'Vague!') AS topic
      FROM Posts p 
      LEFT JOIN Keywords k 
        ON LOWER(p.content) LIKE '% ' || k.keyword || ' %' 
           OR p.content LIKE k.keyword || ' %' 
           OR p.content LIKE '% ' || k.keyword
      GROUP BY p.post_id
      ORDER BY p.post_id;
    

    See this db<>fiddle with your data.

    The fiddle also shows you could use CASE rather than COALESCE in your orignal query which would solve your issue too. The issue was COALESCE replaces NULL values, but you got an empty string, not NULL. Anyway, your previous query is far too complicated for your use case.

    You might also be able to simplify those LIKE conditions and rather use a Regex approach. Since I don’t use Pie DB and am not sure about regex functionality there, I leave this part up to you.

    Login or Signup to reply.
  3. To match keywords more accurately, you can try to use regex on your pieclouddb database.(The following query was tested on PostgreSQL)

    SELECT p.post_id,
        CASE
            WHEN COUNT(k.topic_id) = 0 THEN 'Vague!'
            ELSE STRING_AGG(DISTINCT k.topic_id::TEXT, ',' ORDER BY k.topic_id::TEXT)
        END AS topics
    FROM posts p
    LEFT JOIN keywords k ON LOWER(p.content) ~* CONCAT('(^|W)', LOWER(k.keyword), '(W|$)')
    GROUP BY p.post_id
    ORDER BY p.post_id
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search