skip to Main Content

If the total number of records is x and the count of "like" query is y then the count of "not like" query should be x – y

I am getting "x" as the count of "not like" query as shown below:

SELECT COUNT(DISTINCT(b.word)) 
FROM "hunspell"."oscar2_sorted" AS b

Total Count: 9597651

SELECT COUNT(distinct(b.word))
FROM "hunspell"."oscar2_sorted" as b 
INNER JOIN invalidswar AS a 
ON b.word LIKE (CONCAT('%', a.word,'%'))

Like count: 73116

SELECT COUNT(distinct(b.word)) 
FROM "hunspell"."oscar2_sorted" AS b 
INNER JOIN invalidswar AS a
ON b.word NOT LIKE (CONCAT('%', a.word,'%'))

Not Like count: 9597651

Expected: 9524535

I am not sure what am I missing.


Update:

Left join count is close to expected, but still does not look correct.

SELECT COUNT(DISTINCT(b.word))
FROM "hunspell"."oscar2_sorted" AS b 
LEFT JOIN (SELECT DISTINCT(b.word) AS dword 
           FROM "hunspell"."oscar2_sorted" AS b 
           INNER JOIN invalidswar AS a 
           ON b.word LIKE (CONCAT('%', a.word,'%'))) AS d 
ON d.dword = b.word 
WHERE d.dword IS NULL

left join count: 9536539


update 2:

a difference of 12004 is traced back to the difference of how like and regexp_like is executed.

SELECT count(distinct(b.word)) 
          FROM "hunspell"."oscar2_sorted" as b
         INNER JOIN invalidswar AS a 
            ON regexp_like(b.word, a.word)

regex like count: 61112

2

Answers


  1. SELECT COUNT(word)
      FROM (SELECT word 
              FROM "hunspell"."oscar2_sorted"
            EXCEPT DISTINCT
           (SELECT b.word
              FROM "hunspell"."oscar2_sorted" as b
             INNER JOIN invalidswar AS a 
                ON regexp_like(b.word, a.word)))
    

    See: EXCEPT clause and regexp_like

    Login or Signup to reply.
  2. WITH
      invalid_check AS
    (
      SELECT
        o.word,
        CASE WHEN
          EXISTS (
            SELECT *
              FROM invalidswar AS i
             WHERE o.word LIKE CONCAT('%', i.word,'%')
          )
        THEN
          1
        ELSE
          0
        END
          AS is_invalid
      FROM
        "hunspell"."oscar2_sorted"   AS o
      GROUP BY
        o.word
    )
    SELECT
      COUNT(*)            AS all_words,
      SUM(is_invalid)     AS invalid_words,
      SUM(1-is_invalid)   AS valid_words
    FROM
      valid_check
    

    Assuming the above matches your expectations, just counting valid words could then be…

    SELECT
      COUNT(DISTINCT o.word)
    FROM
      "hunspell"."oscar2_sorted"   AS o
    WHERE
      NOT EXISTS (
        SELECT *
          FROM invalidswar AS i
         WHERE o.word LIKE CONCAT('%', i.word,'%')
      )
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search