If the total number of records is x and the count of "like" query is y then the count of "not like" query should be x – y
I am getting "x" as the count of "not like" query as shown below:
SELECT COUNT(DISTINCT(b.word))
FROM "hunspell"."oscar2_sorted" AS b
Total Count: 9597651
SELECT COUNT(distinct(b.word))
FROM "hunspell"."oscar2_sorted" as b
INNER JOIN invalidswar AS a
ON b.word LIKE (CONCAT('%', a.word,'%'))
Like count: 73116
SELECT COUNT(distinct(b.word))
FROM "hunspell"."oscar2_sorted" AS b
INNER JOIN invalidswar AS a
ON b.word NOT LIKE (CONCAT('%', a.word,'%'))
Not Like count: 9597651
Expected: 9524535
I am not sure what am I missing.
Update:
Left join count is close to expected, but still does not look correct.
SELECT COUNT(DISTINCT(b.word))
FROM "hunspell"."oscar2_sorted" AS b
LEFT JOIN (SELECT DISTINCT(b.word) AS dword
FROM "hunspell"."oscar2_sorted" AS b
INNER JOIN invalidswar AS a
ON b.word LIKE (CONCAT('%', a.word,'%'))) AS d
ON d.dword = b.word
WHERE d.dword IS NULL
left join count: 9536539
update 2:
a difference of 12004 is traced back to the difference of how like and regexp_like is executed.
SELECT count(distinct(b.word))
FROM "hunspell"."oscar2_sorted" as b
INNER JOIN invalidswar AS a
ON regexp_like(b.word, a.word)
regex like count: 61112
2
Answers
See: EXCEPT clause and regexp_like
Assuming the above matches your expectations, just counting valid words could then be…