I have two tables: `table_a` and `table_b`. Size-wise, `table_b` is much bigger than `table_a`.

There are fields (names) that are present in both tables. The goal is to find the names (or the number of names when using `count`) that appear in `table_a` but do not appear in `table_b` for the maximum date value. We compare them using the names, as you can see in the code below, using `LIKE CONCAT` since the name from `table_a` could be a substring.
```sql
SELECT count(DISTINCT "Name")
FROM   table_a
WHERE  table_a."Date Collected" = (
          SELECT MAX("Date Collected")
          FROM   table_a
       )
AND    NOT EXISTS (
          SELECT *
          FROM   table_b
          WHERE  UPPER(table_b."computer_name") LIKE CONCAT('%', UPPER(table_a."Name"), '%')
          AND    table_b."date_collected" = (
                    SELECT MAX("date_collected")
                    FROM   table_b
                 )
       )
```
This query takes a long time to run, and the values aren't matching up, meaning we are not getting an accurate count or list. Is there any way this could be made more efficient and accurate?
2 Answers
Have you tried starting with `table_a LEFT JOIN table_b`? Did you check whether you really need to use `UPPER`? It feels like that might impede some automatic optimizations or the use of indexes/statistics. Do you have indexes/statistics on the larger table?
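A sketch of that suggestion, keeping the `UPPER`/`LIKE` comparison and the column names from the question (untested, and only an assumption about what the rewrite would look like): the `NOT EXISTS` is replaced by a `LEFT JOIN` plus an `IS NULL` filter, which returns the same anti-join result.

```sql
-- Hypothetical LEFT JOIN rewrite of the question's query.
-- Rows from table_a with no match in table_b come back with
-- b.computer_name IS NULL, which is exactly the anti-join set.
SELECT count(DISTINCT a."Name")
FROM   table_a a
LEFT   JOIN table_b b
       ON  UPPER(b.computer_name) LIKE CONCAT('%', UPPER(a."Name"), '%')
       AND b.date_collected = (SELECT MAX(date_collected) FROM table_b)
WHERE  a."Date Collected" = (SELECT MAX("Date Collected") FROM table_a)
AND    b.computer_name IS NULL;
```

Whether the planner produces a better plan for this form than for `NOT EXISTS` depends on the database and its statistics; in modern Postgres the two are often planned identically.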
Assuming Postgres, this should do what you describe:

- CTE `a` gets all names from table A with the latest date in that table.
- CTE `b` gets all computer names from B with the latest date in that table. I use the fitting and typically efficient `WITH TIES` for the purpose.
- The outer query only keeps names from `a` that are not contained in any computer name in `b`, using the more elegant case-insensitive regular expression operator `~*` there, and the `LEFT JOIN` / `IS NULL` technique.

The pivotal point for performance (and whether that's in fact the best query) will be fitting indexes for the query, considering cardinalities, data distribution, and selectivity. None of which are clear from the question.
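The query itself did not survive in this copy of the answer, so here is a sketch reconstructed from the description above (column names taken from the question; `FETCH FIRST ... WITH TIES` requires Postgres 13 or later):

```sql
-- Reconstructed sketch, not the answer's verbatim query.
WITH a AS (
   SELECT "Name"
   FROM   table_a
   ORDER  BY "Date Collected" DESC
   FETCH  FIRST 1 ROWS WITH TIES   -- all rows sharing the max date
   )
, b AS (
   SELECT computer_name
   FROM   table_b
   ORDER  BY date_collected DESC
   FETCH  FIRST 1 ROWS WITH TIES
   )
SELECT count(DISTINCT a."Name")
FROM   a
LEFT   JOIN b ON b.computer_name ~* a."Name"  -- case-insensitive regex match
WHERE  b.computer_name IS NULL;               -- keep names with no match in b
```

One caveat: `~*` interprets the name as a regular expression, so names containing regex metacharacters (`.`, `(`, `+`, ...) would need escaping, or fall back to `ILIKE '%' || a."Name" || '%'` for a plain substring test.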
Also, consider proper, consistent, lower-case names for your DB.