Let’s say I have two tables (from uploaded csv files) and I want to do a diff based on an id+territory in the new file that wasn’t in the old file. The easiest way to do this is something like:
SELECT id, territory FROM this_week EXCEPT SELECT id, territory FROM last_week
However, what I’m trying to do is get ALL fields (in both tables — one row per key) that are generated by that difference. How could this be done?
Either postgres or bigquery is fine. Both have the EXCEPT
set op.
An example with data from Erwin’s answer:
WITH this_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(22,'us','spider','hd'),(3,'fr','new','hd')),
last_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(2,'us','spider','hd'))
SELECT * -- all columns of "this_week"
FROM this_week t
WHERE NOT EXISTS (
SELECT * FROM last_week l
WHERE t.id = l.id
AND t.territory = l.territory
);
2
Answers
Use
NOT EXISTS
:And index on
last_week (id, territory)
would typically help performance (a lot).This shows all columns of
this_week
.I don’t see the point in adding columns of
last_week
, which would be empty (null
) by definition of the query if you’d left-join.Basics:
Note a subtle difference:
EXCEPT
(when used withoutALL
) folds duplicates. This query does not. You may want one or the other. Typically, you want this.You can achieve the same if you use
SET OPERATIONS
.However, by looking at your original question you said,
"However, what I'm trying to do is get ALL fields (in both tables -- one row per key) that are generated by that difference. How could this be done?"
if your statement meant by to get unique rows from both the tables then you could useUNION SET OPERATION
.