I have a table package_scans
with fields package_id
and scan_id
, and another table scan_content
with scan_id
and epc
.
There is a 1-N relation between the tables, so for each package_scan
there are multiple entries in scan_content
.
I am trying to find all package scans, where there exists another scan but with all different content.
Example (for sake of simplicity all scans have exactly 2 epc
, could be anywhere from 1-*):
Table package_scans
:
package_id | scan_id |
---|---|
1 | 111 |
1 | 112 |
2 | 221 |
2 | 222 |
3 | 331 |
4 | 441 |
4 | 442 |
4 | 443 |
Table scan_content
:
scan_id | epc |
---|---|
111 | a1 |
111 | a2 |
112 | a3 |
112 | a4 |
221 | b1 |
221 | b2 |
222 | b2 |
222 | b3 |
331 | c1 |
331 | c2 |
441 | d1 |
441 | d2 |
442 | d1 |
442 | d2 |
443 | d3 |
443 | d4 |
The only valid result would be package_id
1, as this package has two scans (111 and 112) with completely different epc
.
Package 2 is not relevant to me, it also has two scans (221 and 222) but there is an overlap in the epc
(both have b2).
Package 3 not relevant to me as it only has a single scan.
Package 4 does have an overlap between scan 441 and 442, but scan 443 is all new and therefore package 4 is relevant and needs to be part of the result set.
I tried with different versions of EXISTS
and NOT EXISTS
, but I only get to the point where I get both package_id
1 and 2 as a result, not only 1.
Here is my best approach:
SELECT package_id FROM package_scans ps1
JOIN scan_content sc1 ON ps1.scan_id = sc1.scan_id
WHERE EXISTS (
SELECT 1 FROM package_scans ps2
WHERE ps1.package_id = ps2.package_id
AND ps1.scan_id != ps2.scan_id
AND NOT EXISTS (
SELECT 1 FROM scan_content sc2
WHERE ps2.scan_id = sc2.scan_id
AND sc1.epc = sc2.epc
)
)
Here is my thoughts behind the my SQL:
There is a package_scan, where there exists another package_scan for the same package_id but a different scan_id. This second scan does not have a single epc that matches the epcs from the first scan.
Sadly this SQL results in both package_id 1 and 2.
Thanks in advance for any tips!
2
Answers
One option is to join your tables use Count() Over() analytic functions to calculate repeating package_id(s) and epc(s) and select rows where there are more then 1 specific package_id having all different epc(s) …
See the fiddle here.
This is a type of Relational Division Without Remainder, with the twist that you have multiple divisors, and that you need the dividend to be above 1.
There are a number of solutions. Many would end up self-joining multiple times, or use window functions, but I don’t think it’s necessary here. In this case, I think an efficient method might be as follows:
Group by
package_id
andepc
, taking the count perepc
, as well as the min and maxscan_id
.Then group again just by
package_id
, checking that we the min and maxscan_id
are different (and therefore at least two scans), and that the count for everyepc
is 1 (and therefore there are no overlaps anywhere in thepackage_id
.An alternative more concise but more obscure syntax for the final condition is:
db<>fiddle