Postgresql - SQL to find parents where ALL children are different

Budenzauber
August 7, 2024
190 views
0 votes
2 Answers

I have a table package_scans with fields package_id and scan_id, and another table scan_content with scan_id and epc.

There is a 1-N relation between the tables, so for each package_scan there are multiple entries in scan_content.

I am trying to find all package scans, where there exists another scan but with all different content.

Example (for sake of simplicity all scans have exactly 2 epc, could be anywhere from 1-*):

Table package_scans:

package_id	scan_id
1	111
1	112
2	221
2	222
3	331
4	441
4	442
4	443

Table scan_content:

scan_id	epc
111	a1
111	a2
112	a3
112	a4
221	b1
221	b2
222	b2
222	b3
331	c1
331	c2
441	d1
441	d2
442	d1
442	d2
443	d3
443	d4

The only valid result would be package_id 1, as this package has two scans (111 and 112) with completely different epc.

Package 2 is not relevant to me, it also has two scans (221 and 222) but there is an overlap in the epc (both have b2).
Package 3 not relevant to me as it only has a single scan.

Package 4 does have an overlap between scan 441 and 442, but scan 443 is all new and therefore package 4 is relevant and needs to be part of the result set.

I tried with different versions of EXISTS and NOT EXISTS, but I only get to the point where I get both package_id 1 and 2 as a result, not only 1.

Here is my best approach:

SELECT package_id FROM package_scans ps1 
JOIN scan_content sc1 ON ps1.scan_id = sc1.scan_id
WHERE EXISTS (
      SELECT 1 FROM package_scans ps2
      WHERE ps1.package_id = ps2.package_id
      AND ps1.scan_id != ps2.scan_id
      AND NOT EXISTS (
            SELECT 1 FROM scan_content sc2
            WHERE ps2.scan_id = sc2.scan_id
            AND sc1.epc = sc2.epc
            )
      )

Here is my thoughts behind the my SQL:
There is a package_scan, where there exists another package_scan for the same package_id but a different scan_id. This second scan does not have a single epc that matches the epcs from the first scan.

Sadly this SQL results in both package_id 1 and 2.

Thanks in advance for any tips!

Tags: postgresql sql

Answers

One option is to join your tables use Count() Over() analytic functions to calculate repeating package_id(s) and epc(s) and select rows where there are more then 1 specific package_id having all different epc(s) …

WITH    /*  S a m p l e    D a t a  */
  package_scans ( package_id, scan_id ) AS 
    ( Select 1, 111 Union All 
      Select 1, 112 Union All 
      Select 2, 221 Union All 
      Select 2, 222 Union All 
      Select 3, 331 
    ),
  scan_content ( scan_id, epc ) AS 
    ( Select 111, 'a1' Union All 
      Select 111, 'a2' Union All 
      Select 112, 'a3' Union All 
      Select 112, 'a4' Union All 
      Select 221, 'b1' Union All
      Select 221, 'b2' Union All 
      Select 222, 'b2' Union All
      Select 222, 'b3' Union All 
      Select 331, 'c1' Union All
      Select 331, 'c2' 
    )

/*    M a i n    S Q L :    */
SELECT p.package_id
FROM   ( Select     ps.package_id,  
                    Count(ps.package_id) Over(Partition By ps.package_id) /
                    Count(ps.package_id) Over(Partition By ps.scan_id) as ps_count, 
                    Count(s.epc) Over(Partition By ps.package_id, s.epc) as s_count
        From       package_scans ps
        Inner Join scan_content s ON( s.scan_id = ps.scan_id )
      ) p
WHERE     ps_count > 1
GROUP BY  p.package_id
HAVING    Max(p.s_count) = 1

/*  R e s u l t : 
package_id 
----------
         1    */

See the fiddle here.

- Charlieface
- August 6, 2024 at 1:30 pm
- 0 votes
0
This is a type of Relational Division Without Remainder, with the twist that you have multiple divisors, and that you need the dividend to be above 1.

There are a number of solutions. Many would end up self-joining multiple times, or use window functions, but I don’t think it’s necessary here. In this case, I think an efficient method might be as follows:

Group by package_id and epc, taking the count per epc, as well as the min and max scan_id.

Then group again just by package_id, checking that we the min and max scan_id are different (and therefore at least two scans), and that the count for every epc is 1 (and therefore there are no overlaps anywhere in the package_id.
```
SELECT
  ps.package_id
FROM (
    SELECT
      ps.package_id,
      COUNT(*) AS count_per_epc,
      MIN(sc.scan_id) AS min_scan,
      MAX(sc.scan_id) AS max_scan
    FROM package_scans ps
    JOIN scan_content sc ON sc.scan_id = ps.scan_id
    GROUP BY
      ps.package_id,
      sc.epc
) ps
GROUP BY
  ps.package_id
HAVING MIN(ps.min_scan) <> MAX(ps.max_scan)  -- at least two scans
   AND COUNT(CASE WHEN ps.count_per_epc > 1 THEN 1 END) = 0  -- none;
```
An alternative more concise but more obscure syntax for the final condition is:
```
   AND COUNT(NULLIF(ps.count_per_epc, 1)) = 0;
```
db<>fiddle
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – SQL to find parents where ALL children are different

Answers