skip to Main Content

I have a table package_scans with fields package_id and scan_id, and another table scan_content with scan_id and epc.

There is a 1-N relation between the tables, so for each package_scan there are multiple entries in scan_content.

I am trying to find all package scans, where there exists another scan but with all different content.

Example (for sake of simplicity all scans have exactly 2 epc, could be anywhere from 1-*):

Table package_scans:

package_id scan_id
1 111
1 112
2 221
2 222
3 331
4 441
4 442
4 443

Table scan_content:

scan_id epc
111 a1
111 a2
112 a3
112 a4
221 b1
221 b2
222 b2
222 b3
331 c1
331 c2
441 d1
441 d2
442 d1
442 d2
443 d3
443 d4

The only valid result would be package_id 1, as this package has two scans (111 and 112) with completely different epc.

Package 2 is not relevant to me, it also has two scans (221 and 222) but there is an overlap in the epc (both have b2).
Package 3 not relevant to me as it only has a single scan.

Package 4 does have an overlap between scan 441 and 442, but scan 443 is all new and therefore package 4 is relevant and needs to be part of the result set.

I tried with different versions of EXISTS and NOT EXISTS, but I only get to the point where I get both package_id 1 and 2 as a result, not only 1.

Here is my best approach:

SELECT package_id FROM package_scans ps1 
JOIN scan_content sc1 ON ps1.scan_id = sc1.scan_id
WHERE EXISTS (
      SELECT 1 FROM package_scans ps2
      WHERE ps1.package_id = ps2.package_id
      AND ps1.scan_id != ps2.scan_id
      AND NOT EXISTS (
            SELECT 1 FROM scan_content sc2
            WHERE ps2.scan_id = sc2.scan_id
            AND sc1.epc = sc2.epc
            )
      )

Here is my thoughts behind the my SQL:
There is a package_scan, where there exists another package_scan for the same package_id but a different scan_id. This second scan does not have a single epc that matches the epcs from the first scan.

Sadly this SQL results in both package_id 1 and 2.

Thanks in advance for any tips!

2

Answers


  1. One option is to join your tables use Count() Over() analytic functions to calculate repeating package_id(s) and epc(s) and select rows where there are more then 1 specific package_id having all different epc(s) …

    WITH    /*  S a m p l e    D a t a  */
      package_scans ( package_id, scan_id ) AS 
        ( Select 1, 111 Union All 
          Select 1, 112 Union All 
          Select 2, 221 Union All 
          Select 2, 222 Union All 
          Select 3, 331 
        ),
      scan_content ( scan_id, epc ) AS 
        ( Select 111, 'a1' Union All 
          Select 111, 'a2' Union All 
          Select 112, 'a3' Union All 
          Select 112, 'a4' Union All 
          Select 221, 'b1' Union All
          Select 221, 'b2' Union All 
          Select 222, 'b2' Union All
          Select 222, 'b3' Union All 
          Select 331, 'c1' Union All
          Select 331, 'c2' 
        )
    
    /*    M a i n    S Q L :    */
    SELECT p.package_id
    FROM   ( Select     ps.package_id,  
                        Count(ps.package_id) Over(Partition By ps.package_id) /
                        Count(ps.package_id) Over(Partition By ps.scan_id) as ps_count, 
                        Count(s.epc) Over(Partition By ps.package_id, s.epc) as s_count
            From       package_scans ps
            Inner Join scan_content s ON( s.scan_id = ps.scan_id )
          ) p
    WHERE     ps_count > 1
    GROUP BY  p.package_id
    HAVING    Max(p.s_count) = 1
    
    /*  R e s u l t : 
    package_id 
    ----------
             1    */
    

    See the fiddle here.

    Login or Signup to reply.
  2. This is a type of Relational Division Without Remainder, with the twist that you have multiple divisors, and that you need the dividend to be above 1.

    There are a number of solutions. Many would end up self-joining multiple times, or use window functions, but I don’t think it’s necessary here. In this case, I think an efficient method might be as follows:

    Group by package_id and epc, taking the count per epc, as well as the min and max scan_id.

    Then group again just by package_id, checking that we the min and max scan_id are different (and therefore at least two scans), and that the count for every epc is 1 (and therefore there are no overlaps anywhere in the package_id.

    SELECT
      ps.package_id
    FROM (
        SELECT
          ps.package_id,
          COUNT(*) AS count_per_epc,
          MIN(sc.scan_id) AS min_scan,
          MAX(sc.scan_id) AS max_scan
        FROM package_scans ps
        JOIN scan_content sc ON sc.scan_id = ps.scan_id
        GROUP BY
          ps.package_id,
          sc.epc
    ) ps
    GROUP BY
      ps.package_id
    HAVING MIN(ps.min_scan) <> MAX(ps.max_scan)  -- at least two scans
       AND COUNT(CASE WHEN ps.count_per_epc > 1 THEN 1 END) = 0  -- none;
    

    An alternative more concise but more obscure syntax for the final condition is:

       AND COUNT(NULLIF(ps.count_per_epc, 1)) = 0;
    

    db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search