skip to Main Content

I need to find supersets within an association table.

You can consider bag to be a collection of item records. The relationship between the two is tracked in bag_item.

I need to know which bags are supersets of other bags: containing ALL the items and possibly one or more additional items.

bag

id name
1 Bag A
2 Bag B
3 Bag C
4 Bag D

item

id name
1 Pencil
2 Pen
3 Eraser
4 Marker

bag_item (association table)

id bag_id item_id
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 2 4
7 3 1
8 3 4
9 4 1
10 4 2
11 4 3

Database: MySQL 8

I have "base" bag identifiers and I want to get "superset" bag identifiers that contain ALL the items in my "base" bag (and more items if present).

If I query for ALL bags, the response should look something like this:

base_bag_id superset_bag_id
1 2
1 4
3 2
4 2

But, I also want to be able to filter on the basis of the base_bag_id, e.g. 1 & 4.

I’ve tried something to the effect of the following:

select distinct base_bag_item.bag_id, superset_bag_item.bag_id
from bag_item as base_bag_item
join bag_item as superset_bag_item on superset_bag_item.item_id = base_bag_item.item_id
where superset_bag_item.bag_id != base_bag_item.bag_id
and base_bag_item.bag_id in (1, 4)
order by base_bag_item.bag_id, superset_bag_item.bag_id

However, this query will incorrectly produce bags that contain at least one of the quotes from the "base", e.g. base_bag_id 1 will have superset_bag_id 3 because both bags contain item_id 1.

How do I get my desired output?

3

Answers


  1. Try this solution please:

    SELECT 
        base_bag.bag_id AS base_bag_id, 
        superset_bag.bag_id AS superset_bag_id
    FROM 
        (SELECT bag_id FROM bag_item GROUP BY bag_id) base_bag
    JOIN 
        (SELECT bag_id FROM bag_item GROUP BY bag_id) superset_bag
    ON base_bag.bag_id <> superset_bag.bag_id
    WHERE NOT EXISTS (
        SELECT item_id 
        FROM bag_item base_items 
        WHERE base_items.bag_id = base_bag.bag_id 
        AND NOT EXISTS (
            SELECT 1
            FROM bag_item superset_items
            WHERE superset_items.bag_id = superset_bag.bag_id 
            AND superset_items.item_id = base_items.item_id
        )
    )
    AND (
        SELECT COUNT(DISTINCT item_id) FROM bag_item WHERE bag_id = superset_bag.bag_id
    ) > (
        SELECT COUNT(DISTINCT item_id) FROM bag_item WHERE bag_id = base_bag.bag_id
    )
    ORDER BY base_bag_id, superset_bag_id;
    
    Login or Signup to reply.
  2. This is classic Relational Division With Remainder, with the multiple divisors tweak (you want to get all bags for all other possible bags, not just for one bag).

    There are a number of solutions. A simple one, if rather inefficient, is a double NOT EXISTS.

    SELECT
      base.id AS base_bag_id,
      s.id AS superset_bag_id
    FROM bag base
    JOIN bag s
       ON s.id <> base.id
      AND NOT EXiSTS (SELECT 1
        FROM bag_item bi
        WHERE bi.bag_id = base.id
          AND NOT EXISTS (SELECT 1
            FROM bag_item si
            WHERE si.item_id = bi.item_id
              AND si.bag_id = s.id
        )
    );
    

    A more efficient solution is to join everything up using a left-join, then group and check the count for missing values

    SELECT
      base.bag_id AS base_bag_id,
      s.id AS superset_bag_id
    FROM bag_item base
    JOIN bag s ON s.id <> base.bag_id
    LEFT JOIN bag_item si
       ON si.bag_id = s.id
      AND si.item_id = base.item_id
    GROUP BY
      base.bag_id,
      s.id
    HAVING COUNT(si.item_id) = COUNT(*);
    

    db<>fiddle

    Login or Signup to reply.
  3. But, I also want to be able to filter on the basis of the base_bag_id, e.g. 1 & 4.

    Assuming that this means "find a bug which contains all items for each specified bag":

    SET @base_bags = '1,4';
    
    WITH 
    minimal_superset AS (
      SELECT DISTINCT item_id
      FROM bag_item
      WHERE FIND_IN_SET(bag_id, @base_bags)
      ),
    bags_list AS (
      SELECT DISTINCT bag_id
      FROM bag_item
      )
    SELECT @base_bags base_bags_list, bag_id superset_bag_id
    FROM minimal_superset
    CROSS JOIN bags_list
    LEFT JOIN bag_item USING (bag_id, item_id)
    GROUP BY bag_id
    HAVING NOT SUM(bag_item.item_id IS NULL)
    
    base_bags_list superset_bag_id
    1,4 2
    1,4 4

    fiddle

    If the bags in the list must be not included into the superset bags list (bag #4 in the example) then add WHERE NOT FIND_IN_SET(bag_id, @base_bags) to CTE bags_list.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search