I have the following pizza table:

pizza_id toppings
1 1,2,3,4,5,6,8,10
2 4,6,7,9,11,12

I would like to find the toppings that the pizza_ids have in common, that is, the toppings used by both pizzas. The expected result is the table below:

pizza_id toppings
1 4,6
2 4,6

I have tried using JOINs but couldn't satisfy the condition.
Could anyone please give me a hint?
Thank you.

2 Answers


  1. Your database does not respect 1NF: each table cell must contain a single value. The best way to fix this is to have a pizza table and a topping table linked by an N-to-N relationship. That way, a separate table relates each pizza_id to every topping it has.

    Pizza table is formed as:

    pizza_id pizza_name
    1 Margherita
    2 Capricciosa

    Topping table is formed as:

    topping_id topping_name
    1 Pomodoro
    2 Mozzarella

    And N-to-N table will be:

    pizza_id topping_id
    1 1
    1 2

    On this table you can run all the operations you need to get your data.
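
As a rough illustration of the normalized layout, the sketch below splits the asker's two comma-separated rows into one row per (pizza_id, topping_id) pair and then asks which toppings appear on every pizza. It uses Python's built-in sqlite3 module purely as a stand-in for the real database, and the table name pizzas_toppings is an assumption, not something fixed by the answer above.

```python
import sqlite3

# The asker's original rows: pizza_id -> comma-separated topping ids.
csv_rows = {1: "1,2,3,4,5,6,8,10", 2: "4,6,7,9,11,12"}

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pizzas_toppings ("  # hypothetical table name
    " pizza_id INTEGER NOT NULL,"
    " topping_id INTEGER NOT NULL,"
    " PRIMARY KEY (pizza_id, topping_id))"
)

# Split every CSV cell into one (pizza_id, topping_id) row.
junction_rows = [
    (pizza_id, int(topping))
    for pizza_id, csv in csv_rows.items()
    for topping in csv.split(",")
]
conn.executemany("INSERT INTO pizzas_toppings VALUES (?, ?)", junction_rows)

# A topping is common to all pizzas when its row count equals the pizza count.
common = [
    row[0]
    for row in conn.execute(
        "SELECT topping_id FROM pizzas_toppings"
        " GROUP BY topping_id"
        " HAVING COUNT(*) = (SELECT COUNT(DISTINCT pizza_id) FROM pizzas_toppings)"
        " ORDER BY topping_id"
    )
]
print(common)  # [4, 6]
```

Once the data is stored one value per cell, the "common toppings" question reduces to a plain GROUP BY with no string parsing inside the query.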

  2. As already pointed out by everyone, you should not be storing comma separated values in a single cell like that.

    But, to answer your question, assuming you are looking for the intersection of the toppings CSV for all pizzas, and you have a toppings table with (topping_id, name), you could do something like:

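    -- For every pair of pizzas (p1 < p2), list the toppings they share.
    SELECT
        p1.pizza_id AS p1_id,
        p2.pizza_id AS p2_id,
        -- ORDER BY keeps the concatenated list in a predictable order.
        GROUP_CONCAT(t.topping_id ORDER BY t.topping_id) AS toppings,
        COUNT(*) AS num
    FROM pizzas p1
    -- REPLACE strips the space after each comma so FIND_IN_SET can match.
    JOIN toppings t
        ON FIND_IN_SET(t.topping_id, REPLACE(p1.toppings, ', ', ','))
    -- p1.pizza_id < p2.pizza_id yields each pair once and skips self-pairs.
    JOIN pizzas p2
        ON p1.pizza_id < p2.pizza_id
        AND FIND_IN_SET(t.topping_id, REPLACE(p2.toppings, ', ', ','))
    GROUP BY p1.pizza_id, p2.pizza_id
    ORDER BY num DESC;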
    

    Given these pizzas:

    pizza_id toppings
    1 1, 2, 3, 4, 5, 6, 8, 10
    2 4, 6, 7, 9, 11, 12
    3 1, 6

    The above query will return:

    p1_id p2_id toppings num
    1 2 4,6 2
    1 3 1,6 2
    2 3 6 1

    This is insanely inefficient and would be much better served by the junction table suggested by ElNicho.

    If you switch to using a junction (N-to-N) table like pizzas_toppings (pizza_id, topping_id), the query becomes:

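    -- Self-join the junction table: one row per topping shared by a pizza pair.
    SELECT
        p1.pizza_id AS p1_id,
        p2.pizza_id AS p2_id,
        -- ORDER BY keeps the concatenated list in a predictable order.
        GROUP_CONCAT(p1.topping_id ORDER BY p1.topping_id) AS toppings,
        COUNT(*) AS num
    FROM pizzas_toppings p1
    JOIN pizzas_toppings p2
        ON p1.pizza_id < p2.pizza_id
        AND p1.topping_id = p2.topping_id
    GROUP BY p1.pizza_id, p2.pizza_id
    ORDER BY num DESC;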
    

    Make sure your junction table is indexed in both directions:

    CREATE TABLE `pizzas_toppings` (
        pizza_id INT UNSIGNED NOT NULL,
        topping_id INT UNSIGNED NOT NULL,
        -- Lookups by pizza use the primary key.
        PRIMARY KEY (pizza_id, topping_id),
        -- Lookups by topping use the reversed index.
        INDEX (topping_id, pizza_id),
        FOREIGN KEY (pizza_id) REFERENCES pizzas (pizza_id),
        FOREIGN KEY (topping_id) REFERENCES toppings (topping_id)
    );
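
To try the junction-table query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is an assumption used for convenience, and the index name idx_topping_pizza is hypothetical. The foreign keys are left out because the pizzas and toppings tables are not created here. The topping lists are sorted in Python because GROUP_CONCAT does not guarantee an order unless one is requested.

```python
import sqlite3

# The three example pizzas from the answer, still in CSV form.
pizzas = {
    1: "1, 2, 3, 4, 5, 6, 8, 10",
    2: "4, 6, 7, 9, 11, 12",
    3: "1, 6",
}

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
conn.executescript("""
    CREATE TABLE pizzas_toppings (
        pizza_id INTEGER NOT NULL,
        topping_id INTEGER NOT NULL,
        PRIMARY KEY (pizza_id, topping_id)
    );
    -- Reverse-direction index (hypothetical name).
    CREATE INDEX idx_topping_pizza ON pizzas_toppings (topping_id, pizza_id);
""")

# One junction row per (pizza_id, topping_id); int() tolerates the spaces.
conn.executemany(
    "INSERT INTO pizzas_toppings VALUES (?, ?)",
    [(pid, int(t)) for pid, csv in pizzas.items() for t in csv.split(",")],
)

query = """
    SELECT
        p1.pizza_id AS p1_id,
        p2.pizza_id AS p2_id,
        GROUP_CONCAT(p1.topping_id) AS toppings,
        COUNT(*) AS num
    FROM pizzas_toppings p1
    JOIN pizzas_toppings p2
        ON p1.pizza_id < p2.pizza_id
        AND p1.topping_id = p2.topping_id
    GROUP BY p1.pizza_id, p2.pizza_id
    ORDER BY num DESC
"""

# Map each pizza pair to (sorted shared topping ids, count).
shared = {}
for p1_id, p2_id, toppings, num in conn.execute(query):
    shared[(p1_id, p2_id)] = (sorted(int(t) for t in toppings.split(",")), num)

print(shared)
```

With the three example pizzas, this should produce the same pairs and counts as the result table shown for the CSV-based query.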
    