Count GROUP_CONCAT column to check for duplicates - Mysql

SolarNewb
April 12, 2023
280 views
2 votes
3 Answers

I have the following query:

SELECT
    cpe.entity_id AS configurable_product_id,
    cpe.sku AS configurable_product_sku,
    GROUP_CONCAT(DISTINCT cpsa.attribute_id ORDER BY cpsa.attribute_id SEPARATOR ',') AS configurable_super_attribute_ids,
    cp.entity_id AS simple_product_id,
    cp.sku AS simple_product_sku,
    GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values
FROM
    catalog_product_entity AS cpe
INNER JOIN catalog_product_super_link AS cpsl ON cpsl.parent_id = cpe.entity_id
INNER JOIN catalog_product_entity AS cp ON cp.entity_id = cpsl.product_id
INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
GROUP BY cpe.entity_id, cp.entity_id
LIMIT 100;

Which results in something like this:

configurable_product_id configurable_product_sku    configurable_super_attribute_ids    simple_product_id   simple_product_sku  simple_product_super_attribute_values
90086   Iphone  93,165  119 093 93_5730,165_7070
90086   Iphone  93,165  124 104 93_5730,165_7071
90086   Iphone  93,165  128 114 93_5707,165_7072
90086   Iphone  93,165  169 156 93_5727,165_7072
90086   Iphone  93,165  181 163 93_5730,165_7070
90086   Iphone  93,165  186 194 93_5727,165_7071
90086   Iphone  93,165  146 023 93_5730,165_7071

What I want my query to do is only show the records with duplicates in the simple_product_super_attributes_values column.

So in this case ditch the 3rd, 4th and 6th row and keep the rest:

configurable_product_id configurable_product_sku    configurable_super_attribute_ids    simple_product_id   simple_product_sku  simple_product_super_attribute_values
90086   Iphone  93,165  119 093 93_5730,165_7070
90086   Iphone  93,165  124 104 93_5730,165_7071
90086   Iphone  93,165  181 163 93_5730,165_7070
90086   Iphone  93,165  146 023 93_5730,165_7071

How to modify my query to get to my result? I tried a count on the group_concat but that is not allowed.

I tried to add a count for that column via a Having statement:

HAVING COUNT(*) > 1 
AND simple_product_super_attribute_values IN (
    SELECT simple_product_super_attribute_values
    FROM (
        SELECT
            cpe.entity_id AS configurable_product_id,
            GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values
        FROM
            catalog_product_entity AS cp
        INNER JOIN catalog_product_super_link AS cpsl ON cpsl.product_id = cp.entity_id
        INNER JOIN catalog_product_entity AS cpe ON cpe.entity_id = cpsl.parent_id
        INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
        INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
        GROUP BY cp.entity_id, cpe.entity_id
        HAVING COUNT(*) > 1
    ) AS subquery
    WHERE subquery.configurable_product_id = cpe.entity_id
)

But this makes no sense.

Tags: mysql mysql-8.0

Answers

HAVING is a clause like WHERE, but occurs after it and after group by. But testing count(*) there isn’t going to help.

You want to use the count window function, partitioning by simple_product_super_attribute_values. It isn’t clear to me if you want your duplicate check to apply before or after the LIMIT 100; assuming before, you would do:

select
    configurable_product_id,
    configurable_product_sku,
    configurable_super_attribute_ids,
    simple_product_id,
    simple_product_sku,
    simple_product_super_attribute_values
from (
    SELECT
        cpe.entity_id AS configurable_product_id,
        cpe.sku AS configurable_product_sku,
        GROUP_CONCAT(DISTINCT cpsa.attribute_id ORDER BY cpsa.attribute_id SEPARATOR ',') AS configurable_super_attribute_ids,
        cp.entity_id AS simple_product_id,
        cp.sku AS simple_product_sku,
        GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values,
        count(*) over (partition by GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',')) > 1 duplicate_simple_product_super_attribute_values
    FROM
        catalog_product_entity AS cpe
    INNER JOIN catalog_product_super_link AS cpsl ON cpsl.parent_id = cpe.entity_id
    INNER JOIN catalog_product_entity AS cp ON cp.entity_id = cpsl.product_id
    INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
    INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
    GROUP BY cpe.entity_id, cp.entity_id
) as products
where duplicate_simple_product_super_attributes_values
limit 100

- OJones
- April 12, 2023 at 1:38 am
- 0 votes
0
The HAVING clause to do what you need is this:
```
HAVING COUNT(DISTINCT whateverid) > 1
```
This is one of several cool things you can do in the HAVING clause.
Login or Signup to reply.

You can use your query inside a CTE then in a second CTE use the window function count() over simple_product_super_attribute_values to check for duplicates :

WITH cte as (
  SELECT
    cpe.entity_id AS configurable_product_id,
    cpe.sku AS configurable_product_sku,
    GROUP_CONCAT(DISTINCT cpsa.attribute_id ORDER BY cpsa.attribute_id SEPARATOR ',') AS configurable_super_attribute_ids,
    cp.entity_id AS simple_product_id,
    cp.sku AS simple_product_sku,
    GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values
  FROM
    catalog_product_entity AS cpe
  INNER JOIN catalog_product_super_link AS cpsl ON cpsl.parent_id = cpe.entity_id
  INNER JOIN catalog_product_entity AS cp ON cp.entity_id = cpsl.product_id
  INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
  INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
  GROUP BY cpe.entity_id, cp.entity_id
  LIMIT 100
),
cte2 as (
  select configurable_product_id,
         configurable_product_sku,
         configurable_super_attribute_ids,
         simple_product_super_attribute_values,
         count(1) over( partition by simple_product_super_attribute_values) as count_dups
         from mytable
)
select *
from cte2
where count_dups > 1

Please signup or login to give your own answer.

Click here to cancel reply.

Count GROUP_CONCAT column to check for duplicates – Mysql

Answers