I have the following query:
SELECT
cpe.entity_id AS configurable_product_id,
cpe.sku AS configurable_product_sku,
GROUP_CONCAT(DISTINCT cpsa.attribute_id ORDER BY cpsa.attribute_id SEPARATOR ',') AS configurable_super_attribute_ids,
cp.entity_id AS simple_product_id,
cp.sku AS simple_product_sku,
GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values
FROM
catalog_product_entity AS cpe
INNER JOIN catalog_product_super_link AS cpsl ON cpsl.parent_id = cpe.entity_id
INNER JOIN catalog_product_entity AS cp ON cp.entity_id = cpsl.product_id
INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
GROUP BY cpe.entity_id, cp.entity_id
LIMIT 100;
Which results in something like this:
configurable_product_id configurable_product_sku configurable_super_attribute_ids simple_product_id simple_product_sku simple_product_super_attribute_values
90086 Iphone 93,165 119 093 93_5730,165_7070
90086 Iphone 93,165 124 104 93_5730,165_7071
90086 Iphone 93,165 128 114 93_5707,165_7072
90086 Iphone 93,165 169 156 93_5727,165_7072
90086 Iphone 93,165 181 163 93_5730,165_7070
90086 Iphone 93,165 186 194 93_5727,165_7071
90086 Iphone 93,165 146 023 93_5730,165_7071
What I want my query to do is only show the records with duplicates in the simple_product_super_attributes_values column.
So in this case ditch the 3rd, 4th and 6th row and keep the rest:
configurable_product_id configurable_product_sku configurable_super_attribute_ids simple_product_id simple_product_sku simple_product_super_attribute_values
90086 Iphone 93,165 119 093 93_5730,165_7070
90086 Iphone 93,165 124 104 93_5730,165_7071
90086 Iphone 93,165 181 163 93_5730,165_7070
90086 Iphone 93,165 146 023 93_5730,165_7071
How to modify my query to get to my result? I tried a count on the group_concat but that is not allowed.
I tried to add a count for that column via a Having statement:
HAVING COUNT(*) > 1
AND simple_product_super_attribute_values IN (
SELECT simple_product_super_attribute_values
FROM (
SELECT
cpe.entity_id AS configurable_product_id,
GROUP_CONCAT(cpei.attribute_id, '_', cpei.value ORDER BY cpei.attribute_id SEPARATOR ',') AS simple_product_super_attribute_values
FROM
catalog_product_entity AS cp
INNER JOIN catalog_product_super_link AS cpsl ON cpsl.product_id = cp.entity_id
INNER JOIN catalog_product_entity AS cpe ON cpe.entity_id = cpsl.parent_id
INNER JOIN catalog_product_super_attribute AS cpsa ON cpsa.product_id = cpe.entity_id
INNER JOIN catalog_product_entity_int AS cpei ON cpei.attribute_id = cpsa.attribute_id AND cpei.`entity_id` = cp.`entity_id`
GROUP BY cp.entity_id, cpe.entity_id
HAVING COUNT(*) > 1
) AS subquery
WHERE subquery.configurable_product_id = cpe.entity_id
)
But this makes no sense.
3
Answers
HAVING is a clause like WHERE, but occurs after it and after group by. But testing count(*) there isn’t going to help.
You want to use the count window function, partitioning by simple_product_super_attribute_values. It isn’t clear to me if you want your duplicate check to apply before or after the LIMIT 100; assuming before, you would do:
The HAVING clause to do what you need is this:
This is one of several cool things you can do in the HAVING clause.
You can use your query inside a
CTE
then in a second CTE use the window functioncount()
oversimple_product_super_attribute_values
to check for duplicates :