In order to conduct a Market Basket Analysis on the sales data in magento, i need to retrieve the products that are included in each invoice from magento. So far this is what i got.
SELECT
tblInvoice.increment_id AS orderId,
tblLine.product_id AS productId,
tblLine.sku as productSku,
tblLine.qty as qty
FROM
sales_flat_invoice AS tblInvoice
RIGHT JOIN
sales_flat_invoice_item AS tblLine
ON tblInvoice.entity_id = tblLine.parent_id
However, the problem is that for the products that consists of both a configuable and a simple product, I get both. I only need the simple product if it is alone.
orderId productId productSku qty
100000004 456 mpd00338 1.0000 <-- The simple product in this pair should be removed
100000004 476 mpd00338 1.0000 <-- The simple product in this pair should be removed
100000006 374 abl004 4.0000 <-- This simple product is alone, and should therefore stay
100000006 417 wbk002 1.0000 <-- Once again the simple product in this pair should be removed
100000006 284 wbk002 1.0000 <-- Once again the simple product in this pair should be removed
My initial though would be to use the visibility to remove the redundant entities, however I am not sure if this would work in all situations or if there could be cases where both are either visible or hidden?
So my questions are
-
How should I alter my sql statement in order to only select one product in case of configuable products ( without remowing those simple products that stands alone?
-
Is this solution universal for both version 1.7, 1.8 and 1.9?
-
Are there scenarios where the solution does not work?
NB. For Grouped products and Bundled products I should off course get all the simple products as they each actually represents a real product.
2
Answers
The solution I found was to join the data with the catalog_product_entity and then order the data by catalog_product_entity.type_id.. This way all the simple products was at the bottom, and when using GROUP BY i can then ensure that the only the simple products are selected when they are alone.
Here is the query you need to use to exclude those simple products whose corresponding configurable product is already included in the list
Just append the below part to your original query
So, your complete query should be this
The sales_flat_order_item database table has this parent_item_id column which says that it is a child of some parent product ( usually configurable product). All those records in the table sales_flat_order_item with parent_item_id empty are either the parent products or individual products ( simple/grouped/bundled ).
This MySQL query works across all versions and for all the scenarios.