skip to Main Content

I have a table with orders (for each order (transaction_id) there are client_id and item_id and quantity)
I need to find completely identical orders for a set of goods and their quantity in the order

CREATE TABLE IF NOT EXISTS orders (client_id varchar(10),
            item_id varchar(10), quantity int, transaction_id varchar(10));
INSERT INTO orders values 
        ('CL1111','111',1, '1001'),
        ('CL1111','222',2,'1001'),
        ('CL1111','333',1,'1001'),
        ('CL2222','111',2,'1002'),
        ('CL2222','222',1,'1002'),
        ('CL2222','333',1,'1002'),
        ('CL3333','111',1,'1003'),
        ('CL3333','222',2,'1003'),
        ('CL3333','333',1,'1003'),
        ('CL3333','444',1,'1003'),
        ('CL4444','111',1,'1004'),
        ('CL4444','222',2,'1004'),
        ('CL4444','333',1,'1004'),
        ('CL5555','111',1,'1005'),
        ('CL5555','222',2,'1005'),
        ('CL6666','111',1,'1006'),
        ('CL6666','222',2,'1006'),
        ('CL6666','333',1,'1007')
client_id item_id quantity transaction_id
CL1111 111 1 1001
CL1111 222 2 1001
CL1111 333 1 1001
CL2222 111 2 1002
CL2222 222 1 1002
CL2222 333 1 1002
CL3333 111 1 1003
CL3333 222 2 1003
CL3333 333 1 1003
CL3333 444 1 1003
CL4444 111 1 1004
CL4444 222 2 1004
CL4444 333 1 1004
CL5555 111 1 1005
CL5555 222 2 1005
CL6666 111 1 1006
CL6666 222 2 1006
CL6666 333 1 1007

Identical orders here are: (1001 and 1004), (1005 and 1006)
And now I don’t know, how to leave orders, which completely identical on item_id, quantity and set of items.

4

Answers


  1. Chosen as BEST ANSWER
    select
         transaction_id
    from
         (
         select
            transaction_id,
            ai,
            aq,
            count(1) over (partition by ai, aq) cnt
         from
              (
              select
                   transaction_id,
                   array_agg(array[item_id]) ai,
                   array_agg(array[quantity]) aq
              from orders
              group by transaction_id
              ) a
         ) a
    where cnt > 1
    

  2. Your example code doesn’t include quantity!
    Here is a code that results as many rows as the table contains the same orders:

    sql = """
        with orders_mod as
        (
            select distinct
                    "transaction_id",
                    string_agg("item_id" || ' ' || "quantity"::text, ',') over (partition by "client_id") as "list_of_ordered_item_ids"
            from    orders
        )
        select distinct
                string_agg("transaction_id", ',') over (partition by "list_of_ordered_item_ids") as "list_of_the_same_orders",
                "list_of_ordered_item_ids"
        from    orders_mod
        ;
    """
    
    df_SQL = pd.read_sql(sql, con)
    df_SQL
    
    Login or Signup to reply.
  3. Please check this query, it returns correct rows in dbfiddle

    with agg as (
      SELECT client_id, 
             array_agg(array[item_id]) ai, array_agg(array[quantity]) aq
      from orders group by client_id 
      having count(distinct transaction_id) = 1 )
    select client_id, ai items, aq quantities 
    from (
      select client_id, ai , aq , count(1) over (partition by ai, aq) cnt
      from agg ) c
    where cnt > 1
    
    client_id items quantities
    CL1111 {{111},{222},{333}} {{1},{2},{1}}
    CL4444 {{111},{222},{333}} {{1},{2},{1}}

    I am not sure what about clients having several transactions, but in your examples clients CL5555 and CL6666 are not matched even though they have same transactions 1005 and 1006. So here only clients having one distinct transaction are compared.

    Login or Signup to reply.
  4. Try this query. Order clause in aggregation is necessary for reliable result.

    with transactions as(
    select client_id,transaction_id
       ,string_agg( (item_id ||'('||cast(quantity as varchar) || ')') ,',' order by item_id) itemlist
    from orders
    group by client_id,transaction_id
    )
    select * 
      ,row_number()over(partition by itemlist order by client_id,transaction_id) rn
      ,count(*)over(partition by itemlist) cnt
    from transactions
    order by itemlist
    

    Rows with cnt>1 is has identical row (identical items with same quantity).

    With test data, find 2 identical groups for transaction_id 1001-1004 and 1005-1006 (not green in your picture)

    result is

    client_id transaction_id itemlist rn cnt
    CL5555 1005 111(1),222(2) 1 2
    CL6666 1006 111(1),222(2) 2 2
    CL1111 1001 111(1),222(2),333(1) 1 2
    CL4444 1004 111(1),222(2),333(1) 2 2
    CL3333 1003 111(1),222(2),333(1),444(1) 1 1
    CL2222 1002 111(2),222(1),333(1) 1 1
    CL6666 1007 333(1) 1 1

    Test data

    INSERT INTO orders (client_id,item_id,transaction_id,quantity)
    values 
            ('CL1111','111','1001',1),
            ('CL1111','222','1001',2),
            ('CL1111','333','1001',1),
            ('CL2222','111','1002',2),
            ('CL2222','222','1002',1),
            ('CL2222','333','1002',1),
            ('CL3333','111','1003',1),
            ('CL3333','222','1003',2),
            ('CL3333','333','1003',1),
            ('CL3333','444','1003',1),
            ('CL4444','111','1004',1),
            ('CL4444','222','1004',2),
            ('CL4444','333','1004',1),
            ('CL5555','111','1005',1),
            ('CL5555','222','1005',2),
            ('CL6666','111','1006',1),
            ('CL6666','222','1006',2),
            ('CL6666','333','1007',1)
    ;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search