
In a PostgreSQL database I have the table Sales with the following columns:

| id(PK) | eva_dts_id(FK) | realm_name | device_id | timestamp | product_id | product_name | quantity | product_price | revenue |

realm_name and device_id are constants and even if they changed in the future they would be updated accordingly on this table.

I’m trying to create the view Products on it with the following schema:

| realm_name | device_id | product_id | product_name | product_price |

So far I used the following query:

SELECT DISTINCT realm_name, device_id, product_id, product_name, product_price
FROM public.Sales;

This works as long as the product list of a device is immutable and won’t change in the future. I’d like to drop that assumption and build the product list of a device from the most recent record only (based on the timestamp column of the Sales table). For instance, suppose the product "Chocomilk" had these records:

| id(PK) | eva_dts_id(FK) | realm_name | device_id | timestamp | product_id | product_name | quantity | product_price | revenue |
| "03ef91f6-bb24-4c8e-90ef-366cc4dee5a6" | "e853dcec-c369-4111-816d-1645067df8e1" | "tenant" | "RbbMIyemWTOI99N6XZx1hA" | "2023-03-26 22:43:31.454734" | "10" | "Chocomilk" | 1 | 0.38 | 0 |
| "03ef91f6-bb24-4c8e-90ef-366cc4dee5a6" | "e853dcec-c369-4111-816d-1645067df8e1" | "tenant" | "RbbMIyemWTOI99N6XZx1hA" | "2023-04-12 22:43:31.454734" | "10" | "Chocomilk" | 1 | 2.3 | 0 |
| "03ef91f6-bb24-4c8e-90ef-366cc4dee5a6" | "e853dcec-c369-4111-816d-1645067df8e1" | "tenant" | "RbbMIyemWTOI99N6XZx1hA" | "2023-05-18 22:43:31.454734" | "10" | "Chocomilk" | 1 | 1.5 | 0 |

Only the last record (price 1.5) should end up in the view.

How can I rewrite the view query in order to achieve this?

5 Answers


  1. Chosen as BEST ANSWER

    Here is my own alternative to @markalex's answer, following the advice of @tinazmu:

    SELECT DISTINCT realm_name,
                    device_id,
                    product_id,
                    product_name,
                    product_price
    FROM public.Sales AS ext
    WHERE timestamp =
        (
          SELECT MAX(int.timestamp)
          FROM public.Sales AS int
          WHERE int.realm_name = ext.realm_name
           AND int.device_id = ext.device_id
           AND int.product_id = ext.product_id
        );
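
To check the query as an actual view, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, made only so the example runs without a server), trims Sales to the columns the view needs, and renames the aliases to ext/inn. The first three rows are the Chocomilk records from the question; the fourth sale is made up to show that the view follows new data.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sales (
        realm_name    TEXT,
        device_id     TEXT,
        "timestamp"   TEXT,   -- ISO strings sort chronologically
        product_id    TEXT,
        product_name  TEXT,
        product_price REAL
    )
""")
device = "RbbMIyemWTOI99N6XZx1hA"
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tenant", device, "2023-03-26 22:43:31.454734", "10", "Chocomilk", 0.38),
        ("tenant", device, "2023-04-12 22:43:31.454734", "10", "Chocomilk", 2.3),
        ("tenant", device, "2023-05-18 22:43:31.454734", "10", "Chocomilk", 1.5),
    ],
)

# The answer's query, wrapped in the view the question asks for.
conn.execute("""
    CREATE VIEW Products AS
    SELECT DISTINCT realm_name, device_id, product_id, product_name, product_price
    FROM Sales AS ext
    WHERE ext."timestamp" = (
        SELECT MAX(inn."timestamp")
        FROM Sales AS inn
        WHERE inn.realm_name = ext.realm_name
          AND inn.device_id  = ext.device_id
          AND inn.product_id = ext.product_id
    )
""")

products = conn.execute("SELECT * FROM Products").fetchall()
print(products)  # [('tenant', 'RbbMIyemWTOI99N6XZx1hA', '10', 'Chocomilk', 1.5)]

# Hypothetical later sale at a new price: the view picks it up automatically.
conn.execute(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    ("tenant", device, "2023-06-01 10:00:00.000000", "10", "Chocomilk", 1.8),
)
updated = conn.execute("SELECT * FROM Products").fetchall()
print(updated)   # [('tenant', 'RbbMIyemWTOI99N6XZx1hA', '10', 'Chocomilk', 1.8)]
```

Because the view is an ordinary (non-materialized) one, it is re-evaluated on every read, so no refresh step is needed after new sales arrive.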
    

  2. Use ROW_NUMBER() to rank each item's rows by timestamp, newest first, and keep only the top one. Something along the lines of:

    select
        realm_name,
        device_id,
        product_id,
        product_name,
        product_price
    from
        (
            SELECT
                realm_name,
                device_id,
                product_id,
                product_name,
                product_price,
                row_number() OVER (
                    PARTITION BY
                        realm_name,
                        device_id,
                        product_id
                    ORDER BY
                        timestamp desc
                ) rn
            FROM
                public.Sales
        ) as most_recents
    where
        rn = 1;
    

    Check carefully that the PARTITION BY clause of ROW_NUMBER() lists the columns that uniquely identify an item and that don’t change over time.
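
One practical difference from a MAX(timestamp) filter appears when two rows of the same item share the newest timestamp: the MAX filter returns both, while ROW_NUMBER() returns exactly one. The sketch below illustrates this with Python's sqlite3 module standing in for PostgreSQL (an assumption; window functions need SQLite 3.25 or later). The tied rows, the short device name, and the integer id used as a tie-breaker are all made up for illustration; the question's table has a UUID key, so a real tie-breaker would have to be some other column.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sales (
        id INTEGER PRIMARY KEY,          -- hypothetical integer key
        realm_name TEXT, device_id TEXT, "timestamp" TEXT,
        product_id TEXT, product_name TEXT, product_price REAL
    )
""")
# Hypothetical data: rows 2 and 3 share the newest timestamp, prices differ.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "tenant", "dev-1", "2023-04-12 22:43:31", "10", "Chocomilk", 2.3),
        (2, "tenant", "dev-1", "2023-05-18 22:43:31", "10", "Chocomilk", 1.5),
        (3, "tenant", "dev-1", "2023-05-18 22:43:31", "10", "Chocomilk", 1.7),
    ],
)

# MAX(timestamp) filter: every row carrying the newest timestamp survives.
max_filter = conn.execute("""
    SELECT DISTINCT product_id, product_price
    FROM Sales AS ext
    WHERE ext."timestamp" = (
        SELECT MAX(inn."timestamp")
        FROM Sales AS inn
        WHERE inn.realm_name = ext.realm_name
          AND inn.device_id  = ext.device_id
          AND inn.product_id = ext.product_id
    )
    ORDER BY product_price
""").fetchall()

# ROW_NUMBER(): exactly one row per partition; id DESC makes the pick deterministic.
row_number = conn.execute("""
    SELECT product_id, product_price
    FROM (
        SELECT product_id, product_price,
               ROW_NUMBER() OVER (
                   PARTITION BY realm_name, device_id, product_id
                   ORDER BY "timestamp" DESC, id DESC
               ) AS rn
        FROM Sales
    ) AS most_recents
    WHERE rn = 1
""").fetchall()

print(max_filter)  # [('10', 1.5), ('10', 1.7)]
print(row_number)  # [('10', 1.7)]
```

Without a tie-breaker in the ORDER BY, ROW_NUMBER() still returns one row per item, but which of the tied rows it picks is not guaranteed.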

  3. I prefer a common table expression for queries like this, it makes it easier to identify what the data is and how you are filtering it.

    ROW_NUMBER() with PARTITION BY gives the query engine a better idea of what you are actually after. SQL is a declarative language: whenever possible, describe what you want instead of using tricks to steer it to the right answer.

    with cte as (
    select  realm_name
          , device_id
          , product_id
          , product_name
          , product_price
          , row_number() OVER (PARTITION BY realm_name, device_id, product_id
                               ORDER BY timestamp desc
                              ) rw
    from public.Sales
    )
    select realm_name
          , device_id
          , product_id
          , product_name
          , product_price
    from cte
    where rw = 1;
    
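  4. Postgres offers another option: the DISTINCT ON clause of the SELECT statement. It keeps only the first row of each set of rows that share the DISTINCT ON expressions (first according to the ORDER BY clause). Keyed on the columns that identify a product in the question, what you are looking for is:

    select distinct on (realm_name, device_id, product_id)
           realm_name, device_id, product_id, product_name, product_price
      from sales
     order by realm_name, device_id, product_id, sale_ts desc;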
    

    Note: IMHO naming a column timestamp is poor practice, as it is a SQL reserved word and a Postgres data type (although not reserved in Postgres). I have substituted sale_ts in the query above.

  5. Another option (it might be more intuitive) is to filter out older records with NOT EXISTS:

    SELECT  T.realm_name,
            T.device_id,
            T.product_id,
            T.product_name,
            T.product_price
    FROM    Sales       AS  T
    WHERE   NOT EXISTS
            (
                SELECT  1
                FROM    Sales               AS  newer
                WHERE   newer.realm_name    =   T.realm_name
                AND     newer.device_id     =   T.device_id
                AND     newer.product_id    =   T.product_id
                AND     newer."timestamp"   >   T."timestamp"
            );
    

    This query keeps a row only if there isn’t any newer record for the same realm_name, device_id and product_id.
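
Which columns the subquery correlates on matters as soon as several devices sell the same product_id. A small sketch with hypothetical two-device data, using Python's sqlite3 module in place of PostgreSQL (an assumption, only to keep it runnable): correlating on product_id alone drops a device whose newest sale is older than another device's, while correlating on realm_name, device_id and product_id keeps one row per device.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sales (
        realm_name TEXT, device_id TEXT, "timestamp" TEXT,
        product_id TEXT, product_name TEXT, product_price REAL
    )
""")
# Hypothetical data: the same product_id is sold by two devices.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tenant", "dev-A", "2023-03-26 22:43:31", "10", "Chocomilk", 0.38),
        ("tenant", "dev-A", "2023-04-12 22:43:31", "10", "Chocomilk", 2.3),
        ("tenant", "dev-B", "2023-05-18 22:43:31", "10", "Chocomilk", 1.5),
    ],
)

def latest(correlation):
    """Run the NOT EXISTS filter with the given correlation predicate.

    The predicate is a fixed, trusted SQL fragment written in this script;
    never build SQL like this from user input.
    """
    sql = f"""
        SELECT T.device_id, T.product_id, T.product_price
        FROM Sales AS T
        WHERE NOT EXISTS (
            SELECT 1
            FROM Sales AS newer
            WHERE {correlation}
              AND newer."timestamp" > T."timestamp"
        )
        ORDER BY T.device_id
    """
    return conn.execute(sql).fetchall()

by_product = latest("newer.product_id = T.product_id")
by_device = latest(
    "newer.realm_name = T.realm_name "
    "AND newer.device_id = T.device_id "
    "AND newer.product_id = T.product_id"
)

print(by_product)  # [('dev-B', '10', 1.5)]
print(by_device)   # [('dev-A', '10', 2.3), ('dev-B', '10', 1.5)]
```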
