skip to Main Content

I am trying to write a SQL query to get the month end prices for a particular instrument.

Table (contains some other columns that can be ignored):

instrumentId unitPrice reportedAt
1 90.22 2018-10-28
1 80.55 2018-10-30
1 70.55 2018-09-15
1 70.56 2018-09-23

Expected result:

instrumentId unitPrice reportedAt
1 80.55 2018-10-30
1 70.56 2018-09-23

There will be multiple rows for the same instrumentId with different reportedAt from different months. The idea is for each month and year (e.g. Oct 2018), I am trying to get only the last price reportedAt for each month.

SELECT 
    "instrumentId",
    "unitPrice",
    MAX("reportedAt") as reportedAt
FROM 
    "InstrumentPrice"
WHERE 
    "instrumentId" = 90
GROUP BY 
    EXTRACT(MONTH FROM "reportedAt"),
    EXTRACT(YEAR FROM "reportedAt")
ORDER BY 
    "reportedAt" DESC

However, this query keeps throwing an error saying the columns selected must appear in GROUP BY clause or be used in an aggregate function.

This will not work because if I group by unitPrice or reportedAt, I will not get the desired results (because they are all unique).

2

Answers


  1. The error in your query is caused by including the column unitPrice and instrumentId in the SELECT clause without including it in an aggregate function or in the GROUP BY clause.

    In SQL, when you use GROUP BY, you should either:

    • Include the column in the GROUP BY clause.
    • Use an aggregate function (like MAX, MIN, SUM, etc.) on the column in the SELECT clause.

    You can achieve this using window functions. You can use the ROW_NUMBER() window function to assign a row number to each row within each month and year partition based on the reportedAt column, and then filter for rows where the row number is 1 (indicating the last price reported for each month). Here’s how you can do it:

    SELECT
        "instrumentId",
        "unitPrice",
        "reportedAt"
    FROM (
        SELECT
            "instrumentId",
            "unitPrice",
            "reportedAt",
            ROW_NUMBER() OVER (PARTITION BY "instrumentId", EXTRACT(MONTH FROM "reportedAt"), EXTRACT(YEAR FROM "reportedAt") ORDER BY "reportedAt" DESC) AS row_num
        FROM
            "InstrumentPrice"
        WHERE
            "instrumentId" = 90
    ) subquery
    WHERE
        row_num = 1
    ORDER BY
        "reportedAt" DESC;
    

    In this query:

    1. The inner subquery assigns a row number to each row within each month and year partition using the ROW_NUMBER() window function. The PARTITION BY clause partitions the data by instrumentId, month, and year, and orders the rows within each partition by reportedAt in descending order.
    2. The outer query then selects rows where the row number is 1, which corresponds to the last price reported for each month and year combination.
    Login or Signup to reply.
  2. group by is not the right tool for your job. Here it is using window function row_number().

    with t as
    (
     select 
       instrumentId, unitPrice, reportedAt,
       row_number() over (
                          partition by date_trunc('month', reportedAt)
                          order by reportedAt desc
                         ) as rn
     from the_table
    )
    select instrumentId, unitPrice, reportedAt
    from t where rn = 1;
    

    Demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search