skip to Main Content

I need an SQL query for the next question.
For example I have a table:

| id       | element  | dt       |
| -------- | -------- | -------- |
| 1        | a        | 22/04/22 |
| 2        | a        | 22/04/22 |
| 1        | b        | 27/04/22 |
| 1        | a        | 23/04/22 |
| 3        | b        | 22/04/22 |
| 1        | a        | 22/04/22 |
| 1        | a        | 22/04/22 |
| 3        | b        | 23/04/22 |
| 3        | b        | 25/04/22 |
| 1        | a        | 27/04/22 |
| 1        | c        | 26/04/22 |
| 1        | d        | 26/04/22 |
| 1        | g        | 25/04/22 |
| 1        | b        | 27/04/22 |

I want to find the percentage of similar items that the user (id) sees in one week (but not counting on the same day). For example, for this table, the answer is:

| id       | percentage |
| -------- | ---------  |
| 1        | 0.3        |
| 2        | 0.0        |
| 3        | 1          |

for id 1 answer is 0.3, because if we are taking a window for one week (7 days) he has seen an element "a" 3 times at 22/04/22, 23/04/22 and 27/04/22. He seen "a" at 22/04/22 a few times, but we don’t counting it. At he seen 10 times at total. So answer is 3/10 = 0.3

I tries smth like this, but it doesn’t solve my problem exactly.

SELECT
  dt,
  id,
  COUNT(DISTINCT element) AS total,
  (
    SELECT COUNT(DISTINCT element)
    FROM table newt
    WHERE newt.dt = dt
  ) AS same_day_elements,
  COUNT(DISTINCT element) / (SELECT COUNT(DISTINCT element) FROM table tempt WHERE date = tempt.date) * 100 AS percentage
FROM
  table t
GROUP BY
  dt,
  id

2

Answers


  1. You should think in terms of first collapsing your data set and then performing your calculations on that reduced data set. The problem becomes a whole lot easier. So, first count all the id-element combinations, then embed it in a calculation:

    Select id,sum(Case When occurrences>1 Then 1 Else 0 End) *100/count(*) as percentage 
    From (
        Select id,element,count(*) as occurrences
        From table
        Where dt between '2022-04-22' and '2022-04-28'
        Group By id,element
    )
    Group By id
    
    Login or Signup to reply.
  2. This is based on Chris’s answer, with the subquery containing counts for distinct date and all rows for each id, element group:

    SELECT id, ROUND(SUM(IF(dd > 1, dd, 0)) * 100 / SUM(total), 0) AS percentage
    FROM (
        SELECT id, element, COUNT(DISTINCT dt) AS dd, COUNT(*) AS total
        FROM t
        WHERE dt BETWEEN '2022-04-22' AND '2022-04-28'
        GROUP BY id, element
    ) t
    GROUP BY id;
    

    Output:

    id percentage
    1 30
    2 0
    3 100

    In the comments above you extended your question with:

    what if I have data for a year and I need to look not only at one week?

    It is not completely clear what you want for this, but assuming you want to group by week, you could add week number (WEEK(date[,mode])) to both the inner and outer groupings:

    SELECT id, weekno, ROUND(SUM(IF(dd > 1, dd, 0)) * 100 / SUM(total), 0) AS percentage
    FROM (
        SELECT id, element, WEEK(dt, 1) AS weekno, COUNT(DISTINCT dt) AS dd, COUNT(*) AS total
        FROM t
        WHERE dt BETWEEN '2022-01-03' AND '2023-01-01'
        GROUP BY id, element, weekno
    ) t
    GROUP BY id, weekno;
    

    Here’s a db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search