I want to group all entries in my table by their user_type property.
However, the user_type values are stored in a {type of user} {date account was created} format, e.g.

id | user_type
--------------
01 | 'admin 2024-03-01'
02 | 'admin 2024-03-01'
03 | 'user 2024-03-02'
04 | 'user 2024-03-02'
05 | 'premium_user 2024-03-03'

I want to get the counts of each type of user, ignoring the date at the end of the value. So for the above example, my query would return:

user_type      | count
----------------------
'admin'        | 2
'user'         | 2
'premium_user' | 1

The user types are subject to change, so I can’t hardcode in filters such as select count(id) where user_type like 'admin %'.

2 Answers


  1. We can use SUBSTRING() with a regular expression here:

    SELECT SUBSTRING(user_type from '^\w+') AS user_type, COUNT(*) AS count
    FROM yourTable
    GROUP BY 1  -- ordinal: a bare user_type here would resolve to the input column, date included
    ORDER BY COUNT(*) DESC;
    

  2. Obviously, you need to split that column into two columns: user_type text (or some enumeration type) and date_created date.
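
    A minimal sketch of that split in Postgres, assuming the table is named tbl and every existing value ends in a space plus a 10-character YYYY-MM-DD date. The column name date_created comes from the suggestion above; the rest of the real schema is unknown, so treat this as illustrative rather than a ready migration:

    ```sql
    -- Assumption: table tbl, column user_type text holding '<type> YYYY-MM-DD'.
    ALTER TABLE tbl ADD COLUMN date_created date;

    -- Both SET expressions read the old row value of user_type.
    UPDATE tbl
    SET    date_created = right(user_type, 10)::date  -- trailing 'YYYY-MM-DD'
         , user_type    = left(user_type, -11);       -- strip ' YYYY-MM-DD'

    -- After the split, the count needs no string manipulation.
    SELECT user_type, count(*) AS count
    FROM   tbl
    GROUP  BY user_type
    ORDER  BY 2 DESC;
    ```

    If any row deviates from the assumed format, the cast to date fails and the whole UPDATE rolls back, which is arguably the behavior you want while cleaning up.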

    While stuck with your unfortunate design – and if the trailing date has the fixed format your sample data suggests – left() with a negative length to truncate the last 11 characters is simplest & cheapest (much cheaper than a regular expression):

    SELECT left(user_type, -11) AS user_type, count(*) AS count
    FROM   tbl
    GROUP  BY 1
    ORDER  BY 2 DESC;
    

    This solution is for Postgres. You also tagged Redshift, which is not Postgres at all. There, left() does not support a negative length. You might work around that with:

    substring(user_type, 1, length(user_type) - 11)
    
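    Put together, the Redshift variant of the full query might look like the sketch below. It assumes the same table name tbl and the same fixed 11-character suffix (one space plus the date) as the Postgres version:

    ```sql
    -- Redshift sketch: no negative length for left(), so compute the prefix length explicitly.
    SELECT substring(user_type, 1, length(user_type) - 11) AS user_type
         , count(*) AS count
    FROM   tbl
    GROUP  BY 1       -- first SELECT item (the truncated user_type)
    ORDER  BY 2 DESC; -- second SELECT item (the count)
    ```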

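    About the numeric references in GROUP BY 1 and ORDER BY 2: they are ordinal positions in the SELECT list, so 1 is the truncated user_type and 2 is the count. That works in Postgres and Redshift.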
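
    For comparison, here is the same Postgres query with the expressions spelled out instead of ordinals (table name tbl assumed as before). The ordinal or the full expression matters here because the output alias user_type collides with the input column name, and in Postgres a bare GROUP BY user_type would be interpreted as the input column, i.e. the value with the date still attached:

    ```sql
    -- Equivalent to GROUP BY 1 / ORDER BY 2, written without ordinals (Postgres).
    SELECT left(user_type, -11) AS user_type, count(*) AS count
    FROM   tbl
    GROUP  BY left(user_type, -11)
    ORDER  BY count(*) DESC;
    ```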
