This might be difficult to explain. But Im trying to write a redshift sql query where I have want the count of organizations that fall into different market buckets. There are 50 markets. For example company x can be only be found in 1 market and company y can be found in 3 markets. I want to preface that I have over 10,000 companies to fit into these buckets. So ideally it would be more like, hypothetically 500 companies are found in 3 markets or 7 companies are found in 50 markets.
The table would like
Market Bucket | Org Count |
---|---|
1 Markets | 3 |
2 Markets | 1 |
3 Markets | 0 |
select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts
I was trying to formulate the query from above but I got confused on how to effectively formulate the query
Organization Facts
Market Name | Org ID | Org Name |
---|---|---|
New York | 15683 | Company x |
Orlando | 38478 | Company y |
Twin Cities | 2738 | Company z |
Twin Cities | 15683 | Company x |
Detroit | 99 | Company xy |
2
Answers
You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.
Something like:
If I follow you correctly, you can do this with two levels of aggregation. Assuming that
org_id
represents a company in your dataset:The subquery counts the number of markets per company. I assumed no duplicate
(ord_id, market_name)
tuples in the table ; if that’s not the case, then you needcount(distinct market_name)
instead ofcount(*)
in that spot.Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.
Note that I left apart the
enterprise_account
column ,that appears in your query but not in your data.