Count the number of occurences in each bucket Redshift SQL

YigaNgbogbara
November 9, 2022
153 views
1 vote
2 Answers

This might be difficult to explain. But Im trying to write a redshift sql query where I have want the count of organizations that fall into different market buckets. There are 50 markets. For example company x can be only be found in 1 market and company y can be found in 3 markets. I want to preface that I have over 10,000 companies to fit into these buckets. So ideally it would be more like, hypothetically 500 companies are found in 3 markets or 7 companies are found in 50 markets.

The table would like

Market Bucket	Org Count
1 Markets	3
2 Markets	1
3 Markets	0

select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts

I was trying to formulate the query from above but I got confused on how to effectively formulate the query

Organization Facts

Market Name	Org ID	Org Name
New York	15683	Company x
Orlando	38478	Company y
Twin Cities	2738	Company z
Twin Cities	15683	Company x
Detroit	99	Company xy

Answers

- JohnRotenstein
- November 9, 2022 at 9:30 pm
- 0 votes
0
You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.

Something like:
```
with markets as (
  select
    org_name,
    count(distinct market_name) as market_count
  from organization_facts
)

select
  market_count,
  count(*) as org_count
from markets
group by market_count
order by market_count
```
Login or Signup to reply.

- GMB
- November 9, 2022 at 10:51 pm
- 0 votes
0
If I follow you correctly, you can do this with two levels of aggregation. Assuming that org_id represents a company in your dataset:
```
select cnt_markets, count(*) cnt_org_id
from (select count(*) cnt_markets from organization_facts group by org_id) t
group by cnt_markets
```
The subquery counts the number of markets per company. I assumed no duplicate (ord_id, market_name) tuples in the table ; if that’s not the case, then you need count(distinct market_name) instead of count(*) in that spot.

Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.

Note that I left apart the enterprise_account column ,that appears in your query but not in your data.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.