Lets say there is a table with three columns and a million lines. How can I write a SQL statement where I get 20.000 lines, but with the restriction that a column (the second column) never has the same values?
Lets say this is the table:
Name – bpchar(17) | Bday – date | Wedding – date |
---|---|---|
Paul | 1990-01-01 | 2010-01-01 |
Susi | 1995-06-01 | 2020-01-01 |
Luis | 1995-06-01 | 2022-06-15 |
Now lets assume there are a million rows in this table.
What I want to have now is:
- up to 20.000 examples out of the million entries
- but make sure that Bday is always unique in the result
- values in the first and third column can appear multiple times in the result
3
Answers
To get 20,000 rows from a table where each row has a unique value in a specific column (let’s call it column2).
First, we give each row a unique number based on the values in column2. This ensures that each value in column2 gets a number, starting from 1.
From these numbered rows, we keep only the first row for each unique value in column2. Finally, we select 20,000 rows from the results, making sure each row has a unique value in column2.
SQL:
You could use Row_Number() Over() analytic function to enumerate rows per Bday(s) then select just the rows having Bday_rn = 1 and LIMIT number of rows returned:
… OR using another Row_Number() Over() analytic function to control number of rows returned instead of LIMIT.
See the fiddle here.
Postgres provides a
distinct on()
(see documentation) operation for specifically this type request. Something like: (see demo)NOTE: You should avoid the type
bpchar
. This type specification stands for blank padded character. Using this type a query containing the predicatewould not find a match in he above data. Typically just use
text
or if you need to limit sizevarchar(17)
.