I have a dataset like this:
- id : E.g. 111, 111, 111, 112, 112, 113, 113
- Year: E.g. 2010, 2011, 2012, 2010, 2011,2010, 2015
- Sex: E.g. M, M, F, F, F, M, M
In this dataset, ID = 111 had a sex change (switch from M to F – or from F to M)
With postgre sql, I try to find out:
- A: How many ids stay as man (and which ids)
- B: How many ids stay as woman (and which ids)
- C: How many ids go from man to woman (and which ids)
- D: How many ids go from woman to man (and which ids)
I try like this:
# problem A
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');
# problem B
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');
# all sex change
SELECT COUNT(DISTINCT ID) FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');
Is it correct? Or is window-lag function needed?
3
Answers
You can try this, to calculate in advance some metrics:
and then solve your issues:
Here is the full working example.
Assuming column SEX will only have either ‘F’ or ‘M’ as value, problem A can be solved
problem A
step-by-step demo: db<>fiddle
Assuming, the change happens only once, you could use the
first_value()
window function:sex
values perid
overyear
ssex
values perid
overyear
s (Notice the different order: It gives the first value from the "bottom")DISTINCT
clause to reduce the records to one perid
.Afterwards you can do whatever statistics you want. For example counting the different status by
GROUP BY sex_status
:demo: db<>fiddle