using SQL to detect sex change - Postgresql

ssss3333
January 20, 2023
333 views
2 votes
3 Answers

I have a dataset like this:

id : E.g. 111, 111, 111, 112, 112, 113, 113
Year: E.g. 2010, 2011, 2012, 2010, 2011,2010, 2015
Sex: E.g. M, M, F, F, F, M, M

In this dataset, ID = 111 had a sex change (switch from M to F – or from F to M)

With postgre sql, I try to find out:

A: How many ids stay as man (and which ids)
B: How many ids stay as woman (and which ids)
C: How many ids go from man to woman (and which ids)
D: How many ids go from woman to man (and which ids)

I try like this:

# problem A
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');

# problem B
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');

# all sex change
SELECT COUNT(DISTINCT ID) FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');

Is it correct? Or is window-lag function needed?

Tags: postgresql sql

Answers

You can try this, to calculate in advance some metrics:

 SELECT *
        ,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
        ,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
        ,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
  FROM mytable;

and then solve your issues:

SELECT SUM(CASE WHEN initial_sex = 1 AND SEX = 'M' THEN 1 ELSE 0 END) 
      ,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'M' THEN CAST(id AS VARCHAR(12)) END, ', ') 
      ,SUM(CASE WHEN initial_sex = 1 AND SEX = 'F' THEN 1 ELSE 0 END) 
      ,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'F' THEN CAST(id AS VARCHAR(12)) END, ', ') 
      ,SUM(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_m = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1)  THEN 1 ELSE 0 END) 
      ,string_agg(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_m = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1)  THEN CAST(id AS VARCHAR(12)) END, ', ') 
FROM
(
  SELECT *
        ,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
        ,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
        ,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
  FROM mytable
) DS;

Here is the full working example.

- PrashantThorat
- January 20, 2023 at 7:43 am
- 0 votes
0
Assuming column SEX will only have either ‘F’ or ‘M’ as value, problem A can be solved

problem A
```
SELECT COUNT(DISTINCT ID) FROM table WHERE  SEX != 'F';
SELECT DISTINCT ID FROM table WHERE  SEX != 'F';
```
Login or Signup to reply.

- SMan
- January 20, 2023 at 1:38 pm
- 0 votes
0
step-by-step demo: db<>fiddle

Assuming, the change happens only once, you could use the first_value() window function:
```
SELECT DISTINCT                                                                   -- 5
    id,
    CASE
        WHEN first_sex = last_sex THEN 'Stay ' || sex                             -- 3
        ELSE 'Change from ' || first_sex || ' To ' || last_sex                    -- 4    
    END sex_status
FROM (
    SELECT 
        id,
        sex,
        first_value(sex) OVER (PARTITION BY id ORDER BY year) as first_sex,       -- 1
        first_value(sex) OVER (PARTITION BY id ORDER BY year DESC) as last_sex    -- 2
    FROM mytable
) s
```
1. Fetch first sex values per id over years
2. Fetch last sex values per id over years (Notice the different order: It gives the first value from the "bottom")
3. Compare first and last; if they are the same, return "Stay" and sex
4. Otherwise return "Change" with sexes. (Of course, you can do whatever you want here. Adding appropriate status identifiers or similar instead of pure text seems to make sense at this point.)
5. DISTINCT clause to reduce the records to one per id.
Afterwards you can do whatever statistics you want. For example counting the different status by GROUP BY sex_status:

demo: db<>fiddle
```
SELECT 
    sex_status,
    COUNT(*)
FROM (

-- query from above

) s
GROUP BY sex_status
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

using SQL to detect sex change – Postgresql

Answers