postgresql: dynamically casting columns returned by a query

rocdaddy
July 30, 2024
169 views
0 votes
2 Answers

I have a DB in postgres with two tables, ‘responses’ and ‘variables’, like this:

responses

respondent_id	variable_name	response
01	name	susan
01	age	42
02	name	bob
02	age	13

variables

variable_name	type
name	text
age	integer

Note: The data in the ‘response’ column of the responses table is stored as text.

I’m trying to write a query that will "pivot" the data in the responses table from "long" to "wide" so that there is one row for each respondent_id, and a column for each of the variables in the ‘variable_name’ table… AND I want to convert the types of the columns returned according to the types listed in the variables table. For instance, the ‘age’ column in the pivoted result should be cast as an integer.

The problem I’m running into is trying to use a "scalar subquery" to extract the type information from the variables table which can then be fed to a CAST expression. Here is the query I’ve tried (which doesn’t work):

SELECT
  respondent_id,
  CAST(
    max(case when variable_name = 'age' then response else null end) AS
    (SELECT variables.type FROM variables WHERE variable_name = 'age')
  ) AS 'age',
  CAST(
    max(case when variable_name = 'name' then response else null end) AS
    (SELECT variables.type FROM variables WHERE variable_name = 'name')
  ) AS 'name'
FROM responses GROUP BY respondent_id

This query throws a syntax error. As best I can tell, this is because a scalar subquery cannot be used in the ‘AS’ part of CAST. I think this is true because the following query works:

SELECT
  respondent_id,
  CAST(
    max(case when variable_name = 'age' then response else null end) AS
    integer
  ) AS 'age',
  CAST(
    max(case when variable_name = 'name' then response else null end) AS
    text
  ) AS 'name'
FROM responses GROUP BY respondent_id

How can I feed information from the variables table into my query so that the columns will be cast to the correct type?

One more note: Hardcoding the types in the CAST expressions is not an option for me. What I’ve shown here is a greatly simplified example. In the actual case I’m working with, there are about 1800 rows in the variables table, and I need a parameterized query that can create the pivoted-and-correctly-cast table for any arbitrary combination of those variables.

Thanks!

Tags: postgresql

Answers

Using another language like R or Python can facilitate this. To illustrate, below is R code (using DuckDB, which is largely equivalent to PostgreSQL). You can even use it just to generate the SQL (seen as output from show_query() below).

library(tidyverse)
library(DBI)

db <- dbConnect(duckdb::duckdb())

responses <- tribble(
    ~respondent_id, ~variable_name, ~response,
    01, "name", "susan",
    01, "age",  "42",
    02, "name", "bob",
    02, "age",  "13") |>
  copy_to(db, df = _, name = "responses", overwrite = TRUE)

variables <- tribble(
  ~variable_name, ~type,
  "name", "text",
  "age", "integer") |>
  copy_to(db, df = _, name = "variables", overwrite = TRUE)

ints <- 
  variables |> 
  filter(type == "integer") |> 
  select(variable_name) |> 
  pull()

result <-
  responses |>
  pivot_wider(id_cols = respondent_id,
              names_from = "variable_name",
              values_from = "response") |>
  mutate(across(any_of(ints), as.integer))
  
result |>
  show_query()
#> <SQL>
#> SELECT respondent_id, CAST(age AS INTEGER) AS age, "name"
#> FROM (
#>   SELECT
#>     respondent_id,
#>     MAX(CASE WHEN (variable_name = 'age') THEN response END) AS age,
#>     MAX(CASE WHEN (variable_name = 'name') THEN response END) AS "name"
#>   FROM responses
#>   GROUP BY respondent_id
#> ) q01

^{Created on 2024-07-30 with reprex v2.1.1}

- Stefanovsm
- July 30, 2024 at 9:38 pm
- 0 votes
0
Having 1800+ variables would make the resultset very ‘wide’ and sparsely populated. I would rather suggest a single aggregated JSON column (hopefully much more usable in the tier above) with the variable names as attributes.
So, first create a simple helper function
```
create or replace function jsonb_dyntype(v text, t text)
returns jsonb language sql as 
$$
select 
 case t
    when 'text'     then to_jsonb(v::text)
    when 'date'     then to_jsonb(v::date)
    when 'number'   then to_jsonb(v::numeric)
    when 'integer'  then to_jsonb(v::integer)
    when 'datetime' then to_jsonb(v::timestamp)
    -- other types 
 end
$$;
```
and then
```
select jsonb_object_agg(variable_name, jsonb_dyntype(response, type)) 
from responses r join variables v using (variable_name)
group by respondent_id;
```
jsonb_object_agg

{"age": 13, "name": "bob"}

{"age": 42, "name": "susan"}

DB Fiddle demo
Login or Signup to reply.

jsonb_object_agg
{"age": 13, "name": "bob"}
{"age": 42, "name": "susan"}

Please signup or login to give your own answer.

Click here to cancel reply.