I have a DB in postgres with two tables, ‘responses’ and ‘variables’, like this:
responses
respondent_id | variable_name | response |
---|---|---|
01 | name | susan |
01 | age | 42 |
02 | name | bob |
02 | age | 13 |
variables
variable_name | type |
---|---|
name | text |
age | integer |
Note: The data in the ‘response’ column of the responses table is stored as text.
I’m trying to write a query that will "pivot" the data in the responses table from "long" to "wide" so that there is one row for each respondent_id, and a column for each of the variables in the ‘variable_name’ table… AND I want to convert the types of the columns returned according to the types listed in the variables table. For instance, the ‘age’ column in the pivoted result should be cast as an integer.
The problem I’m running into is trying to use a "scalar subquery" to extract the type information from the variables table which can then be fed to a CAST
expression. Here is the query I’ve tried (which doesn’t work):
SELECT
respondent_id,
CAST(
max(case when variable_name = 'age' then response else null end) AS
(SELECT variables.type FROM variables WHERE variable_name = 'age')
) AS 'age',
CAST(
max(case when variable_name = 'name' then response else null end) AS
(SELECT variables.type FROM variables WHERE variable_name = 'name')
) AS 'name'
FROM responses GROUP BY respondent_id
This query throws a syntax error. As best I can tell, this is because a scalar subquery cannot be used in the ‘AS’ part of CAST
. I think this is true because the following query works:
SELECT
respondent_id,
CAST(
max(case when variable_name = 'age' then response else null end) AS
integer
) AS 'age',
CAST(
max(case when variable_name = 'name' then response else null end) AS
text
) AS 'name'
FROM responses GROUP BY respondent_id
How can I feed information from the variables table into my query so that the columns will be cast to the correct type?
One more note: Hardcoding the types in the CAST expressions is not an option for me. What I’ve shown here is a greatly simplified example. In the actual case I’m working with, there are about 1800 rows in the variables table, and I need a parameterized query that can create the pivoted-and-correctly-cast table for any arbitrary combination of those variables.
Thanks!
2
Answers
Using another language like R or Python can facilitate this. To illustrate, below is R code (using DuckDB, which is largely equivalent to PostgreSQL). You can even use it just to generate the SQL (seen as output from
show_query()
below).Created on 2024-07-30 with reprex v2.1.1
Having 1800+ variables would make the resultset very ‘wide’ and sparsely populated. I would rather suggest a single aggregated JSON column (hopefully much more usable in the tier above) with the variable names as attributes.
So, first create a simple helper function
and then
DB Fiddle demo