I have a set of (dance) performances (https://dbfiddle.uk/jMl5dgRK), where each performance has a related piece, a venue, a season and a date.
Most pieces have been performed several times during a season. In some cases, several performances took place at the same day and at the same venue (e.g. at theatre festivals or as double/triple/… bill).
I’d like to have one group (a performance “series”) for all pieces that have been performed together during one season at one specific venue, something like this:
Venue ID | Season ID | Piece IDs | Dates |
---|---|---|---|
1 | 1 | 1, 2, 3 | 1980-01-01, 1980-05-01 |
1 | 1 | 2, 3, 4 | 1980-02-01, 1980-02-02 |
Tables and some data for testing:
CREATE TABLE pieces (
id SERIAL PRIMARY KEY,
title VARCHAR(100) NOT NULL
);
CREATE TABLE venues (
id SERIAL PRIMARY KEY,
title VARCHAR(100) NOT NULL
);
CREATE TABLE seasons (
id SERIAL PRIMARY KEY,
title varchar(100) NOT NULL
);
CREATE TABLE performances (
id SERIAL PRIMARY KEY,
title VARCHAR(100) NOT NULL,
date DATE NOT NULL,
piece_id INT NOT NULL,
venue_id INT NOT NULL,
season_id INT NOT NULL,
CONSTRAINT fk_piece FOREIGN KEY(piece_id) REFERENCES pieces(id),
CONSTRAINT fk_season FOREIGN KEY(season_id) REFERENCES seasons(id),
CONSTRAINT fk_venue FOREIGN KEY(venue_id) REFERENCES venues(id)
);
INSERT INTO pieces (title) VALUES ('Alice’s Piece'), ('Bob’s Piece'), ('Charlie’s Piece');
INSERT INTO seasons (title) VALUES ('Season 1980/81'), ('Season 1981/82'), ('Season 1982/83');
INSERT INTO venues (title) VALUES ('Old theater New York'), ('Globe Theatre London'), ('Edinburgh Congress Hall');
INSERT INTO performances (title, date, piece_id, venue_id, season_id) VALUES
('Performance 1', '1981-01-01', 1, 1, 1),
('Performance 2', '1981-01-01', 2, 1, 1),
('Performance 3', '1981-01-01', 3, 1, 1),
('Performance 4', '1981-01-05', 2, 2, 1),
('Performance 5', '1981-01-08', 2, 3, 1),
('Performance 6', '1982-01-01', 1, 1, 2),
('Performance 7', '1982-01-01', 2, 1, 2),
('Performance 8', '1982-01-02', 3, 1, 2),
('Performance 9', '1982-01-05', 2, 2, 2),
('Performance 10', '1982-01-08', 2, 3, 2);
I’ve been doing the heavy lifting in Python, but the code is quite slow and involves lots of queries. I’ve been wondering if any of you has ideas to combine the performances in one (or a few) queries?
I have had no success doing this in Postgres so far.
Update:
This is what I’ve finally came up with:
- Find all venue/season/piece/date combinations
- Aggregate the rows with same venue/season/pieces
WITH all_dates AS (
SELECT
ARRAY_AGG(DISTINCT piece_id ORDER BY piece_id) AS piece_ids,
season_id,
venue_id,
date as date
FROM
performances
WHERE
season_id IS NOT NULL AND
venue_id IS NOT NULL AND
piece_id IS NOT NULL AND
date IS NOT NULL
GROUP BY
venue_id,
season_id,
date
)
SELECT DISTINCT
season_id,
venue_id,
piece_ids,
ARRAY_AGG(DISTINCT all_dates.date ORDER BY date) AS dates
FROM all_dates
GROUP BY season_id, piece_ids, venue_id
2
Answers
The following query using STRING_AGG shall get you the expected output.
Here is the working DBFIDDLE
Maybe this can help you: