The goal is a custom aggregate function made with CREATE AGGREGATE called string_agg_oxford
; it is an aggregate function that works similar to string_agg
except it is smart enough to know how many items it is aggregating so that it can place "and" in front of the last item.
So where string_agg(items, ', ')
would return "item1, item2, item3"
, string_agg_oxford(items)
will return "item1, item2, and item3"
.
My failed attempt starts with a type for our accumulator that includes the total number of rows and the index for the current row:
CREATE TYPE oxford_accumulator as (
row_count numeric,
i numeric,
acc text
);
Now we need our accumulator function:
CREATE OR REPLACE FUNCTION oxford_acc (acc oxford_accumulator, curr text)
RETURNS oxford_accumulator
LANGUAGE PLPGSQL
AS $$
BEGIN
IF acc.i + 1 = acc.row_count THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr);
END IF;
IF (acc.i + 2 = acc.row_count) AND (acc.row_count = 2) THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ' and ');
END IF;
IF (i + 2 = acc.row_count) THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', and ');
END IF;
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', ');
END;
$$;
because the accumulator has swallowed up the total count and the index we have to release this information when the accumulator is finished with an ffunc
.
CREATE OR REPLACE FUNCTION oxford_final (acc oxford_accumulator)
RETURNS text
LANGUAGE PLPGSQL
AS $$
BEGIN
RETURN acc.acc;
END;
$$;
My idea falls apart here where we need to wire it all up because there does not seem to be a way to parametrize the total row count… so fail.
CREATE OR REPLACE AGGREGATE string_agg_oxford (text, row_count numeric) (
INITCOND = (row_count, 0, ''),
-- ^^^ fail
STYPE = oxford_accumulator,
SFUNC = oxford_acc,
FINALFUNC = oxford_final
);
I know something similar can be achieved with a regular function, but I’m not ready to give up yet if there’s a way to do this as an aggregator that could be used in a select statement like SELECT string_agg_oxford(clients.full_name) FROM matters GROUP BY matters.matter_id;
2
Answers
Initial condition must be "a string constant in the form accepted for the data type state_data_type" so no way to pass a dynamic value. Credit to @a_horse_with_no_name
For efficiency reasons, it would be easier to "peel the last value out of the accumulator." Credit to @jjanes.
Lets use a unique separator to alleviate any issues matching on ", " because that is fairly common - so we'll use a unique separator and then replace those in the ffunc as is appropriate.
Postgres does not support having capture groups within its lookahead assertions, so instead of a fancy regular expression to find the last occurrence of our unique separator, we will reverse the string and tackle things upside down.
Suggestions for improvements welcome!
Right, you don’t know you are done until you are done. You would have to make two passes over the data, one to get the count and the other to construct the string. That wouldn’t be very efficient, and there is no obvious way to make PostgreSQL do it that way that could be used as a simple aggregate. I think you could use a window function to get the total count, then an aggregate which takes two arguments (string and total count), but I think you would need to write that part in an "outer query", otherwise it wouldn’t have access to the total count.
The straightfoward way to do this would be to use the same accumulator and transition function as
string_agg(x, ', ')
, but have your finalizer function just peel the last value out of the accumulator, and tack it back on with an ‘, and ‘.Or you could define an accumulator with
(acc text, prev text)
and have the transition function tack the prev into acc (if prev isn’t null) then store its current value into prev. Then have the finalizer tack the final prev into acc with the ‘, and ‘ separator.