skip to Main Content

The goal is a custom aggregate function made with CREATE AGGREGATE called string_agg_oxford; it is an aggregate function that works similar to string_agg except it is smart enough to know how many items it is aggregating so that it can place "and" in front of the last item.

So where string_agg(items, ', ') would return "item1, item2, item3", string_agg_oxford(items) will return "item1, item2, and item3".

My failed attempt starts with a type for our accumulator that includes the total number of rows and the index for the current row:

CREATE TYPE oxford_accumulator as (
  row_count numeric,
  i numeric,
  acc text
);

Now we need our accumulator function:

CREATE OR REPLACE FUNCTION oxford_acc (acc oxford_accumulator, curr text)
  RETURNS oxford_accumulator
  LANGUAGE PLPGSQL
  AS $$
BEGIN
  IF acc.i + 1 = acc.row_count THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr);
  END IF;

  IF (acc.i + 2 = acc.row_count) AND (acc.row_count = 2)  THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ' and ');
  END IF;

  IF (i + 2 = acc.row_count) THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', and ');
  END IF;

  RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', ');
END;
$$;

because the accumulator has swallowed up the total count and the index we have to release this information when the accumulator is finished with an ffunc.

CREATE OR REPLACE FUNCTION oxford_final (acc oxford_accumulator)
  RETURNS text
  LANGUAGE PLPGSQL
  AS $$
BEGIN
  RETURN acc.acc;
END;
$$;

My idea falls apart here where we need to wire it all up because there does not seem to be a way to parametrize the total row count… so fail.

CREATE OR REPLACE AGGREGATE string_agg_oxford (text, row_count numeric) (
  INITCOND = (row_count, 0, ''),
    --         ^^^ fail
  STYPE = oxford_accumulator,
  SFUNC = oxford_acc,
  FINALFUNC = oxford_final
);

I know something similar can be achieved with a regular function, but I’m not ready to give up yet if there’s a way to do this as an aggregator that could be used in a select statement like SELECT string_agg_oxford(clients.full_name) FROM matters GROUP BY matters.matter_id;

2

Answers


  1. Chosen as BEST ANSWER
    1. Initial condition must be "a string constant in the form accepted for the data type state_data_type" so no way to pass a dynamic value. Credit to @a_horse_with_no_name

    2. For efficiency reasons, it would be easier to "peel the last value out of the accumulator." Credit to @jjanes.

    3. Lets use a unique separator to alleviate any issues matching on ", " because that is fairly common - so we'll use a unique separator and then replace those in the ffunc as is appropriate.

    4. Postgres does not support having capture groups within its lookahead assertions, so instead of a fancy regular expression to find the last occurrence of our unique separator, we will reverse the string and tackle things upside down.

    CREATE OR REPLACE FUNCTION oxford_acc (acc text, curr text)
      RETURNS text
      LANGUAGE PLPGSQL
      AS $$
    BEGIN
      RETURN acc || curr || '@$@$';
      --                     ^^^ unique separator, can be anything unique
    END;
    $$;
    
    CREATE OR REPLACE FUNCTION oxford_final (acc text)
      RETURNS text
      LANGUAGE PLPGSQL
      AS $$
    DECLARE
        my_result text;
        my_counter numeric;
    BEGIN
        SELECT left(acc, -4) INTO my_result;
        -- ^^^ removes the last separator
        SELECT count(*) FROM regexp_matches(my_result, '@$@$', 'g') INTO my_counter;
        
        IF my_counter = 0 THEN
          RETURN my_result;
        END IF;
        
        IF my_counter = 1 THEN
          RETURN regexp_replace(my_result, '@$@$', ' and ');
        END IF;
        
        SELECT reverse(my_result) INTO my_result;
        
        SELECT regexp_replace(my_result, '$@$@', ' dna ,') INTO my_result;
        
        SELECT reverse(my_result) INTO my_result;
        
        RETURN regexp_replace(
          my_result, 
          '@$@$',
          ', ',
          'g'
        );
    END;
    $$;
    
    CREATE OR REPLACE AGGREGATE oxford_agg (text) (
      INITCOND = '',
      STYPE = text,
      SFUNC = oxford_acc,
      FINALFUNC = oxford_final
    );
    

    Suggestions for improvements welcome!


  2. Right, you don’t know you are done until you are done. You would have to make two passes over the data, one to get the count and the other to construct the string. That wouldn’t be very efficient, and there is no obvious way to make PostgreSQL do it that way that could be used as a simple aggregate. I think you could use a window function to get the total count, then an aggregate which takes two arguments (string and total count), but I think you would need to write that part in an "outer query", otherwise it wouldn’t have access to the total count.

    The straightfoward way to do this would be to use the same accumulator and transition function as string_agg(x, ', '), but have your finalizer function just peel the last value out of the accumulator, and tack it back on with an ‘, and ‘.

    Or you could define an accumulator with (acc text, prev text) and have the transition function tack the prev into acc (if prev isn’t null) then store its current value into prev. Then have the finalizer tack the final prev into acc with the ‘, and ‘ separator.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search