
I’m looking for a value in Postgres similar to "-infinity"/"infinity", but for string values, which can be used in a WHERE clause, for example, WHERE "name" > 'infinity', which should always return false. Ideally, this value would work regardless of collation or locale settings.

I’ve looked through the Postgres documentation, but it’s very long, so I may well have missed something or used the wrong search terms.

My need for this value is specifically to get around some problematic behavior I’m experiencing with NULL string values. Essentially, I’m trying to mimic the "NULLS LAST" behavior of ORDER BY, but in a WHERE clause.

Background

I am trying to implement keyset pagination in Postgres with support for multi-column, non-unique indexes, using Row Constructor Comparison.

Here is an example table I am testing with:

CREATE TABLE "table" (
   id text PRIMARY KEY,
   name text 
);

And a corresponding query:

SELECT *
FROM table
WHERE ("name", "id") > ($1, $2)
ORDER BY "name", "id"
LIMIT 5

This almost works, but the query never returns any rows with a null "name".
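
The missing rows follow from SQL's three-valued logic: when the element that decides a row comparison is NULL, the comparison yields NULL rather than true, and WHERE discards the row. A minimal check, using made-up literals in place of the table and the parameters:

```sql
-- The leading element of the left row is NULL, so the result is NULL (unknown).
SELECT (NULL::text, 'id-2') > ('some name', 'id-1') AS after_cursor;

-- The same happens when the cursor value (the $1 side) is NULL.
SELECT ('some name', 'id-2') > (NULL::text, 'id-1') AS after_cursor;
```

Both statements return NULL, which is why neither a NULL "name" in the table nor a NULL cursor value can ever satisfy the WHERE clause.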

Additionally, I can’t simply add OR "name" IS NULL to the end of the WHERE clause, because then the query always returns all the null rows, and it is not possible to paginate through several pages of results with null names, but distinct ids.

Another question offered a simple solution to the problem: use COALESCE to convert NULL values to an arbitrary string larger than my data ("ZZZZZZZ").

SELECT *
FROM table
WHERE (COALESCE("name", 'ZZZZZZZ'), "id") > (COALESCE($1, 'ZZZZZZZ'), $2)
ORDER BY "name", "id"
LIMIT 5

This solution works pretty well (the row comparison semantics are correct for pagination, and the query is very simple to construct dynamically), but it seems impossible to select an appropriate "largest" string for my data.

For an arbitrary length text column, there is no "largest" literal string I can insert here, so there are always inputs that could cause my query to fail. Additionally, the "last" character I need to use depends on the locale and collation options configured on the database. I’m sure that in my real data, 'ZZZZZZZ' would be a terrible choice.
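
To illustrate how fragile a fixed sentinel is, the comparisons below use the built-in "C" collation, which compares byte values. The sample names are invented for the example.

```sql
-- Under "C", 'a' (0x61) sorts after 'Z' (0x5A), so an ordinary lowercase
-- name already exceeds the sentinel.
SELECT 'apple' > 'ZZZZZZZ' COLLATE "C" AS exceeds_sentinel;    -- true

-- A longer string that has the sentinel as its prefix also exceeds it.
SELECT 'ZZZZZZZZ' > 'ZZZZZZZ' COLLATE "C" AS exceeds_sentinel; -- true
```

Rows with such names would sort after the substituted NULL rows in the WHERE clause but before them in the ORDER BY, which breaks pagination.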

2 Answers


  1. To get behavior similar to -infinity or infinity for string values in PostgreSQL, while working around NULL names and keeping pagination consistent, you can substitute a sentinel string for NULL. Since you’re looking for a value that compares greater than any string in your dataset, one approach is to pick a character that sorts after ordinary text.

    Suggested Solution
    One way to handle this is to use the E'\uFFFF' escape sequence, which represents U+FFFF, the last code point of the Basic Multilingual Plane (the highest Unicode code point overall is U+10FFFF). It can be a reasonable candidate for the "maximum" string value because it is a noncharacter and unlikely to occur in any textual data. Here’s how you can implement this:

    Modify the COALESCE calls to replace NULL with this character:

    SELECT *
    FROM table
    WHERE (COALESCE("name", E'\uFFFF'), "id") > (COALESCE($1, E'\uFFFF'), $2)
    ORDER BY "name", "id"
    LIMIT 5;
    

    E'\uFFFF': In a byte-wise collation such as "C" on a UTF8 database,
    this character sorts after every other character in the Basic
    Multilingual Plane, so it can stand in for a maximum string value
    that is unlikely to collide with your actual data. Other collations
    may order it differently, so this is not guaranteed.

    COALESCE: This function will convert NULL values to E'\uFFFF',
    ensuring that rows with NULL values in the name column will be treated
    as if they are greater than any non-NULL value during the comparison.

    Locale and collation: E'\uFFFF' is written as a Unicode escape, so the literal itself does not change with locale settings, but where it sorts still depends on the collation. Verify it against your database’s collation before relying on it.

    Flexible Pagination: With this approach you can paginate through NULL and non-NULL names with a single query shape, provided that no stored name sorts at or after the sentinel.
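
    Whether the sentinel really sorts last is worth testing against the collation in use. A quick check, assuming a UTF8-encoded database (the sample strings are arbitrary):

    ```sql
    -- Under the byte-wise "C" collation, a character outside the Basic
    -- Multilingual Plane (here U+1F600) sorts after U+FFFF.
    SELECT E'\U0001F600' > E'\uFFFF' COLLATE "C" AS exceeds_sentinel;  -- true

    -- Any string that starts with the sentinel and continues sorts after it.
    SELECT E'\uFFFFx' > E'\uFFFF' COLLATE "C" AS exceeds_sentinel;     -- true

    -- Under the database's default collation the outcome may differ,
    -- so run the comparison directly rather than assuming a result.
    SELECT 'zebra' < E'\uFFFF' AS sorts_before_sentinel;
    ```

    If any of your data can exceed the sentinel under your collation, the sentinel approach has the same weakness as 'ZZZZZZZ', only less often.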

  2. I don’t think there is any such string value. However, I found that Postgres puts NULLs last when comparing array elements (determined experimentally), so you can use

    SELECT *
    FROM table
    WHERE (ARRAY["name"], "id") > (ARRAY[$1::text], $2::int)
    ORDER BY "name" NULLS LAST, "id"
    LIMIT 5
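
    The array behavior can be checked in isolation. The literals below are invented, and since the ordering was determined experimentally, it is worth re-running on your PostgreSQL version:

    ```sql
    -- A NULL array element compares as greater than any non-NULL element.
    SELECT ARRAY[NULL::text] > ARRAY['zzz'] AS null_sorts_last;              -- true

    -- Two NULL elements compare as equal, so the tie falls through to the
    -- second row element (the id) and pagination can continue among NULLs.
    SELECT (ARRAY[NULL::text], 2) > (ARRAY[NULL::text], 1) AS next_null_row; -- true
    ```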
    

    A possibly better (more straightforward and standards-compliant) solution is to introduce a third element in your row that has a higher precedence than the nullable one:

    SELECT *
    FROM table
    WHERE ("name" IS NULL, COALESCE("name", ''), "id") > ($1::text IS NULL, COALESCE($1::text, ''), $2::int)
    ORDER BY "name" NULLS LAST, "id"
    LIMIT 5
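
    A small end-to-end sketch of this three-element comparison follows. The table name "items", its contents, and the prepared statement name "next_page" are hypothetical, and "id" is an integer here to match the $2::int cast above:

    ```sql
    -- Hypothetical sample data for illustration only.
    CREATE TEMP TABLE items (id int PRIMARY KEY, name text);
    INSERT INTO items VALUES
        (1, 'alice'), (2, NULL), (3, 'bob'), (4, NULL), (5, 'alice');

    -- $1 is the last seen name (may be NULL), $2 is the last seen id.
    PREPARE next_page(text, int) AS
    SELECT *
    FROM items
    WHERE ("name" IS NULL, COALESCE("name", ''), "id")
        > ($1 IS NULL, COALESCE($1, ''), $2)
    ORDER BY "name" NULLS LAST, "id"
    LIMIT 2;

    EXECUTE next_page('alice', 5);  -- rows (3, 'bob') and (2, NULL)
    EXECUTE next_page(NULL, 2);     -- row (4, NULL)
    ```

    The boolean "name" IS NULL sorts false before true, which places every NULL name after every non-NULL name; the empty string only fills the second slot so that rows with NULL names tie there and are ordered by "id".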
    