I’m looking for a value in Postgres similar to "-infinity"/"infinity", but for string values, which can be used in a WHERE clause. For example, WHERE "name" > 'infinity' should always return false. Ideally, this value would work regardless of collation or locale settings.
I’ve looked through the Postgres documentation, but it’s very long, so I’m not confident that I haven’t just missed something or used the wrong search terms.
My need for this value is specifically to get around some problematic behavior I’m experiencing with NULL string values. Essentially, I’m trying to mimic the "NULLS LAST" behavior of ORDER BY, but in a WHERE clause.
Background
I am trying to implement keyset pagination in Postgres with support for multi-column, non-unique indexes, using Row Constructor Comparison.
Here is an example table I am testing with:
CREATE TABLE "table" (
    id text PRIMARY KEY,
    name text
);
And a corresponding query:
SELECT *
FROM "table"
WHERE ("name", "id") > ($1, $2)
ORDER BY "name", "id"
LIMIT 5;
This almost works, but the query never returns any rows with a null "name".
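The missing rows follow from how Postgres evaluates row comparisons: elements are compared left to right, and if the first deciding pair contains a NULL, the whole comparison yields NULL, which WHERE treats as not true. A minimal illustration, using made-up literal values that are not part of the original question:

```sql
-- A row whose "name" is NULL, compared against a non-NULL cursor:
-- the first pair (NULL, 'a') contains a NULL, so the comparison is NULL.
SELECT (NULL::text, 'b'::text) > ('a'::text, 'a'::text) AS null_name_row;

-- A NULL cursor value has the same effect on every row it is compared with.
SELECT ('a'::text, 'b'::text) > (NULL::text, 'a'::text) AS null_cursor;
```

Both statements return NULL rather than true or false, so such rows never pass the WHERE clause.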
Additionally, I can’t simply add OR "name" IS NULL to the end of the WHERE clause, because then the query always returns all the null rows, so it is not possible to paginate through several pages of results with null names but distinct ids.
Another question offered a simple solution to the problem: use COALESCE to convert NULL values to an arbitrary string larger than my data ("ZZZZZZZ").
SELECT *
FROM "table"
WHERE (COALESCE("name", 'ZZZZZZZ'), "id") > (COALESCE($1, 'ZZZZZZZ'), $2)
ORDER BY "name", "id"
LIMIT 5;
This solution works pretty well (the row comparison semantics are correct for pagination, and the query is very simple to construct dynamically), but it seems impossible to select an appropriate "largest" string for my data.
For an arbitrary-length text column, there is no "largest" literal string I can insert here, so there are always inputs that could cause my query to fail. Additionally, the "last" character I need to use depends on the locale and collation options configured on the database. I’m sure that in my real data, 'ZZZZZZZ' would be a terrible choice.
2 Answers
To achieve a behavior similar to -infinity or infinity for string values in PostgreSQL while working around the issue of NULL string values and maintaining consistent pagination, you can use a combination of PostgreSQL features. Since you’re looking for a string that can represent a value greater than any possible string in your dataset regardless of collation, a common approach is to use the string that represents the largest possible value in PostgreSQL.
Suggested Solution
One way to handle this is to use the E'\uFFFF' escape sequence, which represents U+FFFF, the highest code point in Unicode’s Basic Multilingual Plane (the highest Unicode code point overall is U+10FFFF). This can be a good candidate for the "maximum" string value as it is unlikely to occur in any textual data. Here’s how you can implement this:
Use COALESCE with U+FFFF:
You can modify your COALESCE call to replace NULL with U+FFFF as follows:
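The query itself did not survive in this copy of the answer. The following is a reconstruction sketch based on the question's COALESCE query, not the answerer's verbatim code. It assumes a UTF8 database and a collation under which U+FFFF really does sort after every stored name, which should be checked against your own collation. The table name is quoted because table is a reserved word.

```sql
-- Reconstruction sketch: substitute U+FFFF for NULL on both sides of the
-- row comparison so that NULL names compare as "largest".
-- Assumption: U+FFFF sorts last under the column's collation.
SELECT *
FROM "table"
WHERE (COALESCE("name", E'\uFFFF'), "id") > (COALESCE($1, E'\uFFFF'), $2)
ORDER BY "name", "id"
LIMIT 5;
```

Here $1 and $2 are the "name" and "id" of the last row of the previous page, as in the question.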
Locale-Agnostic: Since E'\uFFFF' is a Unicode escape, it should behave consistently regardless of your database’s collation or locale settings.
Flexible Pagination: By using this approach, you can paginate effectively without having to worry about specific string limits or characters that may not work universally across different inputs.
I don’t think there is any such string value. However, I found that Postgres puts NULLs last when comparing array elements (determined experimentally), so you can compare single-element arrays built from the values instead of the values themselves.
A maybe better (more straightforward and standards-compliant) solution is to introduce a third element in your row that has a higher precedence than the nullable one.
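The queries for these two ideas are missing from this copy of the answer. The block below is a hedged sketch of how each might look when applied to the question's query; the exact expressions are assumptions rather than the answerer's original code. The ::text casts are added so that Postgres can infer the parameter type.

```sql
-- Variant 1: wrap the nullable value in a single-element array.
-- Array comparison treats a NULL element as larger than any non-NULL one.
SELECT *
FROM "table"
WHERE (ARRAY["name"], "id") > (ARRAY[$1::text], $2)
ORDER BY "name", "id"
LIMIT 5;

-- Variant 2: lead with a boolean "is null" flag (false sorts before true),
-- then a NULL-free stand-in for the name, then the id as tie-breaker.
SELECT *
FROM "table"
WHERE ("name" IS NULL, COALESCE("name", ''), "id")
    > ($1::text IS NULL, COALESCE($1::text, ''), $2)
ORDER BY "name", "id"
LIMIT 5;
```

In variant 2 the empty string is only a placeholder: when both names are NULL the flags are equal, the placeholders are equal, and the comparison falls through to "id", so several pages of NULL-named rows can be walked in id order.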