My table has some null values and empty strings. These look ugly when I query the table, so I want to replace them with other values. I don’t own the data, so I can’t modify the table itself; this needs to happen at query time.
I tried using `regexp_replace` to replace the empty strings with `regexp_replace(column, '^$', '(no value found/some other custom message)')`, which didn’t work.
3 Answers
`null` and empty string have to be treated separately. In addition, there’s a difference between `''` (empty) and `' '` (blank).

`null` is "unknown". `regexp_replace(null, '^$', 'default')` doesn’t do anything because the value it’s matching is unknown. The match fails and it returns `null`.

Instead, use `coalesce`. It returns the first non-null value: `coalesce(thing, 'default')`.

To capture empty and blank strings, search for `^\s*$`: `regexp_replace(thing, '^\s*$', 'default')`.

We put them together like so…
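A minimal sketch of the combined expression, assuming a text column named `thing` and a hypothetical set of inline sample rows standing in for the real table. The `values` list in `from` is PostgreSQL syntax and may need to be swapped for a real table elsewhere:

```sql
-- Inner call: empty or whitespace-only strings become 'default'.
-- Outer call: a null (which regexp_replace passes through) becomes 'default'.
select
  thing,
  coalesce(regexp_replace(thing, '^\s*$', 'default'), 'default') as cleaned
from (
  values (null), (''), ('   '), ('value')  -- hypothetical sample rows
) as stuff(thing);
```

With these sample rows, `cleaned` should be `default` for the first three rows and `value` for the last.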
If `thing` is `null`, `regexp_replace` will return `null`, and `coalesce` will return `'default'`.

If `thing` is empty or blank, `regexp_replace` will return `'default'` and `coalesce` will return `'default'`.

If `thing` is none of those, `regexp_replace` will return `thing` and `coalesce` will return `thing`.

An alternative that might be easier to understand is:
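One way such an alternative could be written is a `case` expression that spells out both conditions explicitly. This is a sketch, assuming a text column named `thing` and hypothetical inline sample rows (PostgreSQL syntax); `~` is PostgreSQL's regular-expression match operator:

```sql
select
  thing,
  case
    -- null, empty, and whitespace-only values all get the default
    when thing is null or thing ~ '^\s*$' then 'default'
    else thing
  end as cleaned
from (
  values (null), (''), ('   '), ('value')  -- hypothetical sample rows
) as stuff(thing);
```

The explicit `is null` test makes the null handling visible instead of relying on how `coalesce` and `regexp_replace` interact.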
If you like, you can put this into a function. We can declare it `immutable` (the same arguments always produce the same result), potentially giving it a performance boost.

Demonstration in PostgreSQL. Redshift is derived from PostgreSQL so it should work the same.
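The function idea above might look roughly like this in PostgreSQL. The name `default_if_blank` and its parameter names are hypothetical, and Redshift's SQL UDF syntax differs, so treat this as a sketch rather than something portable as written:

```sql
-- Hypothetical helper: returns fallback when thing is null, empty, or blank.
create or replace function default_if_blank(thing text, fallback text)
returns text
language sql
immutable  -- same arguments always produce the same result
as $$
  select case
    when thing is null or thing ~ '^\s*$' then fallback
    else thing
  end;
$$;

-- Usage with hypothetical inline sample rows:
select thing, default_if_blank(thing, 'default') as cleaned
from (values (null), (''), ('   '), ('value')) as stuff(thing);
```

The sketch uses a `case` expression rather than `regexp_replace` so that a fallback containing backslashes is returned literally instead of being read as a replacement escape.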
Convert null to blank using `coalesce()` before applying your `regexp_replace`, that is, pass `coalesce(column, '')` as the first argument instead of the bare column.

With Redshift you can also use `nvl()` instead of `coalesce()` for a briefer solution, but `coalesce()` is the SQL standard and therefore arguably more readable.

This is the fastest way to replace null and empty strings:
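A sketch of a plain `CASE` expression that handles null and empty strings without a regular expression, assuming a text column named `thing` and hypothetical inline sample rows (PostgreSQL syntax for the `VALUES` list):

```sql
SELECT
  thing,
  -- thing <> '' is true only for non-null, non-empty strings;
  -- for null the comparison yields null, so the ELSE branch applies.
  CASE WHEN thing <> '' THEN thing ELSE 'default' END AS cleaned
FROM (
  VALUES (null), (''), ('value')  -- hypothetical sample rows
) AS stuff(thing);
```

Note that for a `text` column a whitespace-only string such as `'   '` is not equal to `''`, so it passes through unchanged; this expression targets null and empty strings only.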
Only applicable to string types. Most other data types have no "empty" value.
This uses only a very basic `CASE` expression, so it works in any RDBMS (certainly including Postgres and Redshift) that handles null values according to the SQL standard.

Another standard SQL way:
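One standard-SQL formulation that fits this description combines `NULLIF` and `COALESCE`. This is a sketch, assuming a text column named `thing` and hypothetical inline sample rows (PostgreSQL syntax for the `VALUES` list):

```sql
SELECT
  thing,
  -- NULLIF turns '' into null; COALESCE then replaces any null with the default.
  COALESCE(NULLIF(thing, ''), 'default') AS cleaned
FROM (
  VALUES (null), (''), ('value')  -- hypothetical sample rows
) AS stuff(thing);
```

Both null and empty inputs end up as `default`, while `value` is returned as-is.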
See:
Regular expressions are way too expensive for this simple task (IMO).
If you have to deal with non-printing characters, you need to do more. See: