Can someone help me understand what this SQL query is doing?
select region_code,
ST_Area(geom::geography) as area,
ST_X(ST_Centroid(geom)) as long,
ST_Y(ST_Centroid(geom)) as lat,
latitude,
longitude
from regions
where regexp_split_to_array(parent, E'\\.') && %s
or region_code in %s
More specifically I’m confused about the where
condition:
where regexp_split_to_array(parent, E'\\.') && %s
or region_code in %s
I looked up regexp_split_to_array
here, https://www.sqliz.com/postgresql-ref/regexp_split_to_array/. From what I can tell it splits the parent
to an array based on the regex E'\\.'
.
My understanding of .
is it will split at every .
but what happens when you have \\
?
Suppose the parent
field is aaa.bbb.ccc.ddd
what is the regex supposed to do?
I’ve tried running:
select name, parent, region_code from regions where regexp_split_to_array(parent, E'\\.');
but I get the error:
ERROR: argument of WHERE must be type boolean, not type text[]
LINE 1: ...lect name, parent, region_code from regions where regexp_spl...
So I modified it to:
select name, parent, region_code from regions where regexp_split_to_array(parent, E'\\.') && Array['something_here'];
and got this:
name | parent | region_code
------+--------+-------------
(0 rows)
Can someone explain to me what exactly the where clause is doing? Thank you.
4
Answers
The expression splits a string by two characters, the first of which is a backslash, e.g.
Note that
E'\\.'
is equivalent to'\.'
(though the former looks more professional).It might seem the expression results from the overwork of a developer who wanted to achieve something else. However, if the query is a python string then it is escaped when used in
execute()
and translated to the properwhich splits the first arguments by a period. You can try this:
The result is:
E'...'
is a string constant with C-Style escapes meaning it will interpretn
as a newline and so forth. Anything with a leadingis escaped.
\
means.
.
means.
.So
E'\\.'
is a tortured way to write'\.'
Regexes also use
as an escape character. So
'\.'
to a regex means to match a literaland then a any single character.
regexp_split_to_array(parent, E'\\.')
is justregexp_split_to_array(parent, '\.')
. It is splitting on<any character>
.'this-and-that'
will be split into{'this', 'and', 'that'}
.Demonstration.
You should move the function into the select list, if you want the results of it:
The construct
E'\\.'
creates the same thing as the simpler'\.'
, because with E one backslash escapes the next one. Then in regex engine, again one backslash escapes the next one, so this overall thing splits on a literal backslash followed by any character, and swallows that "any character". Whether this is what the person who wrote that code intended it to do, I can’t say.Telling from %s in the query, I believe this is not a direct sql code from sql terminal, but within a programming language wrapper, I guess it’s probably wrapped by Python.
The regex pattern you are looking for is split ‘aaa.bbb.ccc.ddd’ by ‘.’,
Step 1. raw sql code on regexp_split_to_array() should be
Step 2. PostgreSQL accept escape string directly with E prefix, the escaped string with E prefix in raw sql code should be
Step 3 (optional). If your code is in python wrapper, the code should be further escaped.
See SQLFIDDLE LINK