skip to Main Content

Can someone help me understand what this SQL query is doing?

select region_code, 
       ST_Area(geom::geography) as area,
       ST_X(ST_Centroid(geom)) as long, 
       ST_Y(ST_Centroid(geom)) as lat, 
       latitude, 
       longitude 
from regions 
where regexp_split_to_array(parent, E'\\.') && %s 
or region_code in %s

More specifically I’m confused about the where condition:

where regexp_split_to_array(parent, E'\\.') && %s 
or region_code in %s

I looked up regexp_split_to_array here, https://www.sqliz.com/postgresql-ref/regexp_split_to_array/. From what I can tell it splits the parent to an array based on the regex E'\\.'.

My understanding of . is it will split at every . but what happens when you have \\?

Suppose the parent field is aaa.bbb.ccc.ddd what is the regex supposed to do?

I’ve tried running:

select name, parent, region_code from regions where regexp_split_to_array(parent, E'\\.');

but I get the error:

ERROR:  argument of WHERE must be type boolean, not type text[]
LINE 1: ...lect name, parent, region_code from regions where regexp_spl...

So I modified it to:

select name, parent, region_code from regions where regexp_split_to_array(parent, E'\\.') && Array['something_here'];

and got this:

 name | parent | region_code
------+--------+-------------
(0 rows)

Can someone explain to me what exactly the where clause is doing? Thank you.

4

Answers


  1. The expression splits a string by two characters, the first of which is a backslash, e.g.

    select regexp_split_to_array('aa?bb!ccddd', E'\\.')
    
     regexp_split_to_array
    -----------------------
     {aa,bb,cc,dd}
    (1 row)
    

    Note that E'\\.' is equivalent to '\.' (though the former looks more professional).

    select regexp_split_to_array('aa?bb!ccddd', '\.');
    
     regexp_split_to_array
    -----------------------
     {aa,bb,cc,dd}
    (1 row)
    

    It might seem the expression results from the overwork of a developer who wanted to achieve something else. However, if the query is a python string then it is escaped when used in execute() and translated to the proper

    select regexp_split_to_array('aa.bb.cc.dd', E'\.')
    

    which splits the first arguments by a period. You can try this:

    import psycopg2
    
    conn = psycopg2.connect(
        host="localhost",
        database="test",
        user="xxx",
        password="xxx")
    
    query = "select regexp_split_to_array('aa.bb.cc.dd', E'\\.')"
    
    with conn.cursor() as cur:
        cur.execute(query)
        print(cur.fetchone()[0])
    

    The result is:

    ['aa', 'bb', 'cc', 'dd']
    
    Login or Signup to reply.
  2. E'...' is a string constant with C-Style escapes meaning it will interpret n as a newline and so forth. Anything with a leading is escaped. \ means . . means ..

    So E'\\.' is a tortured way to write '\.'

    Regexes also use as an escape character. So '\.' to a regex means to match a literal and then a any single character.

    regexp_split_to_array(parent, E'\\.') is just regexp_split_to_array(parent, '\.'). It is splitting on <any character>.

    'this-and-that' will be split into {'this', 'and', 'that'}.

    Demonstration.

    Login or Signup to reply.
  3. You should move the function into the select list, if you want the results of it:

    select name, parent, region_code, regexp_split_to_array(parent, E'\\.')
       from regions;
    

    The construct E'\\.' creates the same thing as the simpler '\.', because with E one backslash escapes the next one. Then in regex engine, again one backslash escapes the next one, so this overall thing splits on a literal backslash followed by any character, and swallows that "any character". Whether this is what the person who wrote that code intended it to do, I can’t say.

    Login or Signup to reply.
  4. Telling from %s in the query, I believe this is not a direct sql code from sql terminal, but within a programming language wrapper, I guess it’s probably wrapped by Python.

    The regex pattern you are looking for is split ‘aaa.bbb.ccc.ddd’ by ‘.’,

    Step 1. raw sql code on regexp_split_to_array() should be

    regexp_split_to_array(parent,'.')
    

    Step 2. PostgreSQL accept escape string directly with E prefix, the escaped string with E prefix in raw sql code should be

    regexp_split_to_array(parent,E'\.')
    

    Step 3 (optional). If your code is in python wrapper, the code should be further escaped.

    regexp_split_to_array(parent,E'\\\.')
    

    See SQLFIDDLE LINK

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search