skip to Main Content

Update: I’ve updated the test string to cover a case that I’ve missed.

I’m trying to do count the number of WHERE filters in a query using regex.

So the general idea is to count the number of WHERE and AND occuring in the query, while excluding the AND that happens after a JOIN and before a WHERE. And also excluding the AND that happens in a CASE WHEN clause.

For example, this query:

WITH cte AS (nSELECT a,bnFROM somethingnWHERE a>10n AND b<5)n, cte2 AS (n SELECT c,nd FROM anothernWHERE c>10nAND d<5)n SELECT CASE WHEN c1.a=1nAND c2.c=1 THEN 'yes' ELSE 'no' nEND,c1.a,c1.b,c2.c,c2.dnFROM cte c1nINNER JOIN cte2 c2 ON c1.a = c2.cnAND c1.b = c2.dnWHERE c1.a<4 AND DATE(c1)>'2022-01-01'nAND c2.c>6

-- FORMATTED FOR EASE OF READ. PLEASE USE LINE ABOVE AS REGEX TEST STRING
WITH cte AS (
  SELECT a,b 
  FROM something 
  WHERE a>10 
    AND b<5
)

, cte2 AS (
  SELECT c,d
  FROM another
  WHERE c>10
    AND d<5
)

SELECT
  CASE
      WHEN c1.a=1 AND c2.c=1 THEN 'yes'
      WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
      ELSE 'no'
  END,
  c1.a,
  c1.b,
  c2.c,
  c2.d
FROM cte c1
INNER JOIN cte2 c2
   ON c1.a = c2.c
  AND c1.b = c2.d
WHERE c1.a<4
  AND DATE(c1)>'2022-01-01'
  AND c2.c>6

should return 7, which are:

  1. WHERE a>10
  2. AND b<5
  3. WHERE c>10
  4. AND d<5
  5. WHERE c1.a<4
  6. AND DATE(c1)>'2022-01-01'
  7. AND c2.c>6

The portion AND c1.b = c2.d is not counted because it happens after JOIN, before WHERE.

The portion AND c2.c=1 is not counted because it is in a CASE WHEN clause.

I eventually plan to use this on a Postgresql query to count the number of filters that happens in all queries in a certain period.

I’ve tried searching around for answer and trying it myself but to no avail. Hence looking for help here. Thank you in advanced!

2

Answers


  1. You can try something like this:

    WITH DataSource (parts) AS 
    (  
      SELECT REGEXP_MATCHES(
        'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
        E'(?= WHERE)[^)|;]+'
      ,'gmi'
      ) 
    )
    SELECT SUM
           (
               (length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
                + 1 -- for the where
          )
    FROM DataSource
    

    The idea is to match the text after WHERE clause:

    enter image description here

    and then simply count the ANDs and add one because of the matched WHERE.

    Login or Signup to reply.
  2. I try to stay away from lookarounds as they could be messy and too painful to use, especially with the fixed-width limitation of lookbehind assertion.

    My proposed solution is to capture all scenarios in different groups, and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.

    1. Group 1 – Starts with JOIN (undesired)
    2. Group 2 – Starts with WHERE (desired)
    3. Group 3 – Starts with CASE (undesired)
    (JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))
    

    Note: Feel free to replace WHERE|JOIN|CASE|END to any keyword you want to be the ‘stopper’ words.
    enter image description here

    All scenarios including the undesired ones will be matched, but you need to select only Group 2 (highlighted in orange).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search