skip to Main Content

I am working with a MySQL database and I have a column (let’s call it status_column) which contains comma-separated values. The possible values in this column are ‘NA’, ‘NON_NA’, ‘PSG_REQUIRED’, or any combination of these, such as ‘NA,PSG_REQUIRED’ or ‘PSG_REQUIRED,NON_NA’, etc.

I need to write efficient queries to retrieve rows based on the following conditions:

  • Rows where status_column contains ‘NA’ but not ‘NON_NA’.
  • Rows where status_column contains ‘NON_NA’.
  • Rows where status_column contains both ‘NA’ and ‘NON_NA’.

It’s important to note that I want to exclude any rows where status_column may be NULL or an empty string, even when I’m looking for both ‘NA’ and ‘NON_NA’.

I’ve tried using the FIND_IN_SET() function in MySQL, which requires me to use two separate conditions for the third case (matching both ‘NA’ and ‘NON_NA’). Here are my current queries:

For condition 1:

SELECT * FROM my_table
WHERE FIND_IN_SET('NA', status_column) > 0;

For condition 2:

SELECT * FROM my_table
WHERE FIND_IN_SET('NON_NA', status_column) > 0;

For condition 3:

SELECT * FROM my_table
WHERE FIND_IN_SET('NA', status_column) > 0
AND FIND_IN_SET('NON_NA', status_column) > 0;

These queries work, but I am wondering if there is a more optimized way to achieve the same results, especially for the third condition where I am currently using two FIND_IN_SET() calls.

I also considered using the REGEXP function as follows:

SELECT * FROM my_table
WHERE status_column REGEXP '(^|,)NA(,|$)|(,|^)NON_NA(,|$)';

However, I’m unsure if this is the most efficient approach or if there are any better alternatives that I might be missing.

Could someone suggest a more optimal query or confirm if the REGEXP approach is the best way to handle this in MySQL?

2

Answers


  1. Rows where status_column contains ‘NA’ but not ‘NON_NA’:

    SELECT *
    FROM your_table
    WHERE FIND_IN_SET(‘NA’, status_column) > 0
    AND FIND_IN_SET(‘NON_NA’, status_column) = 0
    AND status_column IS NOT NULL
    AND status_column != ”;

    Rows where status_column contains ‘NON_NA’:

    SELECT *
    FROM your_table
    WHERE FIND_IN_SET(‘NON_NA’, status_column) > 0
    AND status_column IS NOT NULL
    AND status_column != ”;

    Rows where status_column contains both ‘NA’ and ‘NON_NA’:

    SELECT *
    FROM your_table
    WHERE FIND_IN_SET(‘NA’, status_column) > 0
    AND FIND_IN_SET(‘NON_NA’, status_column) > 0
    AND status_column IS NOT NULL
    AND status_column != ”;

    Login or Signup to reply.
  2. You can use index on expression.
    This expression can produce bit mask for values ‘NA’,’NON_NA’,’PSG_REQUIRED’
    Search (WHERE FIND_IN_SET(‘NA’, status_column) > 0) condition my convert to IN(....).

    If you carefully write out the expressions when creating the index and search conditions, the query will use the Index range scan.

    See example

    Test data

    create table test (id int,status_column varchar(100));
    insert into test values
     (1,'NA,NON_NA,PSG_REQUIRED') -- 7
    ,(2,'NA,PSG_REQUIRED')        -- 5  
    ,(3,'NON_NA,PSG_REQUIRED')    -- 6
    ,(4,'PSG_REQUIRED')           -- 4
    ,(5,'NA')                     -- 1
    ,(6,'NON_NA')                 -- 2
    ,(7,'NON_NA,NA')              -- 3
    ;
    create index ix_keys on test
      ((case when FIND_IN_SET('NA', status_column) > 0 then 1 else 0 end
       +case when FIND_IN_SET('NON_NA', status_column) > 0 then 2 else 0 end
       +case when FIND_IN_SET('PSG_REQUIRED', status_column) > 0 then 4 else 0 end
      )
      );
    

    Query

    select * 
      ,(case when FIND_IN_SET('NA', status_column) > 0 then 1 else 0 end
       +case when FIND_IN_SET('NON_NA', status_column) > 0 then 2 else 0 end
       +case when FIND_IN_SET('PSG_REQUIRED', status_column) > 0 then 4 else 0 end
      ) fv
    from test
    where (case when FIND_IN_SET('NA', status_column) > 0 then 1 else 0 end
       +case when FIND_IN_SET('NON_NA', status_column) > 0 then 2 else 0 end
       +case when FIND_IN_SET('PSG_REQUIRED', status_column) > 0 then 4 else 0 end
      ) 
     -- in (1,3,5) -- FIND_IN_SET('NA', status_column) > 0
     -- in (1,5) --  'NA' and Not 'NON_NA'
      in (1,2,3) --  'NA' or 'NON_NA' --> Not 'PSG_REQUIRED'
    order by id
    ;
    

    Execute plan

    EXPLAIN
    -> Filter: ((((case when (find_in_set(_utf8mb4'NA',status_column) > 0) then 1 else 0 end) + (case when (find_in_set(_utf8mb4'NON_NA',status_column) > 0) then 2 else 0 end)) + (case when (find_in_set(_utf8mb4'PSG_REQUIRED',status_column) > 0) then 4 else 0 end)) in (1,3,5))  (cost=2.11 rows=3) (actual time=0.0219..0.0395 rows=3 loops=1)
        -> Index range scan on test using ix_keys over ((((case when (find_in_set(_utf8mb4'NA',`status_column`) > 0) then 1 else 0 end) + (case when (find_in_set(_utf8mb4'NON_NA',`status_column`) > 0) then 2 else 0 end)) + (case when (find_in_set(_utf8mb4'PSG_REQUIRED',`status_column`) > 0) then 4 else 0 end)) = 1) OR ((((case when (find_in_set(_utf8mb4'NA',`status_column`) > 0) then 1 else 0 end) + (case when (find_in_set(_utf8mb4'NON_NA',`status_column`) > 0) then 2 else 0 end)) + (case when (find_in_set(_utf8mb4'PSG_REQUIRED',`status_column`) > 0) then 4 else 0 end)) = 3) OR ((((case when (find_in_set(_utf8mb4'NA',`status_column`) > 0) then 1 else 0 end) + (case when (find_in_set(_utf8mb4'NON_NA',`status_column`) > 0) then 2 else 0 end)) + (case when (find_in_set(_utf8mb4'PSG_REQUIRED',`status_column`) > 0) then 4 else 0 end)) = 5)  (cost=2.11 rows=3) (actual time=0.0174..0.0343 rows=3 loops=1)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search