skip to Main Content

We are using MariaDB 10.5.13 but the same issue occurs on native MySQL 8.

Getting this error

4003 – Too many WITH elements in WITH clause

We have a web application which allows users to search a database containing approx 290,000 substances. A feature of the application is that substances can appear on different chemical inventories and the user can find substances by inventory. The inventories themselves are referred to as filters in the database.

There are 3 tables holding the relevant data:

  • substances: the list of 290,000 substances. Each has a unique id.
  • filters: the list of different chemical inventories. Approx 2300 rows. Each has a unique id.
  • filters_substances: A table which maps substances and filters together. The relationship is such that any individual substance may appear on >=1 filters (i.e. any given substance can be on 1 or more chemical inventories). Approx 1,662,873 rows. The foreign keys used are filter_id and substance_id to refer to filters and substances respectively.

When the user applies filters there are 2 possible operations alongside each filter.

  1. Find the substances that are on a given inventory.
  2. Exclude any substances that are on that inventory.

Find and Exclude can be used in combination. For example the user could do:

  • Find substances on filters 2 and 300
  • Exclude any substances on filters 5

The result would be substances that are on inventories 2 and 300, but with any substances on 5 removed from the results.

Or they can just use Find on its own, e.g.

  • Find substances on filters 2 and 300

There is no use-case for Exclude on its own.

When the user applies filter conditions we write the substances found, substances.id, to a table. Every user has their own table to store results which is suffixed with their user ID, e.g. "filtering_1745" is the table for user ID 1745.

The problem

This has worked without issue but we recently found when >60 filters were applied we got this error

Too many WITH elements in WITH clause

The SQL which was generated by the application looks like this. For brevity I’ve removed some of it but added comments to indicate what the removed portions would be:

create table if not exists filtering_1745 
as with 
    fs196 as (select substance_id from filters_substances where filter_id = 196), 
    fs197 as (select substance_id from filters_substances where filter_id = 197), 
    fs198 as (select substance_id from filters_substances where filter_id = 198),
    -- Same as above for ~60 other filters 

    select distinct substance_id as `id` from 
    (
        (
        select substance_id from fs196 
        union 
        select substance_id from fs197 
        union 
        select substance_id from fs198 
        -- Same as above for ~60 other filters
        )
    ) 
sq;

In this case all conditions in the query are "Find" (it doesn’t use any "Exclude" conditions). If we were also using Exclude conditions then the only difference is that an EXCEPT statement is appended to the query. For instance this would exclude filters 170, 178, 180.

 except 
 (
     select substance_id from fs170 
     union select substance_id from fs178 
     union select substance_id from fs180 
 ) sq;

The same error message appears in this scenario. In other words it’s the number of filters being used as opposed to whether they are Find or Exclude as such.

I’m looking for advice on how this could potentially be rewritten.

One thing I considered is along the lines of this.

  • Running ~60 separate SELECT statements to get the appropriate substance IDs for the "Find" filters.
  • (Optional) if "Exclude" is being used DELETE any substances that are NOT IN the filters that are being excluded.
  • DELETE any non-unique substance IDs in filtering_1745. This is to make sure when the results are presented to a user later it only includes unique substances. If for example a given substance appeared on 5 filters it should only be presented once because it’s a unique substance.

Any advice would be welcomed.

I looked at some other posts such as "IN" clause limitation in Sql Server and How to put more than 1000 values into an Oracle IN clause but I believe these are different problems. I couldn’t find anything equivalent to this but if there is something please comment with a link and I’ll take a look.

2

Answers


  1. When the user applies filters there are 2 possible operations alongside each filter.

    • Find the substances that are on a given inventory.
    • Exclude any substances that are on that inventory.

    Pattern:

    SELECT substance_id
    FROM filters_substances
    GROUP BY substance_id
    HAVING MAX(filter_id = @filter1)   -- include
       AND MAX(filter_id = @filter1)   -- include
       AND NOT MAX(filter_id = @filter3)  -- exclude
       AND NOT MAX(filter_id = @filter4)  -- exclude
    -- ...
    

    Then join substances table and receive their details.

    If mentioned filter is present in current row then according filter_id = @filterN produces 1, if not then it produces 0.

    MAX() returns 1 if filter is present in at least one of the rows for the substance (see GROUP BY), and returns 0 if none row for this substance contains this filter.

    NOT inverts the output of MAX().

    So the substance will be returned only when all filters which should be present returns 1 and all filters which should not be present returns 0 accordingly.

    This query may be used either as CTE, or the substances table may be added into FROM with according output list and GROUP BY expressions expanding.

    Login or Signup to reply.
  2. I would offer a suggestion to have a temporary table that may be specific to each user that you can insert all the filters or exclusions and join against that. This way, you are not limited to 60 or even 1000 entries. Now, not knowing the language you are using to query the database, you can resolve that. But lets start with a temp table per user just as example with applicable index per ID.

    Create Table TempFilters_1745
    ( filter_id integer )
    
    Create Table TempSubstances_1745
    ( substance_id integer )
    

    Then, instead of building a loop for all the filters or substances, just insert

    insert into TempFilter_1745 ( filter_id )
       values ( 196 ), ( 197 ), (198 )  ....as many as needed
    

    If exclusions

    insert into TempSubstances_1745 ( substance_id )
       values ( 42 ), ( 57 ) ....as many as needed
    

    Then query based on these two temp tables

    create table if not exists filtering_1745 as
    select distinct
          fs.substance_id
       from
          filters_substances fs
             JOIN TempFilter_1745 tmp
                on fs.filter_id = tmp.filter_id
       where
          fs.substance_id 
             NOT IN ( select substance_id
                         from TempSubstances_1745 )
    

    Now, if you have no "EXCEPTION" substances to be excluded, that temp table would have no records in it and thus return everything. But if there WERE records, they would be excluded. The query itself doesn’t change. You just populate the temp filters or temp substances.

    Now, you could take this a slightly different step further instead of multiple temp tables, you could expand and include the user’s ID as a secondary column (and include that ID with your inserts and index as well)

    Create Table TempFilters
    ( user_id integer, filter_id integer )
    
    Create Table TempSubstances
    ( user_id integer, substance_id integer )
    
    
    insert into TempFilter ( user_id, filter_id )
       values ( 1745, 196 ), ( 1745, 197 ), (1745, 198 )  
          ....as many as needed
    
    insert into TempSubstances ( user_id, substance_id )
       values ( 1745, 42 ), ( 1745, 57 ) 
           ....as many as needed
    
    create table if not exists filtering_1745 as
    select distinct
          fs.substance_id
       from
          filters_substances fs
             JOIN TempFilter tmp
                on tmp.user_id = 1745 
               AND fs.filter_id = tmp.filter_id
       where
          fs.substance_id 
             NOT IN ( select ts.substance_id
                         from TempSubstances ts
                         where ts.user_id = 1745 )
    

    At least this can open your mind to another possible solution.

    You can even take the second option a step further by having some additional table allowing a user to save filter sets such as "My XYZ Filter set", that has an auto-increment ID. Add that ID to the filter/substance tables and include that as part of the inserts. Then, when a user wants to adjust, they dont have to re-pick the same set of possible 100’s of values again, miss something and do again. I have no idea of your user interface, just a thought. If not doing based on some "criteria set" ID, make sure you pre-delete the already populated list for each user before starting a new query.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search