skip to Main Content

Why I get so many more records with a condition in sql?

I try to build a sql table based on another sql table. The original commands are:

cursor.execute('DELETE FROM Dataplace.Queue')
cursor.execute(f"INSERT INTO Dataplace.Queue SELECT DISTINCT COSTCENTER, 'Pending', Null, Null FROM Dataplace.START")

Where the start table has the following structure:

Calmonth COSTCENTER    NAME VALUE
202301    0002001234    A    10
202302    0002001234    A    20
202301    0002001234    B    15
202301    0001166666    A    12
202301    0001000666    A    11
...

And with the original command i get 378 entries, with the output for the Queue Table:

Key         Status  Container Duration
0002001234  Pending NULL     NULL
0001166666  Pending NULL     NULL
0001000666  Pending NULL     NULL

When I run the commands:

cursor.execute('DELETE FROM Dataplace.Queue')
cursor.execute(f"INSERT INTO Dataplace.Queue SELECT DISTINCT COSTCENTER, 'Pending', Null, Null FROM Dataplace.START WHERE NOT (COSTCENTER LIKE '000200%' OR COSTCENTER LIKE '001000%')")

I get an output with 577 rows. I would expect an output less then above (378-84 see the following command). What do I miss here?
The commands:

cursor.execute('DELETE FROM Dataplace.Queue')
cursor.execute(f"INSERT INTO Dataplace.Queue SELECT DISTINCT COSTCENTER, 'Pending', Null, Null FROM Dataplace.START WHERE COSTCENTER LIKE '000200%' OR COSTCENTER LIKE '001000%'")

gives a output with 84 rows.

2

Answers


  1. The discrepancy in the number of records you’re observing is likely due to the DISTINCT keyword and how the WHERE clause is filtering your data.

    When you use DISTINCT, it ensures that all rows in the result set are unique. In your first INSERT statement, you are selecting distinct COSTCENTER values from the entire Dataplace.START table without any filtering. This gives you 378 unique COSTCENTER values.

    In your second INSERT statement, you apply a filter that excludes rows where COSTCENTER starts with ‘000200’ or ‘001000’. Intuitively, you might expect this to reduce the number of rows in the result. However, the actual outcome depends on the distribution of COSTCENTER values in your table. If, after excluding these specific COSTCENTER values, there are more unique COSTCENTER values left in the data than you had initially, you can end up with more records, not fewer.

    For the third INSERT statement, you’re explicitly filtering to include only COSTCENTER values that start with ‘000200’ or ‘001000’. This results in 84 rows, which indicates that there are 84 unique COSTCENTER values in your table that match this condition.

    The key point to understand is that the number of distinct COSTCENTER values that do not start with ‘000200’ or ‘001000’ can be greater than the total number of distinct COSTCENTER values in your table. This can happen if there are many unique COSTCENTER values that don’t match these patterns.

    To troubleshoot and understand your data better, you might want to check the distinct COSTCENTER values in your table and see how they are distributed. You can execute a query to count the number of unique COSTCENTER values that match and do not match your criteria:

    SELECT COUNT(DISTINCT COSTCENTER) FROM Dataplace.START WHERE COSTCENTER LIKE '000200%' OR COSTCENTER LIKE '001000%';
    SELECT COUNT(DISTINCT COSTCENTER) FROM Dataplace.START WHERE NOT (COSTCENTER LIKE '000200%' OR COSTCENTER LIKE '001000%');
    

    This will give you a clearer picture of how your COSTCENTER values are distributed and why the result set sizes are different.

    Login or Signup to reply.
  2. It is really hard to determine the correct query to execute without the Dataset, but I’ll try anyway:

    If I understood, it’s just a matter of conditions and operators.

    Doing a

    [...] WHERE NOT (a LIKE 'x' OR b LIKE 'y')
    

    will basically return everything.
    Instead:

    [...] WHERE (a NOT LIKE 'x' AND b NOT LIKE 'y')
    

    should return the correct result.

    You need to retrieve the COSTCENTERs that don’t contain '000200%'
    AND '001000%'.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search