skip to Main Content

We have Employee ID in over 50 different tables with various data types (varchar, int, text).
When these tables were populated via conversion, they should have been populated with 6 numbers. Unfortunately, some of the IDs had dropped leading zeros or the ID was appended with trailing spaces.
I should mention that all of the tables in question start with z%.
How can I find the TABLES that identify the Employee Ids with a length greater than or less than 6?

This is what my mind is thinking, but it’s obviously not right.

Select LENGTH(EMPLOYEE_ID), TABLE_NAME from (Select TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mySchema' and TABLE_NAME like 'z%') a
where LENGTH(EMPLOYEE_ID) <> 6
GROUP BY  TABLE_NAME, LENGTH(EMPLOYEE_ID)

What I don’t want to do is list out every table like:

Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE1
where LENGTH(EMPLOYEE_ID) <> 6
UNION
Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE2
where LENGTH(EMPLOYEE_ID) <> 6
UNION
Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE3
where LENGTH(EMPLOYEE_ID) <> 6

2

Answers


  1. This might be tricky to do using purely SQL. I would advise using a scripting language like Python and taking advantage of pandas. You can do something like below:

    import pymysql
    import pandas as pd
    
    conn = pymysql.connect(host='host', user='user', password='pass', db='mySchema')
    
    # Fetch the list of tables
    query = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'mySchema' AND TABLE_NAME LIKE 'z%'"
    tables = pd.read_sql(query, conn)
    
    results = []
    
    for table in tables['TABLE_NAME']:
        try:
            query = f"SELECT '{table}' AS TABLE_NAME, COUNT(*) AS COUNT FROM {table} WHERE LENGTH(TRIM(EMPLOYEE_ID)) <> 6"
            result = pd.read_sql(query, conn)
            if result['COUNT'].iloc[0] > 0:
                results.append(result)
        except Exception as e:
            print(f"Error while processing table {table}: {e}")
    
    # Combine the results
    all_results = pd.concat(results, ignore_index=True)
    
    
    Login or Signup to reply.
  2. So, what you want is a query that basically end up to something like this:

    SELECT * FROM
    (SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) AS lenEmpId FROM ztable1 UNION ALL
    SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable2 UNION ALL
    SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable3) a
      WHERE lenEmpId <> 6;
    

    or maybe this (probably a quicker option than the query above):

    SELECT * FROM
    (SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable1 
        WHERE LENGTH(EMPLOYEE_ID) <> 6 UNION ALL
    SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable2 
        WHERE LENGTH(EMPLOYEE_ID) <> 6 UNION ALL
    SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable3
        WHERE LENGTH(EMPLOYEE_ID) <> 6) a;
    

    .. without the hassle of writing the same query for each table. Your idea of using information_schema to build the query is in the right path but the way you’re doing it is incorrect. You’ll need to create the query, assign it into a variable then use PREPARE STATEMENT to execute it.

    /*build the query*/
    SELECT CONCAT('SELECT * FROM (',
              GROUP_CONCAT(
                 CONCAT('SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) AS lenEmpId FROM ',TABLE_NAME
                             ,' WHERE LENGTH(EMPLOYEE_ID) <> 6') SEPARATOR ' UNION ALL '),
                ') a;') INTO @myquery
       FROM INFORMATION_SCHEMA.TABLES
    WHERE /*TABLE_SCHEMA = 'mySchema' AND */ TABLE_NAME like 'z%';
    
    /*check the query*/
    SELECT @myquery;
    
    /*use prepare statement to execute then deallocate the query*/
    PREPARE stmt FROM @myquery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    

    Here’s a demo fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search