mySQL how to identify all tables with the same column but content length not 6

JohnSmith
February 14, 2024
157 views
0 votes
2 Answers

We have Employee ID in over 50 different tables with various data types (varchar, int, text).
When these tables were populated via conversion, they should have been populated with 6 numbers. Unfortunately, some of the IDs had dropped leading zeros or the ID was appended with trailing spaces.
I should mention that all of the tables in question start with z%.
How can I find the TABLES that identify the Employee Ids with a length greater than or less than 6?

This is what my mind is thinking, but it’s obviously not right.

Select LENGTH(EMPLOYEE_ID), TABLE_NAME from (Select TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mySchema' and TABLE_NAME like 'z%') a
where LENGTH(EMPLOYEE_ID) <> 6
GROUP BY  TABLE_NAME, LENGTH(EMPLOYEE_ID)

What I don’t want to do is list out every table like:

Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE1
where LENGTH(EMPLOYEE_ID) <> 6
UNION
Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE2
where LENGTH(EMPLOYEE_ID) <> 6
UNION
Select LENGTH(EMPLOYEE_ID), TABLE_NAME from TABLE3
where LENGTH(EMPLOYEE_ID) <> 6

Tags: mysql sql

Answers

This might be tricky to do using purely SQL. I would advise using a scripting language like Python and taking advantage of pandas. You can do something like below:

import pymysql
import pandas as pd

conn = pymysql.connect(host='host', user='user', password='pass', db='mySchema')

# Fetch the list of tables
query = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'mySchema' AND TABLE_NAME LIKE 'z%'"
tables = pd.read_sql(query, conn)

results = []

for table in tables['TABLE_NAME']:
    try:
        query = f"SELECT '{table}' AS TABLE_NAME, COUNT(*) AS COUNT FROM {table} WHERE LENGTH(TRIM(EMPLOYEE_ID)) <> 6"
        result = pd.read_sql(query, conn)
        if result['COUNT'].iloc[0] > 0:
            results.append(result)
    except Exception as e:
        print(f"Error while processing table {table}: {e}")

# Combine the results
all_results = pd.concat(results, ignore_index=True)

So, what you want is a query that basically end up to something like this:

SELECT * FROM
(SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) AS lenEmpId FROM ztable1 UNION ALL
SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable2 UNION ALL
SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable3) a
  WHERE lenEmpId <> 6;

or maybe this (probably a quicker option than the query above):

SELECT * FROM
(SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable1 
    WHERE LENGTH(EMPLOYEE_ID) <> 6 UNION ALL
SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable2 
    WHERE LENGTH(EMPLOYEE_ID) <> 6 UNION ALL
SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) FROM ztable3
    WHERE LENGTH(EMPLOYEE_ID) <> 6) a;

.. without the hassle of writing the same query for each table. Your idea of using information_schema to build the query is in the right path but the way you’re doing it is incorrect. You’ll need to create the query, assign it into a variable then use PREPARE STATEMENT to execute it.

/*build the query*/
SELECT CONCAT('SELECT * FROM (',
          GROUP_CONCAT(
             CONCAT('SELECT EMPLOYEE_ID, LENGTH(EMPLOYEE_ID) AS lenEmpId FROM ',TABLE_NAME
                         ,' WHERE LENGTH(EMPLOYEE_ID) <> 6') SEPARATOR ' UNION ALL '),
            ') a;') INTO @myquery
   FROM INFORMATION_SCHEMA.TABLES
WHERE /*TABLE_SCHEMA = 'mySchema' AND */ TABLE_NAME like 'z%';

/*check the query*/
SELECT @myquery;

/*use prepare statement to execute then deallocate the query*/
PREPARE stmt FROM @myquery;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Here’s a demo fiddle

Please signup or login to give your own answer.

Click here to cancel reply.