skip to Main Content

Source: MS Access on Windows network share

Target: MySQL/MariaDB on Ubuntu

Tools: mdb-export, mysqlimport

record count: 1,5 Mio +

I wonder if there is a fast and reliable way of comparing the imported data records.

Is there an SQL standard equivalent to e.g. md5 fingerprint hashes of files? Right now, I am building different import routines and I only want to fast check for similarity and (if failed) search for the detailed differences later on.

3

Answers


  1. Chosen as BEST ANSWER

    I found it.

    It's very simple and very fast.

    CHECKSUM TABLE tbl_name
    

    Will give you a number value to compare.

    And it's Transact-SQL so will hopefully work the same on MS Access, MySQL and MariaDB


  2. A somewhat of a quick-and-dirty approach for individual columns can be implemented using stored aggregate functions which should be SQL standard.

    This is how you’d do it with MariaDB:

    CREATE AGGREGATE FUNCTION IF NOT EXISTS my_checksum(x TEXT) RETURNS CHAR(40)
    DETERMINISTIC
    BEGIN
     DECLARE cksum CHAR(40) DEFAULT SHA1('');
     DECLARE CONTINUE HANDLER FOR NOT FOUND
     RETURN cksum;
          LOOP
              FETCH GROUP NEXT ROW;
              SET cksum = SHA1(CONCAT(cksum, x));
          END LOOP;
    END
    

    You can then calculate a checksum from of a column as such:

    MariaDB [test]> create or replace table t1(data varchar(20));
    Query OK, 0 rows affected (0.063 sec)
    
    MariaDB [test]> create or replace table t2(data varchar(20));
    Query OK, 0 rows affected (0.064 sec)
    
    MariaDB [test]> insert into t1 values ('hello'), ('world'), ('!');
    Query OK, 3 rows affected (0.011 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    MariaDB [test]> insert into t2 values ('Hello'), ('World'), ('!');
    Query OK, 3 rows affected (0.015 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    MariaDB [test]> select my_checksum(data) from t1;
    +------------------------------------------+
    | my_checksum(data)                        |
    +------------------------------------------+
    | 7f6fb9a61c2097f70a36254c332c47364c496e07 |
    +------------------------------------------+
    1 row in set (0.001 sec)
    
    MariaDB [test]> select my_checksum(data) from t2;
    +------------------------------------------+
    | my_checksum(data)                        |
    +------------------------------------------+
    | 5f683ea3674e33ce24bff5f68f53509566ad4da2 |
    +------------------------------------------+
    1 row in set (0.001 sec)
    
    MariaDB [test]> delete from t2;
    Query OK, 3 rows affected (0.011 sec)
    
    MariaDB [test]> insert into t2 values ('hello'), ('world'), ('!');
    Query OK, 3 rows affected (0.012 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    MariaDB [test]> select my_checksum(data) from t2;
    +------------------------------------------+
    | my_checksum(data)                        |
    +------------------------------------------+
    | 7f6fb9a61c2097f70a36254c332c47364c496e07 |
    +------------------------------------------+
    1 row in set (0.001 sec)
    

    This of course relies on the SHA1 of the column being the same on all databases. Conversions into strings should make it mostly compatible but there might be differences in how these are implemented in different databases.

    Login or Signup to reply.
  3. The percona toolkit has the tool you need.

    https://docs.percona.com/percona-toolkit/

    See pt-table-checksum and pt-table-sync

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search