skip to Main Content

I have two tables (They are bigger with more of 200,000 rows, this is just an example)

Table1:temptable
id         |Chromosome | Start | End | Reference | Alternative | QUAL | MQ
........................................................................
NULL         chr1       12334  12335    A          TT            10   20
NULL         chr1       12334  12335    C          TT            5    3
NULL         chr2       123    123      A           T            1    2
NULL         chr2       34     34       A           T            11   60
NULL         chr3       12     12       A           T            6    NULL
Table2:TableVariants2
id          | Chromosome | Start | End | Reference | Alternative 
.............................................................
1            chr1          12334  12335   A        TT
2            chr1          12334  12335   C        TT
3            chr2          123    123     A         T
4            chr2          34     34      A         T
5            chr3          12     12      A         T

I want to merge them based on equal Chromosome, Start, End, Reference and Alternative and obtain:

id         |Chromosome | Start | End | Reference | Alternative | QUAL | MQ
........................................................................
1           chr1       12334  12335     A             TT         10     20
2           chr1       12334  12335     C             TT         5      3
3           chr2       123    123       A              T         1      2
4           chr2       34     34        A              T         11     60
5           chr3       12     12        A              T         6     NULL

I have tried the following but it is too slow for big tables and I need something faster

SELECT B.Maxid, A.Chromosome, A.Start, A.End, A.Reference, A.Alternative , A.QUAL,A.MQ 
FROM temptable A
INNER JOIN ( SELECT `Chromosome`, `Start`, `End`, `Reference`,`Alternative`, MAX(id) AS Maxid FROM TableVariants2 GROUP BY `Chromosome`, `Start`, `End`, `Reference`,`Alternative`) B
ON A.Chromosome=B.Chromosome AND A.Start=B.Start AND A.End=B.End AND A.Reference=B.Reference AND A.Alternative=B.Alternative

There is any alternative to the INNER JOIN in order to speed up the query?

2

Answers


  1. You can follow a few things while writing a query:

    1. Analyze your data.
    2. Make sure you have created necessary indexes.
    3. In Select statement write only required column, avoid unnecessary columns.
    4. Do not write subquery or co-related query until and unless is required.
    

    Let’s see if you follow these steps could help.
    But Inner Join is the basic join which cannot be replaced if you have to join two tables.

    Login or Signup to reply.
  2. INDEX(Chromosome, Start, End, Reference, Alternative, id) — in the same order as the GROUP BY.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search