skip to Main Content

data

There’re multiple tables which format is dm_ym_file_2022XXXX(date). The start date is 20220720 and the end date is 20221220

dm_ym_file_20220720
dm_ym_file_20220721
……
dm_ym_file_20221220

dm_ym_file_rules stores all names.The dm_ym_file_rules as belows:

    ID  start_date  end_date    table_names
    36  2022-07-20  2022-07-20  dm_ym_file_share_20220720
    37  2022-07-21  2022-07-21  dm_ym_file_share_20220721
    38  2022-07-22  2022-07-22  dm_ym_file_share_20220722

goal

I want to groupby some fields from all of these tables into one table.

insert into target_table
select a,b,c,sum(d)
from
(
select a,b,c,sum(d)
dm_ym_file_20220720
group by 1,2,3
union all
select a,b,c,sum(d)
dm_ym_file_20220721
group by 1,2,3
union all
……
select a,b,c,sum(d)
dm_ym_file_20221220
group by 1,2,3
) a
group by 1,2,3;

My code should list all tables. It’s inconvenient.

3

Answers


  1. The pattern you should use here is to union the source tables first, and then aggregate once on the outside.

    INSERT INTO target_table (c1, c2, c3, c4)  -- best to specify target columns here
    SELECT a, b, c, SUM(d)
    FROM
    (
        SELECT a, b, c, d FROM dm_ym_file_20220720
        UNION ALL
        SELECT a, b, c, d FROM dm_ym_file_20220721
        UNION ALL
        SELECT a, b, c, d FROM dm_ym_file_20221220
    ) t
    GROUP BY a, b, c;
    

    Note that a better long term solution here might be to revisit your database design. Do you really need to have separate tables with an almost identical structure? It might make more sense to just have a single table, with an additional column(s) corresponding to the year/month date.

    Login or Signup to reply.
  2. There are (at least) two problems in you SQL statement, besides the already mentioned fact that you should not store data like this.

    1. In you statement the required keyword FROM is missing (3 times)

    2. In the subquery the column sum(d) should get an alias d. The subquery will then have a 4th column, which can be referred to as d, which makes it possible for the outer query to refer to this field as d.

    Applied to your script, the script looks like:

    insert into target_table
    select a,b,c,sum(d)
    from
    (
       select a,b,c,sum(d) as d
       FROM dm_ym_file_20220720
       group by 1,2,3
       union all
       select a,b,c,sum(d)
       FROM dm_ym_file_20220721
       group by 1,2,3
       union all
       select a,b,c,sum(d)
       FROM dm_ym_file_20221220
       group by 1,2,3
    ) a
    group by 1,2,3;
    

    see: DBFIDDLE

    Login or Signup to reply.
  3. This procedure could gets things done:

    DELIMITER $$
    
    CREATE PROCEDURE proc_run()
    BEGIN
    CREATE TEMPORARY TABLE IF NOT EXISTS temp (a INT,  b INT, c INT, d INT);
    TRUNCATE TABLE temp;
    
    set @b = (Select max(id) from dm_ym_file_rules);
    set @a = 1;
    while @a <= @b DO 
      set @table_name = (select table_names from dm_ym_file_rules where id = @a);
      Set @Expression = concat('INSERT INTO temp SELECT a, b, c, d FROM ', @table_name);
      PREPARE myquery FROM @Expression;
      EXECUTE myquery;
      set @a = @a + 1;
    end while;
    
    select a, b, c, SUM(d) from temp;
    END$$
    $$
    
    call proc_run();
    

    But keep in mind that this isn’t the best way to organize database. A single table with index on the datetime column would be better.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search