MySQL 5.7 full outer join

Ankit
March 21, 2023
337 views
2 votes
4 Answers

Table 1 and sample data:-

CREATE TABLE student_p (
    ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    S_id INT UNSIGNED NOT NULL,
    Points DOUBLE NOT NULL,
    P_date DATE NOT NULL
);
INSERT INTO student_p VALUES
    (50055, 3330, 45, '2023-11-30'),
    (50056,  332, 43, '2013-10-31'),
    (50057, 3330, 22, '2013-10-30');

Table 2 and sample data: –

CREATE TABLE student_act (
    ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    s_id INT UNSIGNED NOT NULL,
    VIDEO_SCORE DOUBLE NOT NULL,
    EXERCISESCORE DOUBLE NOT NULL,
    DS INT NOT NULL,
    A_date DATE NOT NULL
);
INSERT INTO student_act VALUES
    (2333, 233, 22.43,  233.4455, 23, '2023-11-30'),
    (2334, 235, 24.566,      232, 34, '2023-10-31'),
    ( 322, 678, 23,           45, 23, '2022-10-30'),
    ( 433,  45, 23,           23, 43, '2022-10-01');

Only ids in both table can’t be duplicate rest all data can be duplicate.

student_p table: –

ID	S_id	Points	P_date
50055	3330	45	2023-11-30
50056	332	43	2013-10-31
50057	3330	22	2013-10-30

student_act table: –

ID	s_id	VIDEO_SCORE	Lesson_score	DS	A_date
2333	233	22.43	233.4455	23	2023-11-30
2334	235	24.566	232	34	2023-10-31
322	678	23	45	23	2022-10-30
433	45	23	23	43	2022-10-01

desired result: –

year	month	points	VIDEO_SCORE	Lesson_score	DS
2023	11	45	22.43	233.4455	23
2023	10	null	24.566	232	34
2022	10	null	46	68	66
2013	10	65	null	null	null

Tags: mysql mysql-5.7

Answers

- HossamElNadi
- March 21, 2023 at 7:58 am
- 0 votes
0
For starters, you can try to index your DATE field on the student_act table.
Indexing helps the SQL engine to find records based on that field faster.

MySql:
```
ALTER TABLE `student_act` ADD INDEX `date_index` (`DATE`)
```
Indexing optimises the lookup process in SQL engines, usually very effective if the query matches records based on MULTIPLE fields, or like in your example, the matched field is non-numeric.

Also, try to rename your field to something different than DATE, as it is a reserved word in some SQL engines.
Login or Signup to reply.

In your second query you are restricting the result set by the year. If that is what you want to do, you should do it based on the date, not a function on the date, as that is non-sargable. This will then be able to use the index suggested by Hossam. Putting it (overly) simplistically, the HAVING clause is applied after doing all the work, whereas the WHERE clause reduces the amount of work being done:

SELECT
  YEAR(msa.DATE) AS Year,
  MONTH(msa.DATE) AS Month,
  SUM(msk.POINT) AS Ps,
  SUM(msa.VIDEO_SCORE) AS Video,
  SUM(msa.EXERCISESCORE) AS Lessons,
  SUM(msa.DS) AS DS
FROM student_p msk
RIGHT JOIN student_act msa ON msk.DATE = msa.DATE
WHERE msa.DATE >= MAKEDATE(2023, 1) AND msa.DATE < MAKEDATE(2024, 1)
GROUP BY `Year`, `Month`;

MAKEDATE(year,dayofyear)

Returns a date, given year and day-of-year values. dayofyear must be greater than 0 or the result is NULL.

In one of your comments you stated that you wanted a full join, which MySQL does not have but you can achieve the same with a (LEFT|RIGHT) JOIN UNIONed to another (LEFT|RIGHT) JOIN.

WITH st (`Year`, `Month`, `Ps`) AS (
    SELECT
        YEAR(`P_date`) AS `Year`,
        MONTH(`P_date`) AS `Month`,
        SUM(`Points`)
    FROM `student_p`
    -- WHERE `P_date` >= MAKEDATE(2023, 1) AND `P_date` < MAKEDATE(2024, 1)
    GROUP BY `Year`, `Month`
),
act (`Year`, `Month`, `Video`, `Lessons`, `DS`) AS (
    SELECT
        YEAR(`A_date`) AS `Year`,
        MONTH(`A_date`) AS `Month`,
        SUM(`VIDEO_SCORE`) AS `Video`,
        SUM(`EXERCISESCORE`) AS `Lessons`,
        SUM(`DS`) AS `DS`
    FROM student_act
    -- WHERE A_date >= MAKEDATE(2023, 1) AND DATE < MAKEDATE(2024, 1)
    GROUP BY `Year`, `Month`
)
SELECT
    `Year`,
    `Month`,
    SUM(`Ps`) AS `Ps`,
    SUM(`Video`) AS `Video`,
    SUM(`Lessons`) AS `Lessons`,
    SUM(`DS`) AS `DS`
FROM (
    SELECT `act`.`Year`, `act`.`Month`, `Ps`, `Video`, `Lessons`, `DS`
    FROM `st`
    RIGHT JOIN `act` ON `st`.`Year` = `act`.`Year` AND `st`.`Month` = `act`.`Month`

    UNION ALL

    SELECT `st`.`Year`, `st`.`Month`, `Ps`, `Video`, `Lessons`, `DS`
    FROM `st`
    LEFT JOIN `act` ON `st`.`Year` = `act`.`Year` AND `st`.`Month` = `act`.`Month`
    WHERE `act`.`Year` IS NULL
) t
GROUP BY `Year`, `Month`
ORDER BY `Year` DESC, `Month` DESC;

Output:

Year	Month	Ps	Video	Lessons	DS
2023	11	45	22.43	233.4455	23
2023	10	NULL	24.566	232	34
2022	10	NULL	46	68	66
2013	10	65	NULL	NULL	NULL

The two CTEs do the aggregation. I left the WHERE clauses in but commented out.

The first query in the UNION is your current RIGHT JOIN retrieving all records from the right hand table, whether or not they have an associated row in the left hand table. The second query then gets all the rows from the left hand table which do not have an associated row in the right hand side.

Here’s a db<>fiddle with invalid dates and duplicate PK values tweaked.

For MySQL 5.7 you will have to repeat the subqueries within your main query, as CTEs are not available to you:

SELECT
    `Year`,
    `Month`,
    SUM(`Ps`) AS `Ps`,
    SUM(`Video`) AS `Video`,
    SUM(`Lessons`) AS `Lessons`,
    SUM(`DS`) AS `DS`
FROM (
    SELECT `act`.`Year`, `act`.`Month`, `Ps`, `Video`, `Lessons`, `DS`
    FROM (
        SELECT
            YEAR(`P_date`) AS `Year`,
            MONTH(`P_date`) AS `Month`,
            SUM(`Points`) AS `Ps`
        FROM `student_p`
        -- WHERE `P_date` >= MAKEDATE(2023, 1) AND `P_date` < MAKEDATE(2024, 1)
        GROUP BY `Year`, `Month`
    ) `st`
    RIGHT JOIN (
        SELECT
            YEAR(`A_date`) AS `Year`,
            MONTH(`A_date`) AS `Month`,
            SUM(`VIDEO_SCORE`) AS `Video`,
            SUM(`EXERCISESCORE`) AS `Lessons`,
            SUM(`DS`) AS `DS`
        FROM student_act
        -- WHERE A_date >= MAKEDATE(2023, 1) AND DATE < MAKEDATE(2024, 1)
        GROUP BY `Year`, `Month`
    ) `act` ON `st`.`Year` = `act`.`Year` AND `st`.`Month` = `act`.`Month`

    UNION ALL

    SELECT `st`.`Year`, `st`.`Month`, `Ps`, `Video`, `Lessons`, `DS`
    FROM (
        SELECT
            YEAR(`P_date`) AS `Year`,
            MONTH(`P_date`) AS `Month`,
            SUM(`Points`) AS `Ps`
        FROM `student_p`
        -- WHERE `P_date` >= MAKEDATE(2023, 1) AND `P_date` < MAKEDATE(2024, 1)
        GROUP BY `Year`, `Month`
    ) `st`
    LEFT JOIN (
        SELECT
            YEAR(`A_date`) AS `Year`,
            MONTH(`A_date`) AS `Month`,
            SUM(`VIDEO_SCORE`) AS `Video`,
            SUM(`EXERCISESCORE`) AS `Lessons`,
            SUM(`DS`) AS `DS`
        FROM student_act
        -- WHERE A_date >= MAKEDATE(2023, 1) AND DATE < MAKEDATE(2024, 1)
        GROUP BY `Year`, `Month`
    ) `act` ON `st`.`Year` = `act`.`Year` AND `st`.`Month` = `act`.`Month`
    WHERE `act`.`Year` IS NULL
) t
GROUP BY `Year`, `Month`
ORDER BY `Year` DESC, `Month` DESC;

Output:

Year	Month	Ps	Video	Lessons	DS
2023	11	45	22.43	233.4455	23
2023	10	NULL	24.566	232	34
2022	10	NULL	46	68	66
2013	10	65	NULL	NULL	NULL

db<>fiddle

Is this what you are looking for?

The second query in the UNION could be replaced with:

    SELECT
        YEAR(`P_date`) AS `Year`,
        MONTH(`P_date`) AS `Month`,
        SUM(`Points`) AS `Ps`,
        NULL AS `Video`,
        NULL AS `Lessons`,
        NULL AS `DS`
    FROM `student_p`
    WHERE NOT EXISTS (
        SELECT 1
        FROM student_act
        WHERE A_date BETWEEN student_p.P_date - INTERVAL (DAY(student_p.P_date) - 1) DAY
                         AND LAST_DAY(student_p.P_date)
    )
    -- AND `P_date` >= MAKEDATE(2023, 1) AND `P_date` < MAKEDATE(2024, 1)
    GROUP BY `Year`, `Month`

You’ll have to experiment with these query variations and your real data to see what works best. You should throw O. Jones’ suggestion into the mix too, to see if it improves the performance of the aggregate queries.

db<>fiddle

If you have performance issues with this query, please update your question including the EXPLAIN output for the full query, and for the two aggregation subqueries.

You can also try this

SELECT
  YEAR(msa.DATE) AS Year,
  MONTH(msa.DATE) AS Month,
  SUM(msk.POINT) AS Ps,
  SUM(msa.VIDEO_SCORE) AS Video,
  SUM(msa.EXERCISESCORE) AS Lessons,
  SUM(msa.DS) AS DS
FROM student_p msk
RIGHT JOIN student_act msa ON msk.DATE = msa.DATE
WHERE msa.DATE between '2023-01-01 00:00:00' AND '2023-12-31 23:59:59'
GROUP BY MONTH(msa.DATE);

- OJones
- March 21, 2023 at 11:35 am
- 0 votes
0
Instead of using the YEAR() and MONTH() functions, try grouping by LAST_DAY(). It gives you the last day of the month containing any DATE, DATETIME, or TIMESTAMP.

Like this:
```
SELECT
  LAST_DAY(msa.DATE) AS MonthEnding,
  SUM(msk.POINT) AS Ps,
  SUM(msa.VIDEO_SCORE) AS Video,
  SUM(msa.EXERCISESCORE) AS Lessons,
  SUM(msa.DS) AS DS
FROM student_p msk
RIGHT JOIN student_act msa ON msk.DATE = msa.DATE
WHERE msa.DATE >= MAKEDATE(2023, 1) AND msa.DATE < MAKEDATE(2024, 1)
GROUP BY LAST_DAY(msa.DATE);
```
Create a covering index on student_p(date, POINT) and another on student_act(DATE, VIDEO_SCORE, EXERCISES_SCORE, DS).

That form of the query and those indexes will help a lot.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.