I am developing a chatting application. I have written a SQL to get the active chats associated with the user. It used to get user to user chat and group chats based on sort it in descending order based on when the last message was sent. I’m trying to optimize this query. I’m using MySQL.
group_attributes
table schema :
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| Chat_Group_id | int | NO | PRI | NULL | auto_increment |
| IsGroup | tinyint(1) | NO | | NULL | |
| name | varchar(20) | YES | | NULL | |
| description | varchar(500) | YES | | NULL | |
| createdAt | datetime | NO | | NULL | |
| updatedAt | datetime | NO | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
UpdatedAt column is updated with current timestamp when a message record is added in chat_group_messages
table.
chat_groups
table schema:
+---------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| Chat_Group_id | int | NO | | NULL | |
| user_id | int | NO | | NULL | |
| createdAt | datetime | NO | | NULL | |
| updatedAt | datetime | NO | | NULL | |
| admin | tinyint(1) | NO | | 0 | |
+---------------+------------+------+-----+---------+----------------+
chat_group_messages
table:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| Chat_Group_id | int | NO | | NULL | |
| user_id | int | NO | | NULL | |
| message | varchar(5000) | NO | | NULL | |
| createdAt | datetime | NO | | NULL | |
| updatedAt | datetime | NO | | NULL | |
+---------------+---------------+------+-----+---------+----------------+
I have yet to add forign keys in above schemas as I’ve yet tofully understand how associations work in sequalize module.
**query 1 **:
This is used to find user to user chats associated with logged in user.
It selects user groups from group_attributes
table and joins them with chat_groups
table on column chat_group_id
and users
table on column user_id
and then, filters them with :
- checks if user exists in the group and where at least 1 message of that group exists in the table
chat_group_messages
- isGroup flag is false
user_id
is not equal to the requesting user (logged in user)- whether logged in user is in a group with another user
point 1 is needed because of the way my frontend works it creates arbitrary groups between users upon execution of certain actions ( I will change the this behavior in future to remove this filter )
point 2 gives checks if it is a Group
point 3 and 4 are needed to filter out records of logged in user while checking if they exist in a group
query 2:
This is used to find group chats associated with logged in user.
This selects from group_attributes
table and performs inner join on chat_groups
table on column chat_group_id
and weather user_id
is equal to logged user id. This query filters it out on basis of weather IsGroup
flag is true i.e. the association is a group containing 2 or more users.
At the end the union of 2 queries is taken to it is ordered by descending on column UpdatedAt
to get columns which were last updated i.e. groups where messages were last sent.
select * from (
select
ca.chat_group_id , ca.isgroup ,
u.user_id , u.username ,
name group_name , description as group_description ,
ca.updatedat
from group_attributes ca
inner join chat_groups cg on ca.chat_group_id = cg.chat_group_id
inner join users u on cg.user_id = u.user_id
where
exists ( select user_id from chat_group_messages cgm where cgm.chat_group_id = ca.chat_group_id )
and isgroup = false and u.user_id != ${logged_user.id}
and ${logged_user.id} in (
select cg2.user_id from chat_groups cg2
where cg.Chat_Group_id = cg2.Chat_Group_id
)
union
select
ca.chat_group_id , ca.isgroup ,
null as user_id , null as username ,
ca.name as group_name , ca.description as group_description ,
ca.updatedat
from group_attributes ca
inner join chat_groups cg on cg.chat_group_id = ca.chat_group_id and cg.user_id = ${logged_user.id}
where isgroup = true
) as temp
order by updatedat desc ;
group_attributes table is supposed to be small whereas other 2 table are supposed to be comparatively huge.
this query runs quickly in my development database. But I think in production it will take a long while and will probably lock the tables for a while, slowing down the whole application.
I don’t know how to optimize this query without breaking this logic.
Explain query shows this:
+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100 | Using filesort |
| 2 | DERIVED | cg2 | NULL | ALL | NULL | NULL | NULL | NULL | 66 | 10 | Using where; Start temporary |
| 2 | DERIVED | ca | NULL | eq_ref | PRIMARY | PRIMARY | 4 | webapp.cg2.Chat_Group_id | 1 | 10 | Using where |
| 2 | DERIVED | cgm | NULL | ALL | NULL | NULL | NULL | NULL | 16 | 10 | Using where; Using join buffer (hash join) |
| 2 | DERIVED | cg | NULL | ALL | NULL | NULL | NULL | NULL | 66 | 10 | Using where; Using join buffer (hash join) |
| 2 | DERIVED | u | NULL | eq_ref | PRIMARY | PRIMARY | 4 | webapp.cg.user_id | 1 | 100 | End temporary |
| 5 | UNCACHEABLE UNION | cg | NULL | ALL | NULL | NULL | NULL | NULL | 66 | 10 | Using where |
| 5 | UNCACHEABLE UNION | ca | NULL | eq_ref | PRIMARY | PRIMARY | 4 | webapp.cg.Chat_Group_id | 1 | 10 | Using where |
| 6 | UNION RESULT | <union2,5> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+
2
Answers
Your tables would make a bit more sense if
group_attributes
is renamed tochat_groups
and currentchat_groups
renamed tochat_group_users
.Drop the unnecessary AI PK on
chat_groups
in favour of PK (Chat_Group_id, user_id) and secondary index on (user_id, Chat_Group_id).Also, avoid the confusing mix of capitalisation and underscores. Choose a convention and stick to it. All lowercase with underscores is most portable but whatever you choose, be consistent.
As well as changing the structure of chat_groups, you need an index on chat_group_messages (Chat_Group_id).
This should give the same result:
Please qualify each column name; it is hard to track what comes from what table.
Based on user1191247’s reformulation, I recommend adding these composite indexes: