skip to Main Content

I am developing a chatting application. I have written a SQL to get the active chats associated with the user. It used to get user to user chat and group chats based on sort it in descending order based on when the last message was sent. I’m trying to optimize this query. I’m using MySQL.

group_attributes table schema :

+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| Chat_Group_id | int          | NO   | PRI | NULL    | auto_increment |
| IsGroup       | tinyint(1)   | NO   |     | NULL    |                |
| name          | varchar(20)  | YES  |     | NULL    |                |
| description   | varchar(500) | YES  |     | NULL    |                |
| createdAt     | datetime     | NO   |     | NULL    |                |
| updatedAt     | datetime     | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

UpdatedAt column is updated with current timestamp when a message record is added in chat_group_messages table.

chat_groups table schema:

+---------------+------------+------+-----+---------+----------------+
| Field         | Type       | Null | Key | Default | Extra          |
+---------------+------------+------+-----+---------+----------------+
| id            | int        | NO   | PRI | NULL    | auto_increment |
| Chat_Group_id | int        | NO   |     | NULL    |                |
| user_id       | int        | NO   |     | NULL    |                |
| createdAt     | datetime   | NO   |     | NULL    |                |
| updatedAt     | datetime   | NO   |     | NULL    |                |
| admin         | tinyint(1) | NO   |     | 0       |                |
+---------------+------------+------+-----+---------+----------------+

chat_group_messages table:

+---------------+---------------+------+-----+---------+----------------+
| Field         | Type          | Null | Key | Default | Extra          |
+---------------+---------------+------+-----+---------+----------------+
| id            | int           | NO   | PRI | NULL    | auto_increment |
| Chat_Group_id | int           | NO   |     | NULL    |                |
| user_id       | int           | NO   |     | NULL    |                |
| message       | varchar(5000) | NO   |     | NULL    |                |
| createdAt     | datetime      | NO   |     | NULL    |                |
| updatedAt     | datetime      | NO   |     | NULL    |                |
+---------------+---------------+------+-----+---------+----------------+

I have yet to add forign keys in above schemas as I’ve yet tofully understand how associations work in sequalize module.

**query 1 **:
This is used to find user to user chats associated with logged in user.
It selects user groups from group_attributes table and joins them with chat_groups table on column chat_group_id and users table on column user_id and then, filters them with :

  1. checks if user exists in the group and where at least 1 message of that group exists in the table chat_group_messages
  2. isGroup flag is false
  3. user_id is not equal to the requesting user (logged in user)
  4. whether logged in user is in a group with another user

point 1 is needed because of the way my frontend works it creates arbitrary groups between users upon execution of certain actions ( I will change the this behavior in future to remove this filter )

point 2 gives checks if it is a Group

point 3 and 4 are needed to filter out records of logged in user while checking if they exist in a group

query 2:
This is used to find group chats associated with logged in user.

This selects from group_attributes table and performs inner join on chat_groups table on column chat_group_id and weather user_id is equal to logged user id. This query filters it out on basis of weather IsGroup flag is true i.e. the association is a group containing 2 or more users.

At the end the union of 2 queries is taken to it is ordered by descending on column UpdatedAt to get columns which were last updated i.e. groups where messages were last sent.

select *  from (
      select 
      ca.chat_group_id , ca.isgroup , 
      u.user_id ,  u.username , 
      name group_name , description as  group_description , 
      ca.updatedat
      from group_attributes ca
      inner join chat_groups cg on ca.chat_group_id = cg.chat_group_id
      inner join users u on cg.user_id = u.user_id  
      where 
      exists ( select user_id from chat_group_messages cgm where  cgm.chat_group_id = ca.chat_group_id )
      and isgroup =  false and u.user_id != ${logged_user.id} 
      and ${logged_user.id} in (
        select  cg2.user_id  from chat_groups cg2 
        where cg.Chat_Group_id = cg2.Chat_Group_id
            )
    
    union 
    
    select 
       ca.chat_group_id , ca.isgroup , 
        null as user_id , null as username , 
        ca.name as group_name , ca.description as group_description , 
        ca.updatedat 
        from group_attributes ca
        inner join chat_groups cg on cg.chat_group_id = ca.chat_group_id and cg.user_id = ${logged_user.id}
        where isgroup = true 
        ) as temp 
        order by updatedat desc ;

group_attributes table is supposed to be small whereas other 2 table are supposed to be comparatively huge.

this query runs quickly in my development database. But I think in production it will take a long while and will probably lock the tables for a while, slowing down the whole application.

I don’t know how to optimize this query without breaking this logic.

Explain query shows this:

+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+
| id | select_type       | table      | partitions | type   | possible_keys | key     | key_len | ref                      | rows | filtered | Extra                                      |
+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+
|  1 | PRIMARY           | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     |    8 |      100 | Using filesort                             |
|  2 | DERIVED           | cg2        | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     |   66 |       10 | Using where; Start temporary               |
|  2 | DERIVED           | ca         | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | webapp.cg2.Chat_Group_id |    1 |       10 | Using where                                |
|  2 | DERIVED           | cgm        | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     |   16 |       10 | Using where; Using join buffer (hash join) |
|  2 | DERIVED           | cg         | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     |   66 |       10 | Using where; Using join buffer (hash join) |
|  2 | DERIVED           | u          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | webapp.cg.user_id        |    1 |      100 | End temporary                              |
|  5 | UNCACHEABLE UNION | cg         | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     |   66 |       10 | Using where                                |
|  5 | UNCACHEABLE UNION | ca         | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | webapp.cg.Chat_Group_id  |    1 |       10 | Using where                                |
|  6 | UNION RESULT      | <union2,5> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                     | NULL |     NULL | Using temporary                            |
+----+-------------------+------------+------------+--------+---------------+---------+---------+--------------------------+------+----------+--------------------------------------------+

2

Answers


  1. Your tables would make a bit more sense if group_attributes is renamed to chat_groups and current chat_groups renamed to chat_group_users.

    Drop the unnecessary AI PK on chat_groups in favour of PK (Chat_Group_id, user_id) and secondary index on (user_id, Chat_Group_id).

    Also, avoid the confusing mix of capitalisation and underscores. Choose a convention and stick to it. All lowercase with underscores is most portable but whatever you choose, be consistent.

    As well as changing the structure of chat_groups, you need an index on chat_group_messages (Chat_Group_id).

    This should give the same result:

    SELECT
        ga.Chat_Group_id, ga.IsGroup,
        ga.name as group_name, ga.description as group_description, ga.updatedAt,
        u.user_id ,  u.username
    FROM group_attributes ga
    LEFT JOIN chat_groups cg
        ON ga.IsGroup = 0
        AND ga.Chat_Group_id = cg.Chat_Group_id
        AND cg.user_id <> ${logged_user.id}
    LEFT JOIN users u
        ON cg.user_id = u.user_id
    WHERE EXISTS (
        SELECT 1 FROM chat_groups
        WHERE Chat_Group_id = ga.Chat_Group_id AND user_id = ${logged_user.id}
    )
    AND EXISTS (
        SELECT 1 FROM chat_group_messages
        WHERE Chat_Group_id = ga.Chat_Group_id
    )
    ORDER BY ga.updatedAt DESC;
    
    Login or Signup to reply.
  2. Please qualify each column name; it is hard to track what comes from what table.

    Based on user1191247’s reformulation, I recommend adding these composite indexes:

    ga:  INDEX(IsGroup,  Chat_Group_id, description, updatedAt)
    ga:  INDEX(Chat_Group_id)
    u:   INDEX(user_id,  username)
    cg:  INDEX(Chat_Group_id,  user_id)
    chat_group_messages:  INDEX(Chat_Group_id)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search