skip to Main Content

My current project is a social media app somewhat like Facebook. Now the post created by both users and news post (there is a cron running every 15 min and it fetch latest news from various news channels) are keeping in the same table called post table. Because of news post the table is growing very fast and the timeline taking more time to load. So we are planing to slit normal post (post table) and news post (news_post table) to separate tables and then slit old news post to a backup a table (news_post_backup table).

Then on listing post API we have to take union of all these 3 tables and have to sort by post create time and have to take post based on pagination data and other conditions

I want to know is there any benefit from doing like this. I am doubtful because I have to take union then its again become same table like the previous table structure

MYSQL Version on server is 5.6

UPDATE
Here I am adding more information
The Query I am running is

select CP.id,CP.user_id,post_title,post_content,post_type,new_title,is_spam,spam_reportedby,CP.privacy,CP.link_title,CP.link_content,CP.link_image,CP.is_paid,CP.payment_status,CP.is_breaking,
CUP.id as channel_userspost_id,CUP.parent_id,
SU.full_name as reporteduser_full_name,SU.user_name as reporteduser_user_name,
SU.user_profile_pic as reporteduser_user_profile_pic,
FU.id as from_user_id, FU.full_name as from_user_full_name,
FU.user_name as from_user_name,
FU.user_profile_pic as from_user_profile_pic,
TU.id as to_user_id, TU.full_name as to_user_full_name,
TU.user_name as to_user_name,
TU.user_profile_pic as to_user_profile_pic,
TUA.authentication_status as to_user_authentication_status,
FUA.authentication_status as from_user_authentication_status,
C.verification_status as channel_verification_status,
CUP.created_at,CUP.updated_at,
guid,external_url,
CP.channel_id,CP.rss_channel_id,if(CP.rss_channel_id!=0,RC.rss_name,C.channel_name) as channel_name,
if(CP.rss_channel_id!=0,RC.rss_logo,C.profile_pic) as channel_logo,
C.channel_type,
PCD.like_count as like_count,
PCD.search_count as search_count,
PCD.view_count as view_count,
CM.channel_member_status,C.payment_status as channel_payment_status,C.payment_method as channel_payment_method,
CP.is_live_finished from `channel_users_posts` as `CUP` inner join `channel_posts` as `CP` on `CUP`.`channel_post_id` = `CP`.`id` and `is_spam` = 'N' 
left join `channels` as `C` on `CP`.`channel_id` = `C`.`id` 
left join `rss_channels` as `RC` on `CP`.`rss_channel_id` = `RC`.`id` left join `channel_members` as `CM` on `CM`.`channel_id` = `C`.`id` and `CM`.`user_id` = 427 and `CM`.`channel_member_status` != -1 
left join `test_develop_new`.`users` as `FU` on `FU`.`id` = `CUP`.`shared_from` left join `test_develop_new`.`users` as `SU` on `SU`.`id` = `CP`.`spam_reportedby` 
left join `test_develop_new`.`users` as `TU` on `TU`.`id` = `CUP`.`user_id` left join `common_auth_develop_new`.`user_authentication` as `FUA` on `FUA`.`user_id` = `FU`.`id` 
left join `common_auth_develop_new`.`user_authentication` as `TUA` on `TUA`.`user_id` = `TU`.`id` left join `post_count_details` as `PCD` on `PCD`.`channel_userspost_id` = `CUP`.`id`
where (`CP`.`is_paid` = 'N' or (`CP`.`is_paid` = 'Y' and `CP`.`payment_status` = 'S')) and (`CP`.`channel_id` in (705, 537) or (`CUP`.`user_id` in (8, 12, 427))) and `CUP`.`updated_at` < '2019-04-12 11:09:59.000000' and ((`CP`.`channel_id` != 0 and `CM`.`channel_member_status` is not null) or `CP`.`channel_id` = 0) and ((`CP`.`post_type` != 'BV' or `CP`.`user_id` = 427) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility='PA'))) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type != 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type = 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1
) AND channel_members.channel_id IN (SELECT visibility_ids FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S'))
)) order by `CUP`.`updated_at` desc limit 30

core post table’s name is is channel_posts Here is the schema structure for the table

CREATE TABLE `channel_posts` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `channel_id` bigint(20) NOT NULL,
  `rss_channel_id` int(11) NOT NULL,
  `post_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_type` enum('T','L','I','V','Y','G','A','MI','MV','MY','MG','MA','NS_T','NS_I','C_T','BV') COLLATE utf8mb4_unicode_ci DEFAULT 'T',
  `is_spam` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `spam_reportedby` bigint(20) NOT NULL,
  `privacy` int(11) NOT NULL DEFAULT '2',
  `guid` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `external_url` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `link_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `link_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `is_breaking` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `is_paid` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `payment_status` enum('F','S') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'F',
  `link_image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `is_live_finished` tinyint(1) NOT NULL DEFAULT '0',
  `created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
  `updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

and there is one more table channel_users_post

CREATE TABLE `channel_users_posts` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `channel_post_id` bigint(20) NOT NULL,
  `parent_id` int(11) NOT NULL DEFAULT '0',
  `user_id` bigint(20) NOT NULL,
  `shared_from` bigint(20) NOT NULL,
  `new_title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
  `updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

There are 200,000 record in channel_post table and 600,000 records in channel_users_post table it takes 48586 ms to load.

2

Answers


  1. Have you considered paging your queries rather than splitting the table? Assuming the table is sorted by time, and there’s a clustered index on it, you could do something like

    SELECT id, time, content
    FROM post
    LIMIT 50 OFFSET 5000
    

    to get the 5000th newest post to the 5050th newest post.

    In terms of insertion time, you’d probably have a B tree index on the time, so it would be logarithmic.

    Additionally, it seems like “content” might be fairly big relative to the rest of the data so you could either make sure the index on time is alt 2, or split that off into its own table and run a separate query when you actually want the content.


    EDIT

    That’s a very big query and I can tell you almost immediately that the reason why it’s so slow has less to do with the size of the table and more to do with the amount of data you’re processing (10 JOINs with 11 nested SELECTs which have their own JOINs).

    Do you have to return all of this at once? Or can you get the very basic information you need and then make some calculations in your application, and then make another query? This way, the disk and memory don’t have to do as much work, and you’re moving that onto the CPU.

    If this query is necessary, please see this SO post for how to optimize queries with 10+ JOINs. However, note that in the end, the OP ended up splitting the query since it still takes too long.

    The takeaway here is to write smaller queries which usually don’t waste as much time/resources.

    Login or Signup to reply.
  2. Another option is to partition the post table by the type of post and date. It is still one table and no code change in the client side. Mysql can do partition elimination for queries.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search