skip to Main Content

There are two tables in my project: posts and post_categories.
Each post in the posts table has a post_category_id relating to post_categories.id as fk, and each entry in post_categories has a parent_id – self-referencing post_categories.id.

On the website, the post categories page shows all the posts in each category, but I need it to also count all the posts in each subcategory thereof.

post_categories (0 = main menu)

+-----+---------------+-----------+
| id  | title         | parent_id |
+-----+---------------+-----------+
| 1   | Machines      | 0         |
| 2   | Vehicles      | 1         |
| 3   | Industrial    | 1         |
| 4   | Road Cars     | 2         |
| 5   | Track Cars    | 2         |
+-----+---------------+-----------+

posts

+-----+--------+------------------+
| id  | title  | post_category_id |
+-----+--------+------------------+
| 1   | Rally  | 5                |
| 2   | F1     | 5                |
| 3   | Vans   | 4                |
| 4   | Trucks | 4                |
+-----+--------+------------------+

So far, if there is a category that does not have any posts directly under it, it shows 0 posts, even if the category contains 2 more categories with a total of 50 posts.

I have been spending a lot of time going round in circles with MySQL RECURSIVE functions, to no avail. It seems elusively simple, but I can’t find an example of a RECURSIVE query that takes a second table into consideration and it’s impossible to code for a number of layers, as there is no hard limit to how many levels there could be.

To clarify, then, let’s assume we are at the Vehicles menu (post_category_id: 2).
My query should:

  1. Count all posts having post_category_id = 2 (none in this case).
  2. Count all posts having post_category_id IN (4,5) – because post_categories 4 and 5 both have the parent_id = 2.
  3. Add up those numbers and give my two columns containing the total number of posts and the number of categories, so I can print:
    Post Category Title (Contains 30 posts in 6 categories)

Could anyone provide a recursive solution?

2

Answers


  1. I have a couple questions above that will help refine this answer if it is not what you are looking for.

    def post_and_cat_count(title)
      
      lookup_category = PostCategory.find_by(title: title).id
      child_categories = PostCategory.where(parent_id: lookup_category)
    
      post_count = Post.where(post_category_id: lookup_category).count + 
      Post.where(post_category_id: child_categories).count
    
      {posts: post_count, categories: (child_categories.length + 1)}
    
    end
    
    display_counts = post_and_cat_count('Vehicles')
    #=> {posts: 4, categories: 3}
    
    puts "Post Category Vehicles (Contains #{display_counts[:posts]} posts in #{display_counts[:categories]} categories)
    #=> 'Post Category Vehicles (Contains 4 posts in 3 categories)'
    

    I think this is what you are looking for? If you need something more recursive, then you need to expand your question and give a recursive example. By that I mean a situation where the method would call itself and break out of an infinite loop with a return clause.

    Login or Signup to reply.
  2. This is probably not the most efficient way but it works. The recursive CTE builds a list of all descendants of the given category (remove the WHERE clause from the first query in the CTE to build for all categories):

    WITH RECURSIVE hierarchy (id, descendant, top_level_id) AS (
        SELECT id, FALSE, id
        FROM post_categories
        WHERE id = 2
    
        UNION ALL
    
        SELECT pc.id, TRUE, h.top_level_id
        FROM hierarchy h
        JOIN post_categories pc ON h.id = pc.parent_id
    )
    SELECT
        pc.*,
        SUM(descendant) AS num_categories,
        SUM((SELECT COUNT(*) FROM posts WHERE post_category_id = h.id)) AS num_posts
    FROM post_categories pc
    JOIN hierarchy h ON h.top_level_id = pc.id
    WHERE pc.id = 2
    GROUP BY pc.id;
    

    For the sample data provided in your question, it outputs:

    id title parent_id num_categories num_posts
    2 Vehicles 1 2 4

    If running for all categories you should pre-aggregate the category counts as they will be required multiple times:

    WITH RECURSIVE hierarchy (id, descendant, top_level_id) AS (
        SELECT id, FALSE, id
        FROM post_categories
    
        UNION ALL
    
        SELECT pc.id, TRUE, h.top_level_id
        FROM hierarchy h
        JOIN post_categories pc ON h.id = pc.parent_id
    ),
    cat_counts (id, cnt) AS (
        SELECT pc.id, COUNT(p.id)
        FROM post_categories pc
        LEFT JOIN posts p ON pc.id = p.post_category_id
        GROUP BY pc.id
    )
    SELECT pc.*, SUM(descendant) AS categories, SUM(cc.cnt) AS num_posts
    FROM post_categories pc
    JOIN hierarchy h ON h.top_level_id = pc.id
    JOIN cat_counts cc ON h.id = cc.id
    GROUP BY pc.id;
    

    which outputs:

    id title parent_id categories num_posts
    1 Machines NULL 4 4
    2 Vehicles 1 2 4
    3 Industrial 1 0 0
    4 Road Cars 2 0 2
    5 Track Cars 2 0 2

    Here’s a db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search