skip to Main Content

I have a model Item with the corresponding database table items where is currently about 240,000 records.

I need to generate statistics from the data in the items table on my dashboard. Currently, I am trying to do it this way:

def dashboard
    if current_user.present?
      @data = Hash.new
      (1..20).each do |n|
        @data[n] = Hash.new
        @data[n]['a_sale'] = Hash.new
        @data[n]['a_lease'] = Hash.new
        @data[n]['b_sale'] = Hash.new
        @data[n]['b_lease'] = Hash.new
        @data[n]['c_sale'] = Hash.new
        @data[n]['c_lease'] = Hash.new
        @data[n]['d_sale'] = Hash.new
        @data[n]['d_lease'] = Hash.new

        tday = Date.today
        yday = tday-1.day

        @data[n]['a_sale']['today'] = Item.where('cat1 = 1 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['a_sale']['yday'] = Item.where('cat1 = 1 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
        @data[n]['a_lease']['today'] = Item.where('cat1 = 1 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['a_lease']['yday'] = Item.where('cat1 = 1 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()

        @data[n]['b_sale']['today'] = Item.where('cat1 = 2 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['b_sale']['yday'] = Item.where('cat1 = 2 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
        @data[n]['b_lease']['today'] = Item.where('cat1 = 2 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['b_lease']['yday'] = Item.where('cat1 = 2 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()

        @data[n]['c_sale']['today'] = Item.where('cat1 = 3 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['c_sale']['yday'] = Item.where('cat1 = 3 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
        @data[n]['c_lease']['today'] = Item.where('cat1 = 3 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['c_lease']['yday'] = Item.where('cat1 = 3 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()

        @data[n]['d_sale']['today'] = Item.where('cat1 = 4 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['d_sale']['yday'] = Item.where('cat1 = 4 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
        @data[n]['d_lease']['today'] = Item.where('cat1 = 4 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @data[n]['d_lease']['yday'] = Item.where('cat1 = 4 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()

        @total_today = Item.where('created_at >= ? AND created_at <= ?', tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
        @total_yday = Item.where('created_at >= ? AND created_at <= ?', yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
      end
    end
  end

What happens when I try to open this page in the app is that the page either doesn’t load at all (timeout) or it loads, but it takes close to 30 seconds.

I tried to add indeces to the cat1, cat2, cat3 and created_at columns, but it didn’t help.

There’s a possibility that the number of rows in the table items will grow to ~1M in the next 3-4 months.

How do I generate stats about the data from the tables of similar sizes? Also, I realize the code above might not appear to look very intelligent – I tried to put something together quickly (I am happy to learn how to generate stats better, though).

2

Answers


  1. Untested but too long for a comment.

    The following should work for you (if not I will remove it)

    @data =Hash.new {|h,k| h[k] = Hash.new(&h.default_proc) }
    tday = Date.today.end_of_day.in_time_zone("London")
    yday = (tday - 1.day).beginning_of_day.in_time_zone("London")
    items = Item.where(created_at: yday..tday)
                  .group(Arel::Nodes::NamedFunction.new('CAST',[Item.arel_table[:created_at].as('DATE')]).to_sql,
                        :cat1,:cat2,:cat3)
                  .count
    
    l_tday = tday.to_date
    l_yday = yday.to_date
    
    (1..20).each do |n|
      
      @data[n]['a_sale']['today'] = items.fetch([l_tday,1,1,n], 0)
      @data[n]['a_sale']['yday'] = items.fetch([l_yday,1,1,n], 0)
      @data[n]['a_lease']['today'] = items.fetch([l_tday,1,2,n], 0)
      @data[n]['a_lease']['yday'] = items.fetch([l_yday,1,2,n], 0)
     #...
    end 
    # optionally you could issue a separate query to collect the totals
    @total_today = items.sum {|k,v| k.include?(l_tday) ? v : 0 }
    

    We are querying all items created between yesterday beginning of day and today end of day and grouping them by the created_date as date, cat1, cat2, and cat3 and count each group. (1 query)

    Then we iterate over the range 1..20 and just look up the groups by [date,cat1,cat2,n] where n represents cat3

    Login or Signup to reply.
  2. ActiveRecord has a set of methods for collecting aggregates (#count, #sum etc) however they are not very good for anything but trivial use cases. Using #count in a loop is a very poor approach to any problem as you’re creating N number of database queries.

    In Postgres 9.4+ we can use FILTER to get the count for a single day:

    SELECT 
      COUNT(items.*) FILTER (
        date_trunc('day', 'created_at') = CURRENT_DATE - 1
      ) AS yday
      COUNT(items.*) FILTER (
        date_trunc('day', 'created_at') = CURRENT_DATE
      ) AS today
    FROM items
    WHERE date_trunc('day', 'created_at') >= CURRENT_DATE - 1
    

    This example assumes that you’re using UTC internally in the database and that we can ignore the timezone. Otherwise you can use AT TIME ZONE to cast to a timezone.

    To get the values per category combination you can apply a grouping:

    SELECT 
      items.cat1,
      items.cat2,
      items.cat3,
      COUNT(items.*) FILTER (
        date_trunc('day', 'created_at') = CURRENT_DATE - 1
      ) AS yday
      COUNT(items.*) FILTER (
        date_trunc('day', 'created_at') = CURRENT_DATE
      ) AS today
    FROM items
    GROUP BY cat1, cat2, cat3
    WHERE date_trunc('day', 'created_at') >= CURRENT_DATE - 1
    

    You can use this SQL together with Item.connection.select_all to get the data as raw results (hashes).

    However if your table is going to grow significantly it would be wise to consider defining a view which can be described kind of like a table that the database populates by running a SQL query as this will be much more performant than querying the aggregates with every request. In Rails you can use the scenic gem to define views in migrations.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search