I have a model Item
with the corresponding database table items
where is currently about 240,000 records.
I need to generate statistics from the data in the items
table on my dashboard. Currently, I am trying to do it this way:
def dashboard
if current_user.present?
@data = Hash.new
(1..20).each do |n|
@data[n] = Hash.new
@data[n]['a_sale'] = Hash.new
@data[n]['a_lease'] = Hash.new
@data[n]['b_sale'] = Hash.new
@data[n]['b_lease'] = Hash.new
@data[n]['c_sale'] = Hash.new
@data[n]['c_lease'] = Hash.new
@data[n]['d_sale'] = Hash.new
@data[n]['d_lease'] = Hash.new
tday = Date.today
yday = tday-1.day
@data[n]['a_sale']['today'] = Item.where('cat1 = 1 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['a_sale']['yday'] = Item.where('cat1 = 1 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['a_lease']['today'] = Item.where('cat1 = 1 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['a_lease']['yday'] = Item.where('cat1 = 1 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['b_sale']['today'] = Item.where('cat1 = 2 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['b_sale']['yday'] = Item.where('cat1 = 2 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['b_lease']['today'] = Item.where('cat1 = 2 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['b_lease']['yday'] = Item.where('cat1 = 2 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['c_sale']['today'] = Item.where('cat1 = 3 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['c_sale']['yday'] = Item.where('cat1 = 3 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['c_lease']['today'] = Item.where('cat1 = 3 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['c_lease']['yday'] = Item.where('cat1 = 3 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['d_sale']['today'] = Item.where('cat1 = 4 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['d_sale']['yday'] = Item.where('cat1 = 4 AND cat2 = 1 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@data[n]['d_lease']['today'] = Item.where('cat1 = 4 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@data[n]['d_lease']['yday'] = Item.where('cat1 = 4 AND cat2 = 2 AND cat3 = ? AND created_at >= ? AND created_at <= ?', n, yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
@total_today = Item.where('created_at >= ? AND created_at <= ?', tday.beginning_of_day.in_time_zone("London"), tday.end_of_day.in_time_zone("London")).count()
@total_yday = Item.where('created_at >= ? AND created_at <= ?', yday.beginning_of_day.in_time_zone("London"), yday.end_of_day.in_time_zone("London")).count()
end
end
end
What happens when I try to open this page in the app is that the page either doesn’t load at all (timeout) or it loads, but it takes close to 30 seconds.
I tried to add indeces to the cat1
, cat2
, cat3
and created_at
columns, but it didn’t help.
There’s a possibility that the number of rows in the table items
will grow to ~1M in the next 3-4 months.
How do I generate stats about the data from the tables of similar sizes? Also, I realize the code above might not appear to look very intelligent – I tried to put something together quickly (I am happy to learn how to generate stats better, though).
2
Answers
Untested but too long for a comment.
The following should work for you (if not I will remove it)
We are querying all items created between yesterday beginning of day and today end of day and grouping them by the created_date as date, cat1, cat2, and cat3 and count each group. (1 query)
Then we iterate over the range
1..20
and just look up the groups by[date,cat1,cat2,n]
wheren
representscat3
ActiveRecord has a set of methods for collecting aggregates (
#count
,#sum
etc) however they are not very good for anything but trivial use cases. Using#count
in a loop is a very poor approach to any problem as you’re creating N number of database queries.In Postgres 9.4+ we can use
FILTER
to get the count for a single day:This example assumes that you’re using UTC internally in the database and that we can ignore the timezone. Otherwise you can use
AT TIME ZONE
to cast to a timezone.To get the values per category combination you can apply a grouping:
You can use this SQL together with
Item.connection.select_all
to get the data as raw results (hashes).However if your table is going to grow significantly it would be wise to consider defining a view which can be described kind of like a table that the database populates by running a SQL query as this will be much more performant than querying the aggregates with every request. In Rails you can use the scenic gem to define views in migrations.