I’m working for a trader who wants to build a fairly large archive of stock quotes for further analysis. We have access to an information provider, and the data always has the same structure: ticker name, timestamp, and OHLC, so it fits a relational database approach.
So far I’ve made a working prototype using Ruby on Rails (Ruby 2.6, Rails 6.0) and PostgreSQL. It can fetch the data from the provider pretty fast, but storing millions of records in the database is very slow. Switching from ActiveRecord-style code to plain SQL made it about 3 times faster, but it’s still a very long process to store even a small portion of the data we need.
So I need to improve the performance of the prototype somehow, but I’m not sure what exactly to do. I could use something like Redis, sure, but I still need to get the data into persistent storage anyway.
What should I do?
2 Answers
This question will probably be flagged, but I’ll give you an answer anyway:
ActiveRecord isn’t necessarily what’s slow here; ActiveModel is.
ActiveModel is what maps your result set and database types into Ruby-friendly objects, and what takes your data (string-y things) and converts it into database types for inserting/updating.
If you’re rendering JSON, make sure you’re using the C renderer OJ or similar: https://github.com/ohler55/oj — it makes a huge difference for larger datasets and responses.
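Wiring Oj in is basically a one-liner in an initializer. A minimal sketch, assuming the oj gem is in your Gemfile:

```ruby
# config/initializers/oj.rb (hypothetical path) -- makes Rails use Oj's C encoder
# for the common ActiveSupport/Rails JSON rendering paths.
require "oj"

Oj.optimize_rails
```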
If you’re comfortable with a result set being a hash that you can loop through, you can avoid hitting ActiveModel with connection#exec_query:
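A minimal sketch, assuming a hypothetical quotes table that matches the ticker/timestamp/OHLC structure from the question:

```ruby
# exec_query returns an ActiveRecord::Result; iterating it yields plain hashes,
# so no ActiveModel objects are instantiated.
sql = "SELECT ticker, ts, open, high, low, close FROM quotes WHERE ticker = 'AAPL'"
result = ActiveRecord::Base.connection.exec_query(sql)

result.each do |row|
  # row is a Hash like {"ticker" => "AAPL", "ts" => ..., "close" => ...}
  puts row["close"]
end
```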
If you’re inserting millions of rows, or even updating millions of rows, you should probably use the right tool for the job: raw SQL. But that gets unwieldy, and you like Ruby, so you can use something like https://github.com/zdennis/activerecord-import, which lets you write Ruby and takes care of the rest.
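A minimal sketch of what that looks like with activerecord-import, assuming a hypothetical Quote model and a fetched_rows array of hashes from your provider:

```ruby
# activerecord-import turns this into a handful of multi-row INSERT statements
# instead of one INSERT per record.
columns = [:ticker, :ts, :open, :high, :low, :close]
values  = fetched_rows.map { |r| r.values_at(:ticker, :ts, :open, :high, :low, :close) }

Quote.import columns, values, validate: false
```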
Tens of millions of records isn’t that many. PostgreSQL is probably your best bet without introducing a ton of unnecessary complexity. You just need to know the bottlenecks of communicating with the database layer with Rails, and how to work around them. Make sure that your database is indexed correctly as well. https://github.com/plentz/lol_dba is a good place to start if you’re unsure.
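For an archive keyed by ticker and timestamp, a composite index is the obvious candidate. A sketch of such a migration, with hypothetical table and column names:

```ruby
# Hypothetical migration: a (ticker, ts) index fits the
# "give me ticker X between dates A and B" queries an analysis archive needs.
class AddTickerTsIndexToQuotes < ActiveRecord::Migration[6.0]
  def change
    add_index :quotes, [:ticker, :ts]
  end
end
```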
Most DBMSes support BULK INSERT that you can leverage, or better yet you can use PG’s COPY command, which is optimized for loading large numbers of rows. This, of course, would mean writing raw SQL.
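If you want COPY without leaving Ruby, the pg gem that Rails already sits on exposes it directly. A minimal sketch, assuming a hypothetical quotes table and a rows array of [ticker, ts, open, high, low, close] arrays:

```ruby
# Uses the raw PG::Connection underneath ActiveRecord; `rows` is assumed to be
# an array of arrays matching the column list in the COPY statement.
conn = ActiveRecord::Base.connection.raw_connection

conn.copy_data("COPY quotes (ticker, ts, open, high, low, close) FROM STDIN",
               PG::TextEncoder::CopyRow.new) do
  rows.each { |row| conn.put_copy_data(row) }
end
```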
If for some reason you don’t want to use COPY and you’d rather INSERT your rows, it might be a smart thing to disable auto-commit (in current PG that means \set AUTOCOMMIT off in psql, or simply wrapping all the INSERTs in one explicit transaction) and commit once after all the rows have been inserted, something along the lines of the sketch below. The idea here is to first remove all your indexes and foreign key constraints if you have them, and then put them back after you’re done. This should speed things up.
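A minimal sketch of that pattern in Ruby, using Rails 6’s insert_all for the bulk INSERTs; the Quote model, the quotes table and its (ticker, ts) index, and the rows array of attribute hashes are hypothetical names for illustration:

```ruby
conn = ActiveRecord::Base.connection

# 1. Drop the index so PG doesn't have to maintain it on every insert.
conn.remove_index :quotes, column: [:ticker, :ts] if conn.index_exists?(:quotes, [:ticker, :ts])

# 2. One transaction == one commit at the very end instead of one per statement.
ActiveRecord::Base.transaction do
  rows.each_slice(10_000) do |batch|
    Quote.insert_all(batch)   # multi-row INSERTs, no ActiveModel instantiation
  end
end

# 3. Put the index back once the data is in.
conn.add_index :quotes, [:ticker, :ts]
```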
Also, if you can (and it’s safe to do so), get the data ready in a file first; I’m not sure how else you can make sure the data is ready to be inserted the moment the DB is ready for the next row without complicating things.
PG has a guide for exactly this kind of bulk loading. There’s more info in these answers too.