
I’m working for a trader who wants to build a fairly large archive of stock quotes for further analysis. We have access to an information provider, and the data always has the same structure: ticker name, timestamp, and OHLC prices, so it fits a relational database approach nicely.
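To give an idea of the shape of the data, it is essentially one wide table of bars; a simplified migration could look like this (the column names here are only illustrative, not our real schema):

    class CreateStockQuotes < ActiveRecord::Migration[6.0]
      def change
        # One row per ticker per bar: ticker, timestamp and the four OHLC prices
        create_table :stock_quotes do |t|
          t.string   :ticker,    null: false
          t.datetime :quoted_at, null: false
          t.decimal  :open,  precision: 12, scale: 4
          t.decimal  :high,  precision: 12, scale: 4
          t.decimal  :low,   precision: 12, scale: 4
          t.decimal  :close, precision: 12, scale: 4
        end
      end
    end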
So far I have made a working prototype using Ruby on Rails (Ruby 2.6, Rails 6.0) and PostgreSQL. It fetches the data from the provider pretty fast, but storing millions of records in the database is very slow. Switching from ActiveRecord-style inserts to plain SQL made it about 3 times faster, but it is still a very long process just to store a small portion of the data we need.
So I need to improve the performance of the prototype somehow, but I’m not sure what to do exactly. I could put something like Redis in front of it, but I still need to get the data into persistent storage anyway.
What should I do?

2 Answers


  1. This question will probably be flagged, but I’ll give you an answer anyway:

    ActiveRecord isn’t necessarily what’s slow here. ActiveModel is.

    ActiveModel is what maps your result set and database types into a ruby-friendly object, and what takes your data (string-y things) and converts it into database types for inserting/updating.

    If you’re rendering JSON, make sure you’re using a C-based encoder like Oj or similar: https://github.com/ohler55/oj. It makes a huge difference for larger datasets and responses.
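    For example, assuming the oj gem (3.x) is in your Gemfile, a one-line initializer is usually all it takes to switch Rails over to it:

    # config/initializers/oj.rb
    require "oj"

    # Swap Rails’ default pure-Ruby JSON encoding for Oj’s C implementation
    Oj.optimize_rails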

    If you’re comfortable with the result set being plain hashes you can loop through, you can avoid hitting ActiveModel with connection#exec_query:

    # Build the relation as usual, but run its SQL directly:
    # exec_query skips ActiveModel and hands back plain hashes
    stocks_query = Stock.all
    results = Stock.connection.exec_query(stocks_query.to_sql).to_a

    results.each do |result|
      puts result # each result is a Hash keyed by column name
    end
    

    If you’re inserting millions of rows, or even updating millions of rows, you should probably use the right tool for the job: raw SQL. But that gets unwieldy, and since you like Ruby, you can use something like https://github.com/zdennis/activerecord-import, which lets you write Ruby and takes care of the rest.
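    With activerecord-import, a bulk insert could look roughly like this (quotes stands for whatever array of hashes comes back from the provider, and the model/column names are assumptions):

    columns = [:ticker, :quoted_at, :open, :high, :low, :close]

    values = quotes.map do |q|
      [q[:ticker], q[:timestamp], q[:open], q[:high], q[:low], q[:close]]
    end

    # validate: false skips ActiveModel validations; batch_size keeps each
    # generated multi-row INSERT statement at a manageable size
    Stock.import(columns, values, validate: false, batch_size: 10_000)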

    Tens of millions of records isn’t that many. PostgreSQL is probably your best bet without introducing a ton of unnecessary complexity. You just need to know where the bottlenecks are when talking to the database from Rails, and how to work around them. Make sure that your database is indexed correctly as well. https://github.com/plentz/lol_dba is a good place to start if you’re unsure.
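    For quote data, the index you almost certainly want is a composite one on the ticker and timestamp columns, something along these lines (table and column names assumed):

    class AddTickerTimeIndexToStockQuotes < ActiveRecord::Migration[6.0]
      def change
        # Covers the typical "all bars for ticker X in time range Y" query;
        # unique assumes the provider never sends the same bar twice
        add_index :stock_quotes, [:ticker, :quoted_at], unique: true
      end
    end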

  2. Most DBMSes have some form of bulk loading you can leverage (SQL Server calls it BULK INSERT); PG’s equivalent is the COPY command, which is optimized for loading large numbers of rows. This, of course, means writing raw SQL.
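    You don’t have to drop down to psql for COPY either; the pg gem can stream rows through COPY on the raw connection. A rough sketch, with table and column names assumed:

    require "csv"

    conn = ActiveRecord::Base.connection.raw_connection # the underlying PG::Connection

    copy_sql = "COPY stock_quotes (ticker, quoted_at, open, high, low, close) " \
               "FROM STDIN WITH (FORMAT csv)"

    conn.copy_data(copy_sql) do
      quotes.each do |q|
        conn.put_copy_data(
          CSV.generate_line([q[:ticker], q[:timestamp], q[:open], q[:high], q[:low], q[:close]])
        )
      end
    end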

    If for some reason you don’t want to use COPY and you’d rather INSERT your rows, the important thing is not to commit after every single row: turn autocommit off in your client (in psql that’s \set AUTOCOMMIT off) or wrap the whole load in an explicit transaction with multi-row VALUES, and commit once after all the rows have been inserted, something along the lines of:

    BEGIN;
    
    INSERT INTO table_name (column1, column2, …)
    VALUES
        (value1, value2, …),
        (value1, value2, …),
        …;
    
    COMMIT;
    

    The idea here is to drop all your indexes and foreign key constraints (if you have them) before the load and put them back once you’re done; maintaining them row by row is a big part of what slows the load down.
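    In Rails terms that could look roughly like this (the index name and the bulk_load_quotes helper are hypothetical placeholders):

    conn = ActiveRecord::Base.connection

    # Drop the index, do the heavy load, then rebuild the index once at the end
    conn.execute("DROP INDEX IF EXISTS index_stock_quotes_on_ticker_and_quoted_at")

    bulk_load_quotes # stand-in for COPY / activerecord-import / multi-row INSERTs

    conn.execute(<<~SQL)
      CREATE UNIQUE INDEX index_stock_quotes_on_ticker_and_quoted_at
        ON stock_quotes (ticker, quoted_at)
    SQL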

    Also, if you can (and it’s safe to do so), get the data ready in a file first; that way the next row is always ready the moment the database is ready for it, without complicating things.

    PG has a guide for things like this (the "Populating a Database" chapter of the documentation). There’s more info in these answers too.
