I am building a data-intensive web application that I am trying to optimize. I’ve heard of forking and threading, but I have no idea whether they apply to what I am trying to do and, if so, how to implement them. My code looks like this:
def search
  @amazon_data=Hash.from_xml(item.retrieve_amazon(params[:sku]))
  unless @amazon_data['results'] == nil
    @amazon_data['results']['item'].size.times do |i|
      @all_books << { :vendor => 'Amazon.com',
        :price => @amazon_data['results']['item'][i]['price'].to_f,
        :shipping => @amazon_data['results']['item'][i]['ship'].to_f,
        :condition => @amazon_data['results']['item'][i]['condition'],
        :total => @amazon_data['results']['item'][i]['price'].to_f + @amazon_data['results']['item'][i]['ship'].to_f,
        :availability => 'In Stock',
        :link_text => 'Go to Amazon.com',
        :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:isbn]}"
      }
    end
  end
  @ebay_data=Hash.from_xml(Book.retrieve_ebay(params[:sku]))
  unless @ebay_data['results'] == nil
    @ebay_data['results']['item'].size.times do |i|
      @all_books << { :vendor => 'eBay',
        :price => @ebay_data['results']['item'][i]['price'].to_f,
        :shipping => @ebay_data['results']['item'][i]['ship'].to_f,
        :condition => 'Used',
        :total => @ebay_data['results']['item'][i]['price'].to_f + @ebay_data['results']['item'][i]['ship'].to_f,
        :availability => 'In Stock',
        :link_text => 'Go to eBay',
        :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:sku]}"
      }
    end
  end
end
So, basically what I have are two actions that retrieve data from eBay and Amazon and parse it here. How might I make both of these actions run at once? Do fork or thread have anything to do with what I am trying to accomplish?
Update: running each API call in its own thread (code below) cuts the API time in half, but I don’t know how to return the results. The subsequent view loads before the API results are returned. The threads are returning data, however. When I add
puts @all_books
within the thread, the results are displayed in the console. Outside of the thread, however, no results are returned.
def search
  Thread.new do
    @amazon_data=Hash.from_xml(item.retrieve_amazon(params[:sku]))
    unless @amazon_data['results'] == nil
      @amazon_data['results']['item'].size.times do |i|
        @all_books << { :vendor => 'Amazon.com',
          :price => @amazon_data['results']['item'][i]['price'].to_f,
          :shipping => @amazon_data['results']['item'][i]['ship'].to_f,
          :condition => @amazon_data['results']['item'][i]['condition'],
          :total => @amazon_data['results']['item'][i]['price'].to_f + @amazon_data['results']['item'][i]['ship'].to_f,
          :availability => 'In Stock',
          :link_text => 'Go to Amazon.com',
          :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:isbn]}"
        }
      end
    end
  end
  Thread.new do
    @ebay_data=Hash.from_xml(Book.retrieve_ebay(params[:sku]))
    unless @ebay_data['results'] == nil
      @ebay_data['results']['item'].size.times do |i|
        @all_books << { :vendor => 'eBay',
          :price => @ebay_data['results']['item'][i]['price'].to_f,
          :shipping => @ebay_data['results']['item'][i]['ship'].to_f,
          :condition => 'Used',
          :total => @ebay_data['results']['item'][i]['price'].to_f + @ebay_data['results']['item'][i]['ship'].to_f,
          :availability => 'In Stock',
          :link_text => 'Go to eBay',
          :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:sku]}"
        }
      end
    end
  end
end
Am I on the right track? How can I return the results from within the thread? Is it that the variable is only accessible within the thread, or does the problem lie in the fact that the program progresses before the results are returned?
Unfortunately, the application requires real-time user input to query the APIs. The returned data needs to be fresh, as it has to do with product pricing in marketplaces. For instance, a user would enter a SKU, and with that information the program would make a request to the applicable sites (Amazon and eBay in this case). Currently it makes the request to Amazon, parses the data, formats it, and then moves on to eBay, parses the data, and formats that. Then the formatted data is displayed in the view.
My thought was that if I could make those API calls at the same time (on different threads?), it would save time on the web-serving end, as all that would remain is to parse the returned data and format it correctly (which I might also be able to speed up).
3 Answers
It’s hard to say without more info, but my suspicion is that waiting for the API responses is where the majority of time is spent.
Try a different approach, where the request and processing of the API response are handled in a different process from the web-serving process. The front-end code will likely have to poll periodically for results and inject them into the page. But the win is that the whole request doesn’t get backed up waiting for Amazon and eBay to do their thing.
There are several plugins that can help, delayed_job is a good place to start.
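To make the "hand off the work, then poll" idea concrete, here is a rough sketch in plain Ruby. It is not delayed_job's API: the `SearchJobs` class, its `enqueue` and `status` methods, and the in-memory hash are all made up for illustration, and a background thread stands in for the separate worker process. In a real setup the job record would live somewhere both processes can reach, such as the database.

```ruby
require 'securerandom'

# Hypothetical in-memory job table. A thread stands in for the worker process.
class SearchJobs
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Start the slow work in the background and return a job id right away,
  # so the web request can finish without waiting on the remote APIs.
  def enqueue(sku, &work)
    id = SecureRandom.hex(8)
    @lock.synchronize { @results[id] = { :status => 'pending', :books => [] } }
    Thread.new do
      begin
        books = work.call(sku)
        @lock.synchronize { @results[id] = { :status => 'done', :books => books } }
      rescue StandardError => e
        # Record the failure so the poller does not wait forever.
        @lock.synchronize { @results[id] = { :status => 'failed', :error => e.message, :books => [] } }
      end
    end
    id
  end

  # What a polling endpoint could hand back to the page's JavaScript.
  def status(id)
    @lock.synchronize { @results[id] }
  end
end

jobs = SearchJobs.new
job_id = jobs.enqueue('12345') do |sku|
  sleep 0.1 # pretend this is the Amazon/eBay round trip
  [{ :vendor => 'eBay', :price => 8.5, :shipping => 4.0 }]
end

# The browser would poll on a timer; here we simply loop until the job settles.
sleep 0.02 while jobs.status(job_id)[:status] == 'pending'
final = jobs.status(job_id)
puts final[:status]
```

The search action would only call something like `enqueue` and render immediately; a second, cheap action would return `status(id)` as JSON for the page to poll.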
Yeah, I still think you’d be better off with a job scheduler in this case. The absolute fastest that an action like this can perform is the slower of the two API requests, and you have no guarantees about network latency, load on the remote API, etc. On the other hand, you will have to implement some JavaScript code to poll periodically, detect the job completion, and inform the user of the results.
Also, thread behavior in Ruby 1.8 can be kinda funky at times, especially at scale, so beware.
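If you do stay with threads, the reason the view renders before the data arrives is that nothing waits for the threads: `Thread.new` returns immediately and the action carries on. A minimal sketch of waiting for both, with `fetch_amazon` and `fetch_ebay` as made-up stand-ins for the `retrieve_amazon` / `retrieve_ebay` calls plus parsing in the question:

```ruby
# Hypothetical stand-ins for the slow API calls; each sleeps, then returns
# rows that are already parsed into hashes.
def fetch_amazon(sku)
  sleep 0.2
  [{ :vendor => 'Amazon.com', :price => 10.0, :shipping => 3.99 }]
end

def fetch_ebay(sku)
  sleep 0.2
  [{ :vendor => 'eBay', :price => 8.5, :shipping => 4.0 }]
end

def search(sku)
  # Start both requests. Each block's return value becomes that thread's value.
  threads = [
    Thread.new { fetch_amazon(sku) },
    Thread.new { fetch_ebay(sku) }
  ]
  # Thread#value blocks until the thread has finished, then returns its result,
  # so nothing below runs until both responses are in.
  all_books = threads.map { |t| t.value }.flatten
  all_books.each { |b| b[:total] = b[:price] + b[:shipping] }
  all_books
end

books = search('12345')
books.each { |b| puts "#{b[:vendor]}: #{b[:total]}" }
```

Having each thread return its rows, rather than appending to a shared `@all_books` from inside the threads, also avoids two threads mutating the same array at once. The request still takes as long as the slower of the two calls.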
You might also look into EventMachine, which allows you to execute your outbound network calls in a non-blocking way. If you could return the first result to the user, then get the final result over Ajax, the user interaction would feel faster.
This is similar to what Kayak.com does with its real-time flight search.
You could also consider caching results, returning those to the user quickly, then populating updated results (that you loaded asynchronously) via Ajax. You’d have to figure out the right UI for that; maybe put ‘popular’ results above the fold and the latest updates below the fold, or something similar.
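A rough sketch of the caching side, assuming a simple in-process hash keyed by SKU. The `PriceCache` class and the 60-second lifetime are invented for illustration; a shared store would be needed once there is more than one app process.

```ruby
# Hypothetical cache: remembers the last rows per SKU and flags them as stale
# once they are older than ttl_seconds.
class PriceCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
  end

  def write(sku, books, now = Time.now)
    @store[sku] = { :books => books, :at => now }
  end

  # Returns nil when nothing is cached, otherwise the rows plus a :stale flag
  # telling the caller whether to kick off a background refresh.
  def read(sku, now = Time.now)
    entry = @store[sku]
    return nil unless entry
    { :books => entry[:books], :stale => (now - entry[:at]) > @ttl }
  end
end

cache = PriceCache.new(60)
t0 = Time.now
cache.write('12345', [{ :vendor => 'eBay', :price => 8.5 }], t0)

recent = cache.read('12345', t0 + 10)   # still inside the 60-second window
old    = cache.read('12345', t0 + 120)  # past the window, should be refreshed
puts recent[:stale]
puts old[:stale]
```

The action would render whatever `read` returns straight away, and only when `:stale` is true (or nothing is cached) start the slow API calls and push the fresh rows to the page afterwards.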
*EventMachine is complicated