I am requesting data from a Products API, but the thing is that I can only get it 20 records at a time. The endpoint looks like this:
https://www.someDummyAPI.com/Api/Products?offset=0&count=20
Note: I can’t change the count; it will always be 20.
That is, the data from this endpoint will contain 20 records (from 0 to 20), and after that I have to increase the offset by 20 to get the next 20 records, and so on (it’s about 15,000 records in total, so I have to make approximately 700 requests).
After getting all the data, I insert it into the SQL database using a stored procedure (this is a different process).
So my question is: how can I speed up the fetching process? I thought about running tasks in parallel, but I need to get the results back from each response.
For now the process looks like this:
protected async void FSL_Sync_btn_Click(object sender, EventArgs e)
{
    int offset = 0;
    int total = 0;
    bool isFirst = true;
    DataTable resTbl = CreateDt();

    while (offset < total || offset == 0)
    {
        try
        {
            var data = await GetFSLData(offset.ToString(), "Products");
            JObject jResult = JObject.Parse(data);

            // The first response also carries the total record count.
            if (isFirst)
            {
                Int32.TryParse(jResult.SelectToken("total").ToString(), out total);
                isFirst = false;
            }

            // Function to chain up data in DataTable
            resTbl = WriteInDataTable(resTbl, jResult);
            offset += 20;
        }
        catch (Exception ex)
        {
            var msg = ex.Message;
        }
    }
}
So the process flow I am taking is:
- Get data from the API (let’s say the first 20 records).
- Add it to the existing DataTable using the WriteInDataTable function.
- Insert data into the SQL database from this resTbl DataTable (a completely different process, not shown in this snippet).
I haven’t used parallel tasks yet (I don’t even know if they’re the right solution here), so I would appreciate any help.
4 Answers
Get your first record and set the total before the loop:
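For example (a sketch reusing the GetFSLData, CreateDt and WriteInDataTable helpers from the question, and assuming the response JSON carries a total token, as in your code):

DataTable resTbl = CreateDt();

// One request up front, so we know how many pages exist.
JObject firstPage = JObject.Parse(await GetFSLData("0", "Products"));
int total = Int32.Parse(firstPage.SelectToken("total").ToString());
resTbl = WriteInDataTable(resTbl, firstPage);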
In the next step you can then parallelize your tasks:
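A sketch of those two steps together, continuing from the code above (the task list is created first, then awaited in one go):

// Start one request per remaining page (20, 40, 60, ...).
// The requests run concurrently from the moment they are created.
var tasks = new List<Task<string>>();
for (int offset = 20; offset < total; offset += 20)
{
    tasks.Add(GetFSLData(offset.ToString(), "Products"));
}

// Await all responses, then merge them on this single thread,
// because DataTable is not safe for concurrent writes.
// Task.WhenAll preserves input order, so pages stay in offset order.
string[] responses = await Task.WhenAll(tasks);
foreach (string response in responses)
{
    resTbl = WriteInDataTable(resTbl, JObject.Parse(response));
}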
Then you can use Task.WhenAll to get the data, as at the end of the sketch above.

Just some things to be aware of: you will be hitting that API with a lot of requests simultaneously, and it might not be a good idea. You could use TransformBlock and ActionBlock from the TPL Dataflow library if you run into this problem. You can find more information on that here: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library
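For completeness, a minimal sketch of that dataflow variant (hedged: it needs the System.Threading.Tasks.Dataflow NuGet package, reuses resTbl and total from the sketches above, and the MaxDegreeOfParallelism of 5 is an arbitrary starting point). A TransformBlock fetches pages with bounded parallelism, and an ActionBlock merges them one at a time:

// using System.Threading.Tasks.Dataflow;
var fetchBlock = new TransformBlock<int, JObject>(
    async offset => JObject.Parse(await GetFSLData(offset.ToString(), "Products")),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

// ActionBlock is single-threaded by default, so only one thread
// ever touches the DataTable.
var mergeBlock = new ActionBlock<JObject>(
    jResult => resTbl = WriteInDataTable(resTbl, jResult));

fetchBlock.LinkTo(mergeBlock, new DataflowLinkOptions { PropagateCompletion = true });

for (int offset = 0; offset < total; offset += 20)
{
    fetchBlock.Post(offset);
}
fetchBlock.Complete();
await mergeBlock.Completion;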
You could use Task.WhenAll to run your requests in parallel. This will launch all of your requests (several hundred tasks) at once and let the runtime manage them, which might be harmful to performance, but it will still be significantly faster than issuing them in order. You might consider batching them to achieve even better performance, launching around 100 tasks at a time.
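A sketch of the batching idea (hedged: the batch size of 100 follows the suggestion above but is otherwise arbitrary, and total, resTbl and the GetFSLData/WriteInDataTable helpers are assumed to be in scope, as in the question):

const int batchSize = 100;
int pageCount = (total + 19) / 20;  // number of 20-record pages
List<int> offsets = Enumerable.Range(0, pageCount).Select(i => i * 20).ToList();

for (int i = 0; i < offsets.Count; i += batchSize)
{
    // Launch at most batchSize requests at a time.
    IEnumerable<Task<string>> batch = offsets
        .Skip(i)
        .Take(batchSize)
        .Select(offset => GetFSLData(offset.ToString(), "Products"));

    string[] responses = await Task.WhenAll(batch);
    foreach (string response in responses)
    {
        resTbl = WriteInDataTable(resTbl, JObject.Parse(response));
    }
}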
It’s quite hard to know what you’re really using and getting, due to the high abstraction level in your code (which is IMHO good, but makes it quite hard to spot errors on a page like SO).
So here is just a sketch of how you can parallelize all requests to your API to improve the fetch time and write the results into the database once. Maybe there are some quotas on the API, so you may have to run these things in chunks, but that can easily be adapted through LINQ.
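The answer’s original code isn’t reproduced here, so the following is a reconstruction under stated assumptions: the endpoint shape from the question, a shared HttpClient, the CreateDt/WriteInDataTable helpers from the question, and a made-up method name FetchAllProductsAsync. One request per offset is built with LINQ, all of them run in parallel, and the combined result is written once at the end:

private static readonly HttpClient Client = new HttpClient();

private async Task<DataTable> FetchAllProductsAsync(int total)
{
    // One URL per 20-record page, built with LINQ.
    IEnumerable<Task<string>> requests = Enumerable
        .Range(0, (total + 19) / 20)
        .Select(i => $"https://www.someDummyAPI.com/Api/Products?offset={i * 20}&count=20")
        .Select(url => Client.GetStringAsync(url));

    string[] pages = await Task.WhenAll(requests);

    // Merge everything into a single DataTable after all fetches are done.
    DataTable resTbl = CreateDt();
    foreach (string page in pages)
    {
        resTbl = WriteInDataTable(resTbl, JObject.Parse(page));
    }
    return resTbl;
}

To run the requests in chunks instead (e.g. because of API quotas), you could group the offsets with LINQ before awaiting, as in the batching example above.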
If you have upgraded to the .NET 6 platform, you could consider using the Parallel.ForEachAsync method to parallelize the GetFSLData invocations. This method requires an IEnumerable<T> sequence as its source. You can create this sequence with LINQ (the Enumerable.Range method). To avoid any problems associated with the thread-safety of the DataTable class, you can store the JObject results in an intermediate ConcurrentQueue<JObject> collection, and defer the creation of the DataTable until all the data have been fetched and are locally available. You may also need to store the offset associated with each JObject, so that the results can be inserted in their original order. Putting everything together:
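(The answer’s combined code isn’t reproduced here; below is a hedged reconstruction under the assumptions already stated: the GetFSLData/CreateDt/WriteInDataTable helpers and the total token from the question, with MaxDegreeOfParallelism = 5 as an arbitrary starting point.)

// using System.Collections.Concurrent; using System.Linq; using System.Threading;
int total = Int32.MaxValue; // unknown until the first response arrives

// Lazy sequence of offsets 0, 20, 40, ..., stopping at the current total.
IEnumerable<int> offsets = Enumerable
    .Range(0, Int32.MaxValue / 20)
    .Select(n => n * 20)
    .TakeWhile(offset => offset < Volatile.Read(ref total));

var results = new ConcurrentQueue<(int Offset, JObject Data)>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };

await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
{
    string data = await GetFSLData(offset.ToString(), "Products");
    JObject jResult = JObject.Parse(data);
    if (offset == 0)
    {
        // The first response reveals the real total; subsequent
        // TakeWhile checks on other threads will observe it.
        Volatile.Write(ref total, Int32.Parse(jResult.SelectToken("total").ToString()));
    }
    results.Enqueue((offset, jResult));
});

// All data fetched; build the DataTable on a single thread,
// restoring the original order via the stored offsets.
DataTable resTbl = CreateDt();
foreach (var (_, jResult) in results.OrderBy(r => r.Offset))
{
    resTbl = WriteInDataTable(resTbl, jResult);
}

(While total is still unknown, up to MaxDegreeOfParallelism pages may be requested speculatively; with thousands of records this is harmless, but a very small dataset could trigger a few out-of-range requests.)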
The Volatile.Read/Volatile.Write calls are required because the total variable might be accessed by multiple threads in parallel.

In order to get optimal performance, you may need to adjust the MaxDegreeOfParallelism configuration according to the capabilities of the remote server and your internet connection.

Note: This solution is not memory-efficient, because it requires that all the data be stored in memory in two different formats at the same time.