I’m trying to ingest data from the public (unofficial) Yahoo Finance API using Azure Data Factory. The endpoint I’m testing is https://query2.finance.yahoo.com/v8/finance/chart/GOLD.
I am able to ingest the data, but I’m running into an issue when trying to transform it as part of a data flow. I am trying to flatten the JSON produced, which is a series of nested arrays, roughly in the structure sketched below.
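Trimmed to the relevant fields, the v8 chart response looks roughly like this (values here are illustrative placeholders, not the exact payload):

```json
{
  "chart": {
    "result": [
      {
        "meta": { "...": "..." },
        "timestamp": [1714566600, 1714653000],
        "indicators": {
          "quote": [
            {
              "open":   [17.2, 17.4],
              "low":    [17.0, 17.1],
              "high":   [17.5, 17.6],
              "close":  [17.3, 17.5],
              "volume": [21000000, 19500000]
            }
          ]
        }
      }
    ],
    "error": null
  }
}
```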
To produce a table in the below format:
timestamp | volume | open | low | high | close |
---|---|---|---|---|---|
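For reference, the flattening itself just amounts to zipping the parallel arrays into one row per timestamp. A minimal Python sketch, assuming the response shape above (the `range`/`interval` parameters and `User-Agent` header are assumptions, not settings from my pipeline):

```python
import requests

# Illustrative fetch; parameters are assumptions, not my exact pipeline settings.
resp = requests.get(
    "https://query2.finance.yahoo.com/v8/finance/chart/GOLD",
    params={"range": "1mo", "interval": "1d"},
    headers={"User-Agent": "Mozilla/5.0"},  # Yahoo tends to reject default client agents
    timeout=30,
)
result = resp.json()["chart"]["result"][0]
quote = result["indicators"]["quote"][0]

# The flatten step: zip the parallel arrays into one row per timestamp.
rows = [
    {"timestamp": ts, "volume": v, "open": o, "low": lo, "high": h, "close": c}
    for ts, v, o, lo, h, c in zip(
        result["timestamp"], quote["volume"], quote["open"],
        quote["low"], quote["high"], quote["close"],
    )
]
```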
The setup of my flatten activity is as follows: the Partition option I’m using is "Use current partitioning". [Screenshot: the flatten settings and the resulting columns under the Inspect tab.]
However, when I try to preview the data, nothing comes up and the notifications in ADF show this error:
Could not fetch statistics due to operation timeout.
In the source, I’ve tried sampling the data down to only 10 rows and I’m getting the same error, so I don’t think data volume is the issue. I have also tried a different ticker (MSFT) on the same endpoint and I’m getting the same error there as well.
Any ideas appreciated!
Thanks,
Carolina
2 Answers
Figured it out! It was because the amount of data it was trying to ingest was too large. I set the query parameters as below and I'm now getting data through:
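For example, restricting the window with the endpoint’s `range` and `interval` query parameters keeps the payload small (illustrative values, not necessarily the exact ones I used):

```
https://query2.finance.yahoo.com/v8/finance/chart/GOLD?range=1mo&interval=1d
```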
The error you are getting might be because of your large data set. The default IR used for debug mode in data flows is a small 4-core single worker node with a 4-core single driver node. To work with a large data set, you need to scale up the worker and driver nodes by creating an integration runtime with "Large" as the compute size.
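A sketch of what the Azure integration runtime definition looks like in its JSON view with the data flow compute scaled up (the name, coreCount, and timeToLive values here are illustrative):

```json
{
  "name": "DataFlowLargeIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 16,
          "timeToLive": 10
        }
      }
    }
  }
}
```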
The way a data flow previews data can also be changed. Clicking "Debug Settings" on the data flow canvas toolbar will allow you to change the debug settings; here, you can reduce the number of rows in the preview.
You can also enable sampling to use only a sample of the data from the source for testing purposes.