The API I am using returns at most 1000 records per request, so I need to shift the rank range and keep requesting until all records are returned. I am trying to keep a single parent node and merge the JSON data from every response under its "StudyFields" node. Having only the data under StudyFields would also work.
API URL (example):
Data format:
{
"StudyFieldsResponse":{
"APIVrs":"1.01.05",
"DataVrs":"2023:09:07 00:28:48.692",
"Expression":"aspirin",
"NStudiesAvail":465100,
"NStudiesFound":2548,
"MinRank":1,
"MaxRank":1000,
"NStudiesReturned":1000,
"FieldList":[
"InterventionName",
"NCTId",
"BriefTitle",
"InterventionType",
"InterventionDescription",
"InterventionOtherName",
"OverallStatus",
"LastUpdateSubmitDate"
],
"StudyFields":[
{
"Rank":1,
"InterventionName":[
"Aspirin",
"blood sample"
],
"NCTId":[
"NCT01375400"
]
}
]
}
}
What I am getting (multiple parent nodes):
{
"StudyFieldsResponse":{
"APIVrs":"1.01.05",
"DataVrs":"2023:09:07 00:28:48.692",
"Expression":"aspirin",
"NStudiesAvail":465100,
"NStudiesFound":2548,
"MinRank":1,
"MaxRank":1000,
"NStudiesReturned":1000,
"FieldList":[
"InterventionName",
"NCTId",
"BriefTitle",
"InterventionType",
"InterventionDescription",
"InterventionOtherName",
"OverallStatus",
"LastUpdateSubmitDate"
],
"StudyFields":[
{
"Rank":1,
"InterventionName":[
"Aspirin",
"blood sample"
],
"NCTId":[
"NCT01375400"
]
}
]
}
}
{
"StudyFieldsResponse":{
"APIVrs":"1.01.05",
"DataVrs":"2023:09:07 00:28:48.692",
"Expression":"aspirin",
"NStudiesAvail":465100,
"NStudiesFound":2548,
"MinRank":1001,
"MaxRank":2000,
"NStudiesReturned":1000,
"FieldList":[
"InterventionName",
"NCTId",
"BriefTitle",
"InterventionType",
"InterventionDescription",
"InterventionOtherName",
"OverallStatus",
"LastUpdateSubmitDate"
],
"StudyFields":[
{
"Rank":1001,
"InterventionName":[
"Naoxintong Capsule",
"Placebo"
],
"NCTId":[
"NCT05278182"
]
}
]
}
}
What I want (single parent, multiple children under the StudyFields node):
{
"StudyFieldsResponse":{
"APIVrs":"1.01.05",
"DataVrs":"2023:09:07 00:28:48.692",
"Expression":"aspirin",
"NStudiesAvail":465100,
"NStudiesFound":2548,
"MinRank":1,
"MaxRank":1000,
"NStudiesReturned":1000,
"FieldList":[
"InterventionName",
"NCTId",
"BriefTitle",
"InterventionType",
"InterventionDescription",
"InterventionOtherName",
"OverallStatus",
"LastUpdateSubmitDate"
],
"StudyFields":[
{
"Rank":1,
"InterventionName":[
"Aspirin",
"blood sample"
],
"NCTId":[
"NCT01375400"
]
},
{
"Rank":1001,
"InterventionName":[
"Naoxintong Capsule",
"Placebo"
],
"NCTId":[
"NCT05278182"
]
}
]
}
}
Here is my code:
$max_rnk=1000;
for ($i=1; $i<=$NStudiesFound; $i=$i+1000){
$url1= 'https://classic.clinicaltrials.gov/api/query/study_fields?expr='.$input_result;
$url1.='&fields='.$fields;
$url1.='&min_rnk='.$i.'&max_rnk='.$max_rnk.'&fmt=json';
$data[]=json_decode(file_get_contents($url1),true);
$max_rnk=$max_rnk+1000;
}
$file = 'intervention4.json';
$data_merge = json_encode($data);
file_put_contents($file, $data_merge);
3 Answers
Use array_merge() to concatenate the arrays rather than pushing each response into a new element of the array.
If you just want to aggregate things under the ‘StudyFields’ key, then you can use something like this:
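A minimal sketch of that aggregation, assuming every decoded page has the shape shown in the question. `mergeStudyFieldPages()` is a hypothetical helper name, not a library function; it keeps the first page's parent node and appends the StudyFields entries of every later page.

```php
<?php
// Sketch only: mergeStudyFieldPages() is a hypothetical helper name.

/**
 * Combine decoded page responses into a single response.
 * The first page supplies the parent node and metadata; the StudyFields
 * lists of all pages are concatenated in request order.
 *
 * @param array[] $pages Decoded responses, one per request.
 */
function mergeStudyFieldPages(array $pages): array
{
    if ($pages === []) {
        return [];
    }

    $merged  = array_shift($pages); // first page becomes the parent
    $studies = $merged['StudyFieldsResponse']['StudyFields'] ?? [];

    foreach ($pages as $page) {
        // array_merge() renumbers integer keys, so the lists are appended
        // instead of overwriting each other.
        $studies = array_merge(
            $studies,
            $page['StudyFieldsResponse']['StudyFields'] ?? []
        );
    }

    $merged['StudyFieldsResponse']['StudyFields'] = $studies;

    return $merged;
}
```

With the loop from the question left unchanged, `$data` is already a list of decoded pages, so `file_put_contents($file, json_encode(mergeStudyFieldPages($data)));` should write a single parent node. MinRank, MaxRank and NStudiesReturned would still describe the first page only.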
Simply dumping all the result sets together into one gigantic array is going to be problematic as your data set grows. This is the perfect place to implement a generator: make one API request to grab 1000 results, yield them individually, and repeat until you’re out of records. This way you process records as you go, and you only ever hold 1000 records’ worth of data in memory.
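The generator idea could be sketched roughly as follows. The class name, the constants and the `fetchPage()` helper are illustrative assumptions, and the search expression and field list are hard-coded here; the URL layout is copied from the question.

```php
<?php
// Sketch only: class, constant and method names are illustrative assumptions.

class StudyFields implements IteratorAggregate
{
    const BASE_URL  = 'https://classic.clinicaltrials.gov/api/query/study_fields';
    const PAGE_SIZE = 1000;      // per-request limit mentioned in the question
    const EXPR      = 'aspirin'; // hard-coded search expression
    const FIELDS    = 'InterventionName,NCTId,BriefTitle';

    /** Fetch and decode one page; kept separate so it can be stubbed out. */
    protected function fetchPage(int $minRank, int $maxRank): array
    {
        $url = self::BASE_URL . '?' . http_build_query([
            'expr'    => self::EXPR,
            'fields'  => self::FIELDS,
            'min_rnk' => $minRank,
            'max_rnk' => $maxRank,
            'fmt'     => 'json',
        ]);

        $json = file_get_contents($url);
        $data = $json === false ? null : json_decode($json, true);
        if (!is_array($data)) {
            throw new RuntimeException("Could not fetch or decode $url");
        }

        return $data;
    }

    /** Yield one study at a time, requesting the next page only when needed. */
    public function getIterator(): Generator
    {
        $minRank = 1;
        do {
            $page     = $this->fetchPage($minRank, $minRank + self::PAGE_SIZE - 1);
            $response = $page['StudyFieldsResponse'] ?? [];
            $studies  = $response['StudyFields'] ?? [];

            foreach ($studies as $study) {
                yield $study;
            }

            $minRank += self::PAGE_SIZE;
            // Stop on an empty page or once the reported total is covered.
        } while ($studies !== [] && $minRank <= ($response['NStudiesFound'] ?? 0));
    }
}
```

Only the page currently being iterated is held in memory; the next request is made when the consumer asks for a record beyond it.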
Then all you need to do is iterate over an instance of the object itself, and you’ll transparently get each record individually:
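As an illustration of that iteration, the sketch below uses a stand-in class (`CannedStudies`, a hypothetical name) that yields two records taken from the question instead of calling the API. Any class implementing `IteratorAggregate` can be consumed the same way.

```php
<?php
// Stand-in for the paging class (hypothetical): yields two canned records
// from the question rather than making API requests.
class CannedStudies implements IteratorAggregate
{
    public function getIterator(): Generator
    {
        yield ['Rank' => 1,    'NCTId' => ['NCT01375400']];
        yield ['Rank' => 1001, 'NCTId' => ['NCT05278182']];
    }
}

$ids = [];
foreach (new CannedStudies() as $study) {
    // Each $study is one entry of the API's StudyFields list.
    $ids[] = $study['NCTId'][0];
}

echo implode("\n", $ids), "\n";
```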
Even better, create a constructor so you can pass the other arguments, and make the field list a default:
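One possible shape for such a constructor is sketched below. All names are assumptions made for illustration; the default field list is copied from the FieldList shown in the question.

```php
<?php
// Sketch only: names below are illustrative assumptions.

class StudyFields implements IteratorAggregate
{
    const BASE_URL       = 'https://classic.clinicaltrials.gov/api/query/study_fields';
    const PAGE_SIZE      = 1000;
    // Default field list, copied from the FieldList in the question.
    const DEFAULT_FIELDS = [
        'InterventionName', 'NCTId', 'BriefTitle', 'InterventionType',
        'InterventionDescription', 'InterventionOtherName',
        'OverallStatus', 'LastUpdateSubmitDate',
    ];

    private $expr;
    private $fields;

    public function __construct(string $expr, array $fields = self::DEFAULT_FIELDS)
    {
        $this->expr   = $expr;
        $this->fields = $fields;
    }

    /** Build the URL for the page that starts at $minRank. */
    public function pageUrl(int $minRank): string
    {
        return self::BASE_URL . '?' . http_build_query([
            'expr'    => $this->expr,
            'fields'  => implode(',', $this->fields),
            'min_rnk' => $minRank,
            'max_rnk' => $minRank + self::PAGE_SIZE - 1,
            'fmt'     => 'json',
        ]);
    }

    /** Fetch and decode one page; kept separate so it can be stubbed out. */
    protected function fetchPage(int $minRank): array
    {
        $json = file_get_contents($this->pageUrl($minRank));
        $data = $json === false ? null : json_decode($json, true);
        if (!is_array($data)) {
            throw new RuntimeException("Could not fetch or decode page at rank $minRank");
        }

        return $data;
    }

    /** Yield one study at a time across all pages. */
    public function getIterator(): Generator
    {
        $minRank = 1;
        do {
            $response = $this->fetchPage($minRank)['StudyFieldsResponse'] ?? [];
            $studies  = $response['StudyFields'] ?? [];

            foreach ($studies as $study) {
                yield $study;
            }

            $minRank += self::PAGE_SIZE;
        } while ($studies !== [] && $minRank <= ($response['NStudiesFound'] ?? 0));
    }
}
```

Building the query with `http_build_query()` also URL-encodes the search expression, which the string concatenation in the question does not do.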
Now you can pass your search expression as the argument:
And if you want to override the default fields returned, you can do:
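Both call styles can be compared with the trimmed stand-in below. It is a hypothetical class that keeps only a constructor and the query building, so it runs without contacting the API; the shortened default field list and the `query()` method are assumptions made for the example.

```php
<?php
// Trimmed stand-in (hypothetical): constructor and query building only,
// so the two call styles can be compared without any API requests.
class StudyFields
{
    const DEFAULT_FIELDS = ['InterventionName', 'NCTId', 'BriefTitle'];

    private $expr;
    private $fields;

    public function __construct(string $expr, array $fields = self::DEFAULT_FIELDS)
    {
        $this->expr   = $expr;
        $this->fields = $fields;
    }

    /** Return the URL-encoded query string for this search. */
    public function query(): string
    {
        return http_build_query([
            'expr'   => $this->expr,
            'fields' => implode(',', $this->fields),
        ]);
    }
}

// Search expression only: the default field list is used.
$defaults = new StudyFields('aspirin');

// Second argument overrides the default field list.
$custom = new StudyFields('aspirin', ['NCTId', 'OverallStatus']);

echo $defaults->query(), "\n";
echo $custom->query(), "\n";
```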