I am trying to get sentiment scores of random product descriptions from a CSV file, I’m facing a problem with what I think is the API response time, not sure if I’m traversing through the CSV using the API incorrectly / un-efficiently but it is taking a long time to get results for all the 300+ entries in the CSV and whenever I want to push new changes to my codebase I need to wait for the API to re-evaluate the entries every time, here is my code I made for loading in the CSV file and for getting the sentiment scores
<?php
set_time_limit(500); // extended timeout due to slow / overwhelmed API response
function extract_file($csv) { // CSV to array function
$file = fopen($csv, 'r');
while (!feof($file)) {
$lines[] = fgetcsv($file, 1000, ',');
}
fclose($file);
return $lines;
}
$the_file = 'dataset.csv';
$csv_data = extract_file($the_file);
$response_array = []; // array container to hold returned sentiment values from among prduct descriptions
for($x = 1; $x < count($csv_data) - 1; $x++) { // loop through all descriptions
echo $x; // show iteration
$api_text = $csv_data[$x][1];
$api_text = str_replace('&', ' and ', $api_text); // removing escape sequence characters, '&' breaks the api :)
$api_text = str_replace(" ", "%20", $api_text); // serializing string
$text = 'text=';
$text .=$api_text; // serializing string further for the API
//echo 'current text1: ', $api_text;
$curl = curl_init(); // API request init
curl_setopt_array($curl, [
CURLOPT_URL => "https://text-sentiment.p.rapidapi.com/analyze",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => $text,
CURLOPT_HTTPHEADER => [
"X-RapidAPI-Host: text-sentiment.p.rapidapi.com",
"X-RapidAPI-Key: <snip>",
"content-type: application/x-www-form-urlencoded"
],
]);
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}
$json = json_decode($response, true); // convert response to JSON format
if(isset($json["pos"]) == false) { // catching response error 100, makes array faulty otherwise
continue;
}
else {
array_push($response_array, array($x, "+" => $json["pos"], "-" => $json["neg"])); // appends array with sentiment values at current index
}
}
echo "<br>";
echo "<br> results: ";
echo "<p>";
for ($y = 0; $y < count($response_array); $y++){ // prints out all the sentiment values
echo "<br>";
echo print_r($response_array[$y]);
echo "<br>";
}
echo "</p>";
echo "<br>the most negative description: ";
$max_neg = array_keys($response_array, max(array_column($response_array, '-')));
//$max_neg = max(array_column($response_array, '-'));
echo print_r($csv_data[$max_neg[0]]);
echo "<br>the most positive description: ";
$max_pos = array_keys($response_array, max(array_column($response_array, '+')));
echo print_r($csv_data[$max_pos[0]]);
?>
What this code snippet aims to do is find the most negative and most positive sentiment among the description column in the csv and print them out according to their index, I’m only interested in finding descriptions with the highest amount of positive and negative sentiment word number not the percentage of the overall sentiment
The file can be found in this git repo
Thanks for any suggestions
2
Answers
This can be achieved by creating a cache file.
This solution creates a file
cache.json
that contains the results from the API, using the product name as the key for each entry.On subsequent calls, it will use the cache value if it exists.
Results in:
You need to know where the time is going.
Start with identifying where the time goes in the curl request.
My guess is the API response time.
If that’s the case I have a solution. Meanwhile I will get the "multi-tasking" code code I use to do simultaneous curl requests.
curl has the timing you need. It looks like this:
Just add a couple of lines of code
Running simultaneous curl sockets.
Put your curl in curl.php
This code goes in your CSV loop to create all the URL query fields to pass to curl.php (e.g.
http://127.0.0.1/curl.php?text=$text
)Then process all the URLs.
Then Monitor the sockets and retrieve the response from each socket.
Then close the socket until you have them all.
Then all your results are in
$results[]
.