I am working on a project using CodeIgniter 3 where I store user-specific logs in separate files for each user. I have around 1.3 million users, and I need to move these log files to an AWS S3 bucket. After successfully uploading the files, I also need to update the corresponding entries in the users table in my database.
Given the large number of users, what is the best approach to:
1. Efficiently upload these user-specific log files to AWS S3.
2. Update the user data in the database once the files are successfully uploaded.
Any advice on handling this large-scale operation efficiently, along with code examples, would be greatly appreciated. Thank you!
What I’ve tried:
I’ve created two methods:
- The first method generates text files in batches of 10,000 users. Each file contains an array of objects where each object has the local path of the file, the S3 path where the file should be stored, and the user ID.
- The second method reads the generated text file, uploads the data to S3, updates the users table with the S3 path, and deletes the text file after the batch is complete.
The batch generation and deletion steps work fine, but I'm still looking for ways to optimize this workflow so it can handle all 1.3 million users efficiently.
The issue I’m facing is that after uploading 300 to 400 files, I’m getting a "maximum execution time exceeded" error, which halts the entire process.
What I was expecting:
I was hoping to find an efficient way to upload files in batches to AWS S3 without running into the execution time limit and ensuring that the database is updated correctly.
Questions:
How can I optimize the bulk file uploads to AWS S3 to avoid hitting the execution time limit?
Is there a way to batch or queue these uploads in CodeIgniter to prevent the process from timing out?
What strategies can I use to ensure that the entire upload and update process completes efficiently for 1.3 million users?
Any advice, code examples, or best practices for handling this large-scale operation without hitting the execution time limit would be greatly appreciated!
Answers
To efficiently upload a large number of user-specific log files to AWS S3 and update your database without hitting the maximum execution time limit in PHP, you can adopt several strategies. Here’s a comprehensive approach:
1. Use CLI Scripts Instead of Web Scripts
Why? Web servers are designed to handle quick HTTP requests and have strict execution time limits (e.g., max_execution_time in PHP). Running long jobs through the web server is not recommended and can lead to timeouts.
Solution: Use PHP CLI (Command Line Interface) scripts to run your batch processes. CLI scripts are not subject to the same execution time limits as web scripts (on the CLI, max_execution_time defaults to 0, i.e. unlimited).
How to Implement in CodeIgniter 3:
CodeIgniter supports CLI controllers out of the box. You can create a controller that can be executed from the command line.
Example: Create a controller in application/controllers named BatchProcessor.php, then run it from the command line.
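A minimal sketch of such a controller. The process() method, the batch-file location, and the S3Uploader/User_model helpers are assumptions based on the workflow described in the question; adapt them to your actual code:

```php
<?php
defined('BASEPATH') OR exit('No direct script access allowed');

class BatchProcessor extends CI_Controller
{
    public function __construct()
    {
        parent::__construct();

        // Refuse to run through the web server
        if (!is_cli()) {
            exit('This controller can only be run from the CLI.');
        }

        $this->load->library('S3Uploader'); // your existing upload wrapper
        $this->load->model('User_model');
    }

    // Usage: php index.php BatchProcessor process 1
    public function process($batchNumber = 1)
    {
        // Hypothetical location of the batch files generated by your first method
        $batchFile = APPPATH . 'batches/batch_' . (int) $batchNumber . '.txt';
        if (!file_exists($batchFile)) {
            echo "Batch file not found: {$batchFile}\n";
            return;
        }

        $items = json_decode(file_get_contents($batchFile), true);

        foreach ($items as $item) {
            $ok = $this->s3uploader->uploadFile($item['local_path'], $item['s3_key']);
            if ($ok) {
                $this->User_model->update_user_log_path(
                    $item['user_id'],
                    's3://your-bucket/' . $item['s3_key']
                );
            } else {
                log_message('error', 'Upload failed for user ' . $item['user_id']);
            }
        }

        unlink($batchFile); // remove the batch file once the whole batch is done
        echo "Batch {$batchNumber} completed\n";
    }
}
```

Run the first batch from the command line:

```bash
php /path/to/project/index.php BatchProcessor process 1
```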
You can increment the batch number to process the next batch:
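For example:

```bash
php /path/to/project/index.php BatchProcessor process 2
php /path/to/project/index.php BatchProcessor process 3
# ... and so on
```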
To automate this, you can write a shell script or use a cron job to loop through the batches.
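A minimal shell loop, assuming batches of 10,000 users (about 130 batches for 1.3 million users; adjust the count and paths to your setup):

```bash
#!/bin/bash
# Process every batch sequentially; each run is its own PHP process,
# so a failure in one batch does not take down the others.
for batch in $(seq 1 130); do
    php /path/to/project/index.php BatchProcessor process "$batch"
done
```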
2. Implement Batch Processing with a Queue System
Why? Processing 1.3 million files in one go is resource-intensive and prone to failures. Breaking down the process into smaller, manageable batches improves efficiency and reliability.
Solution: Use a job queue to process uploads asynchronously.
Options include Amazon SQS, RabbitMQ, Beanstalkd, or a simple database-backed queue table.
Example with Amazon SQS:
Set Up SQS Queue:
Create a queue in AWS SQS for your upload jobs.
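The queue can be created once in the AWS console, or programmatically with the SDK; a sketch (the queue name is just an example):

```php
use Aws\Sqs\SqsClient;

$sqsClient = new SqsClient([
    'version' => 'latest',
    'region'  => 'your-region',
]);

$result   = $sqsClient->createQueue(['QueueName' => 'log-upload-jobs']);
$queueUrl = $result['QueueUrl']; // save this URL for the producer and the worker
```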
Enqueue Jobs:
Modify your script to enqueue a job for each file or batch of files.
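A sketch of the enqueue step, assuming the same SqsClient configuration as the worker in section 4 and the same user_id/local_path/s3_key fields used in your batch files:

```php
// One message per file; sendMessageBatch() can push up to 10 messages per call
foreach ($items as $item) {
    $sqsClient->sendMessage([
        'QueueUrl'    => $queueUrl,
        'MessageBody' => json_encode([
            'user_id'    => $item['user_id'],
            'local_path' => $item['local_path'],
            's3_key'     => $item['s3_key'],
        ]),
    ]);
}
```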
Worker Script:
Create a worker script that runs continuously or via a cron job to process the queue (a full worker controller is shown under section 4 below).
Run the worker script:
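Assuming the worker controller shown in section 4 is saved as Queue_worker.php:

```bash
# Keep the worker alive in the background (or run it under supervisord / cron)
nohup php /path/to/project/index.php queue_worker process_queue > /dev/null 2>&1 &
```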
3. Increase Script Execution Time Temporarily (Not Recommended for Web Scripts)
If you still prefer to stick with web scripts (though not recommended), you can increase the max_execution_time setting in your php.ini or directly in the script.
Example:
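A minimal sketch, placed at the top of the upload script (a value of 0 removes the time limit for that request only):

```php
ini_set('max_execution_time', 0);
set_time_limit(0);
```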
Important: This is not recommended for web-based scripts and should be used with caution. Long-running web scripts can tie up server resources and affect other users.
4. Optimize AWS S3 Uploads with Concurrency
Use the AWS SDK for PHP to manage uploads efficiently. The SDK supports concurrent uploads and handles retries automatically.
Example: a CLI worker controller (named Queue_worker here for illustration) that polls the SQS queue, uploads each file via the S3Uploader library, and updates the users table:
```php
<?php
defined('BASEPATH') OR exit('No direct script access allowed');

use Aws\Sqs\SqsClient; // AWS SDK for PHP, autoloaded via Composer

class Queue_worker extends CI_Controller
{
    private $sqsClient;
    private $queueUrl;

    public function __construct()
    {
        parent::__construct();

        // Allow this controller to run from the CLI only
        if (!is_cli()) {
            exit('This controller can only be run from the command line.');
        }

        $this->load->library('S3Uploader');
        $this->load->model('User_model');

        $this->sqsClient = new SqsClient([
            'version'     => 'latest',
            'region'      => 'your-region',
            'credentials' => [
                'key'    => 'your-key',
                'secret' => 'your-secret',
            ],
        ]);
        $this->queueUrl = 'https://sqs.your-region.amazonaws.com/your-account-id/your-queue-name';
    }

    public function process_queue()
    {
        while (true) {
            // Pull up to 10 messages, using long polling to avoid busy-waiting
            $result = $this->sqsClient->receiveMessage([
                'QueueUrl'            => $this->queueUrl,
                'MaxNumberOfMessages' => 10, // Adjust as needed
                'WaitTimeSeconds'     => 20, // Long polling
            ]);

            if (!isset($result['Messages'])) {
                continue;
            }

            foreach ($result['Messages'] as $message) {
                $body          = json_decode($message['Body'], true);
                $userId        = $body['user_id'];
                $localFilePath = $body['local_path'];
                $s3Key         = $body['s3_key'];

                $uploadSuccess = $this->s3uploader->uploadFile($localFilePath, $s3Key);

                if ($uploadSuccess) {
                    $s3Path = 's3://your-bucket/' . $s3Key;
                    $this->User_model->update_user_log_path($userId, $s3Path);

                    // Delete the message from the queue
                    $this->sqsClient->deleteMessage([
                        'QueueUrl'      => $this->queueUrl,
                        'ReceiptHandle' => $message['ReceiptHandle'],
                    ]);

                    // Delete the local file if it is no longer needed
                    unlink($localFilePath);
                } else {
                    // Handle failure and optionally retry (the message becomes
                    // visible again after the queue's visibility timeout)
                }
            }
        }
    }
}
```
Note: Adjust the concurrency level based on your server's capacity and network bandwidth.
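For reference, the SDK's Aws\S3\Transfer helper can upload an entire local directory with a configurable number of simultaneous requests; a sketch assuming the log files live under one directory (bucket, region, and paths are placeholders):

```php
use Aws\S3\S3Client;
use Aws\S3\Transfer;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'your-region',
]);

$transfer = new Transfer(
    $s3,
    '/path/to/user_logs',          // local source directory
    's3://your-bucket/user_logs',  // destination bucket/prefix
    ['concurrency' => 10]          // number of concurrent uploads
);
$transfer->transfer();             // blocks until every file has been uploaded
```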
5. Use Multipart Uploads for Large Files
If you have large files (over 100 MB), use S3’s multipart upload capability to upload parts of a file in parallel.
However, for smaller log files, this might not provide significant benefits.
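A sketch using the SDK's MultipartUploader, reusing the $s3 client from the previous example (paths are placeholders):

```php
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

try {
    $uploader = new MultipartUploader($s3, '/path/to/large-user.log', [
        'bucket' => 'your-bucket',
        'key'    => 'user_logs/large-user.log',
    ]);
    $uploader->upload(); // parts can be uploaded concurrently (see the 'concurrency' config option)
} catch (MultipartUploadException $e) {
    log_message('error', 'Multipart upload failed: ' . $e->getMessage());
}
```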
6. Monitor and Optimize Resource Usage
Ensure that your script efficiently uses memory and handles potential memory leaks.
Unset Variables: After processing each file, unset variables that are no longer needed.
Garbage Collection: Invoke garbage collection if necessary.
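A minimal illustration of both points inside the upload loop:

```php
foreach ($items as $item) {
    $contents = file_get_contents($item['local_path']);
    // ... upload $contents ...
    unset($contents);      // release the file buffer before the next iteration
}
gc_collect_cycles();       // force a collection cycle if memory keeps growing
```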
7. Consider Compressing Log Files
If appropriate, compress log files to reduce upload time and storage space.
Compress Before Upload:
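A sketch that gzips each log before uploading (fine for small files that fit in memory); remember to store the .gz key in the database so the files can be decompressed later:

```php
$compressedPath = $localFilePath . '.gz';
file_put_contents($compressedPath, gzencode(file_get_contents($localFilePath), 9));

// Upload the compressed file and record the .gz key instead of the original
$this->s3uploader->uploadFile($compressedPath, $s3Key . '.gz');
```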