
I am working on a project using CodeIgniter 3 where I store user-specific logs in separate files for each user. I have around 1.3 million users, and I need to move these log files to an AWS S3 bucket. After successfully uploading the files, I also need to update the corresponding entries in the users table in my database.

Given the large number of users, what is the best approach to:

  1. Efficiently upload these user-specific log files to AWS S3.
  2. Update the user data in the database once the files are successfully uploaded.
Any advice on handling this large-scale operation efficiently, along with code examples, would be greatly appreciated. Thank you!

What I’ve tried:

I’ve created two methods:

  1. The first method generates text files in batches of 10,000 users. Each file contains an array of objects where each object has the local path of the file, the S3 path where the file should be stored, and the user ID.
  2. The second method reads the generated text file, uploads the data to S3, updates the users table with the S3 path, and deletes the text file after the batch is complete.
    The batch generation and deletion steps work fine, but I’m still looking for ways to optimize this workflow to handle all 1.3 million users efficiently.

The issue I’m facing is that after uploading 300 to 400 files, I’m getting a "maximum execution time exceeded" error, which halts the entire process.
What I was expecting:
I was hoping to find an efficient way to upload files in batches to AWS S3 without running into the execution time limit and ensuring that the database is updated correctly.

Questions:

  1. How can I optimize the bulk file uploads to AWS S3 to avoid hitting the execution time limit?
  2. Is there a way to batch or queue these uploads in CodeIgniter to prevent the process from timing out?
  3. What strategies can I use to ensure that the entire upload and update process completes efficiently for 1.3 million users?
Any advice, code examples, or best practices for handling this large-scale operation without hitting the execution time limit would be greatly appreciated!


Answers


  1. To efficiently upload a large number of user-specific log files to AWS S3 and update your database without hitting the maximum execution time limit in PHP, you can adopt several strategies. Here’s a comprehensive approach:

    1. Use CLI Scripts Instead of Web Scripts

    Why? Web servers are designed to handle quick HTTP requests and have strict execution time limits (e.g., max_execution_time in PHP). Running long-running scripts via the web is not recommended and can lead to timeouts.

    Solution: Use PHP CLI (Command Line Interface) scripts to run your batch processes. When PHP runs from the command line, max_execution_time defaults to 0 (unlimited), so CLI scripts are not subject to the web-request time limit.

    How to Implement in CodeIgniter 3:

    CodeIgniter supports CLI controllers out of the box. You can create a controller that can be executed from the command line.

    Example:

    Create a controller in application/controllers named Batchprocessor.php (only the first letter capitalized, so CodeIgniter’s router can map the lowercase CLI segment to the file on case-sensitive filesystems):

    <?php
    defined('BASEPATH') OR exit('No direct script access allowed');
    
    class Batchprocessor extends CI_Controller {
    
        public function __construct()
        {
            parent::__construct();
    
            // Ensure this controller can only be run from the CLI
            if (!is_cli()) {
                show_error('This script can only be accessed via the command line.', 403);
            }
    
            // Load necessary models, libraries, etc.
            $this->load->model('User_model');
            $this->load->library('S3Uploader'); // Custom library for S3 operations
        }
    
        public function upload_files($batch_number = 0)
        {
            $batch_size = 1000; // Adjust batch size as needed
    
            // Calculate offset
            $offset = $batch_number * $batch_size;
    
            // Fetch a batch of users
            $users = $this->User_model->get_users($batch_size, $offset);
    
            if (empty($users)) {
                echo "No more users to process.\n";
                return;
            }
    
            foreach ($users as $user) {
                // Perform upload and database update
                $localFilePath = '/path/to/logs/' . $user->log_file_name;
                $s3Key = 'logs/' . $user->id . '/' . $user->log_file_name;
    
                $uploadSuccess = $this->s3uploader->uploadFile($localFilePath, $s3Key);
    
                if ($uploadSuccess) {
                    // Update database with S3 path
                    $s3Path = 's3://your-bucket/' . $s3Key;
                    $this->User_model->update_user_log_path($user->id, $s3Path);
    
                    // Optionally, delete local file if no longer needed
                    unlink($localFilePath);
                } else {
                    // Handle upload failure
                    echo "Failed to upload file for user ID: {$user->id}\n";
                }
            }
    
            echo "Batch {$batch_number} processed.\n";
        }
    }
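
    The controller loads a custom S3Uploader library, so a file application/libraries/S3Uploader.php is needed. A minimal sketch, assuming the AWS SDK for PHP v3 is installed via Composer and autoloaded (for example with $config['composer_autoload'] = TRUE); the bucket, region, and credentials are placeholders:

    <?php
    defined('BASEPATH') OR exit('No direct script access allowed');

    use Aws\S3\S3Client;
    use Aws\Exception\AwsException;

    class S3Uploader {

        protected $s3Client;
        protected $bucket = 'your-bucket'; // placeholder bucket name

        public function __construct()
        {
            $this->s3Client = new S3Client([
                'version' => 'latest',
                'region'  => 'your-region',
                'credentials' => [
                    'key'    => 'your-key',
                    'secret' => 'your-secret',
                ],
            ]);
        }

        /**
         * Upload a single file to S3. Returns TRUE on success, FALSE on failure.
         */
        public function uploadFile($localFilePath, $s3Key)
        {
            try {
                $this->s3Client->putObject([
                    'Bucket'     => $this->bucket,
                    'Key'        => $s3Key,
                    'SourceFile' => $localFilePath,
                ]);

                return TRUE;
            } catch (AwsException $e) {
                log_message('error', 'S3 upload failed for ' . $s3Key . ': ' . $e->getMessage());

                return FALSE;
            }
        }
    }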
    

    Run the controller from the command line:

    php index.php batchprocessor upload_files 0
    

    You can increment the batch number to process the next batch:

    php index.php batchprocessor upload_files 1
    php index.php batchprocessor upload_files 2
    

    To automate this, you can write a shell script or use a cron job to loop through the batches.
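
    Both the controller above and the SQS worker shown later call two hypothetical User_model methods. A minimal sketch of application/models/User_model.php (the users table columns and the s3_log_path column are assumptions; adjust them to your schema):

    <?php
    defined('BASEPATH') OR exit('No direct script access allowed');

    class User_model extends CI_Model {

        // Fetch one batch of users, ordered by ID so offsets stay stable between runs.
        public function get_users($limit, $offset)
        {
            return $this->db
                ->select('id, log_file_name')
                ->order_by('id', 'ASC')
                ->limit($limit, $offset)
                ->get('users')
                ->result();
        }

        // Store the S3 path for a user once the upload has succeeded.
        public function update_user_log_path($user_id, $s3_path)
        {
            return $this->db
                ->where('id', $user_id)
                ->update('users', ['s3_log_path' => $s3_path]);
        }
    }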

    2. Implement Batch Processing with a Queue System

    Why? Processing 1.3 million files in one go is resource-intensive and prone to failures. Breaking down the process into smaller, manageable batches improves efficiency and reliability.

    Solution: Use a job queue to process uploads asynchronously.

    Options:

    • Use a Queue Library: Integrate a queuing library like CodeIgniter Task Queue Library or external systems like RabbitMQ, Beanstalkd, or Amazon SQS.
    • Database-backed Queue: Use your database to store jobs (not recommended for high-scale).

    Example with Amazon SQS:

    1. Set Up SQS Queue:

      Create a queue in AWS SQS for your upload jobs.
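
      If you prefer to create the queue from code rather than the AWS console, the SDK’s createQueue call can be used. A one-off sketch (queue name, region, and credentials are placeholders; the visibility timeout value is an arbitrary example):

      use Aws\Sqs\SqsClient;

      $sqsClient = new SqsClient([
          'version' => 'latest',
          'region'  => 'your-region',
          'credentials' => [
              'key'    => 'your-key',
              'secret' => 'your-secret',
          ],
      ]);

      // Create the queue once and note the returned URL for the enqueue and worker scripts.
      $result = $sqsClient->createQueue([
          'QueueName'  => 'your-queue-name',
          'Attributes' => [
              'VisibilityTimeout' => '120', // seconds a message stays hidden while a worker handles it
          ],
      ]);

      echo $result['QueueUrl'] . PHP_EOL;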

    2. Enqueue Jobs:

      Modify your script to enqueue a job for each file or batch of files.

      // At the top of your controller or model file:
      use Aws\Sqs\SqsClient;

      // Inside your method, create the SQS client:
      $sqsClient = new SqsClient([
          'version' => 'latest',
          'region'  => 'your-region',
          'credentials' => [
              'key'    => 'your-key',
              'secret' => 'your-secret',
          ],
      ]);
      
      $queueUrl = 'https://sqs.your-region.amazonaws.com/your-account-id/your-queue-name';
      
      // Enqueue jobs
      foreach ($users as $user) {
          $messageBody = json_encode([
              'user_id' => $user->id,
              'local_path' => '/path/to/logs/' . $user->log_file_name,
              's3_key' => 'logs/' . $user->id . '/' . $user->log_file_name,
          ]);
      
          $sqsClient->sendMessage([
              'QueueUrl'    => $queueUrl,
              'MessageBody' => $messageBody,
          ]);
      }
      
    3. Worker Script:

      Create a worker controller in application/controllers (e.g., Sqsworker.php) that runs continuously or via a cron job to process the queue.

      <?php
      defined('BASEPATH') OR exit('No direct script access allowed');
      
      class Sqsworker extends CI_Controller {
      
          public function __construct()
          {
              parent::__construct();
      
              if (!is_cli()) {
                  show_error('This script can only be accessed via the command line.', 403);
              }
      
              $this->load->library('S3Uploader');
              $this->load->model('User_model');
      
              $this->sqsClient = new \Aws\Sqs\SqsClient([
                  'version' => 'latest',
                  'region'  => 'your-region',
                  'credentials' => [
                      'key'    => 'your-key',
                      'secret' => 'your-secret',
                  ],
              ]);
      
              $this->queueUrl = 'https://sqs.your-region.amazonaws.com/your-account-id/your-queue-name';
          }
      
          public function process_queue()
          {
              while (true) {
                  $result = $this->sqsClient->receiveMessage([
                      'QueueUrl'            => $this->queueUrl,
                      'MaxNumberOfMessages' => 10, // Adjust as needed
                      'WaitTimeSeconds'     => 20, // Long polling
                  ]);
      
                  if (!isset($result['Messages'])) {
                      continue;
                  }
      
                  foreach ($result['Messages'] as $message) {
                      $body = json_decode($message['Body'], true);
      
                      $userId = $body['user_id'];
                      $localFilePath = $body['local_path'];
                      $s3Key = $body['s3_key'];
      
                      $uploadSuccess = $this->s3uploader->uploadFile($localFilePath, $s3Key);
      
                      if ($uploadSuccess) {
                          $s3Path = 's3://your-bucket/' . $s3Key;
                          $this->User_model->update_user_log_path($userId, $s3Path);
      
                          // Delete the message from the queue
                          $this->sqsClient->deleteMessage([
                              'QueueUrl'      => $this->queueUrl,
                              'ReceiptHandle' => $message['ReceiptHandle'],
                          ]);
      
                          // Delete local file if needed
                          unlink($localFilePath);
                      } else {
                          // Handle failure and optionally retry
                      }
                  }
              }
          }
      }
      

      Run the worker script:

      php index.php sqsworker process_queue
      

    3. Increase Script Execution Time Temporarily (Not Recommended for Web Scripts)

    If you still prefer to stick with web scripts (though not recommended), you can increase the max_execution_time setting in your php.ini or script.

    Example:

    ini_set('max_execution_time', 0); // 0 means unlimited execution time
    

    Important: This is not recommended for web-based scripts and should be used with caution. Long-running web scripts can tie up server resources and affect other users.

    4. Optimize AWS S3 Uploads with Concurrency

    Use the AWS SDK for PHP to manage uploads efficiently. The SDK supports concurrent uploads and handles retries automatically.

    Example:

    use Aws\S3\S3Client;
    use Aws\CommandPool;
    use Aws\Exception\AwsException;
    
    
    // Create the S3Client
    $s3Client = new S3Client([
        'version' => 'latest',
        'region'  => 'your-region',
        'credentials' => [
            'key'    => 'your-key',
            'secret' => 'your-secret',
        ],
    ]);
    
    // Prepare the commands
    $commands = [];
    foreach ($users as $user) {
        $localFilePath = '/path/to/logs/' . $user->log_file_name;
        $s3Key = 'logs/' . $user->id . '/' . $user->log_file_name;
    
        $commands[] = $s3Client->getCommand('PutObject', [
            'Bucket'     => 'your-bucket',
            'Key'        => $s3Key,
            'SourceFile' => $localFilePath,
        ]);
    }
    
    // Create a pool
    $pool = new CommandPool($s3Client, $commands, [
        'concurrency' => 5, // Adjust concurrency based on your server capabilities
        'fulfilled' => function ($result, $iterKey, $aggregatePromise) use ($users) {
            // Get user ID from $users array
            $user = $users[$iterKey];
            $s3Path = $result['ObjectURL'];
    
            // Update database with S3 path
            $this->User_model->update_user_log_path($user->id, $s3Path);
    
            // Delete local file if needed
            unlink('/path/to/logs/' . $user->log_file_name);
        },
        'rejected' => function ($reason, $iterKey, $aggregatePromise) {
            // Handle failed uploads
            echo "Upload failed for item {$iterKey}\n";
        },
    ]);
    
    // Initiate the pool transfers
    $promise = $pool->promise();
    $promise->wait();
    

    Note: Adjust the concurrency level based on your server’s capacity and network bandwidth.

    5. Use Multipart Uploads for Large Files

    If you have large files (over 100 MB), use S3’s multipart upload capability to upload parts of a file in parallel.

    However, for smaller log files, this might not provide significant benefits.
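
    For large files, a sketch using the SDK’s MultipartUploader (reusing the $s3Client instance from the example above; the path, bucket, and key are placeholders):

    use Aws\S3\MultipartUploader;
    use Aws\Exception\MultipartUploadException;

    // Uploads the file in parts; 'concurrency' controls how many parts are sent in parallel.
    $uploader = new MultipartUploader($s3Client, '/path/to/logs/large_user.log', [
        'bucket'      => 'your-bucket',
        'key'         => 'logs/123/large_user.log',
        'concurrency' => 3,
    ]);

    try {
        $result = $uploader->upload();
        echo "Upload complete: {$result['ObjectURL']}\n";
    } catch (MultipartUploadException $e) {
        echo "Upload failed: " . $e->getMessage() . "\n";
    }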

    6. Monitor and Optimize Resource Usage

    Ensure that your script efficiently uses memory and handles potential memory leaks.

    • Unset Variables: After processing each file, unset variables that are no longer needed.

      unset($user);
      
    • Garbage Collection: Invoke garbage collection periodically if necessary; see the combined sketch below.

      gc_collect_cycles();
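
    Putting both together, a sketch of how the per-batch loop might keep memory in check (the interval of 500 files and the use of log_message are arbitrary choices):

    foreach ($users as $i => $user) {
        // ... upload the file and update the database as shown earlier ...

        unset($user);

        // Periodically free collectable cycles and record current memory usage.
        if ($i > 0 && $i % 500 === 0) {
            gc_collect_cycles();
            log_message('debug', 'Memory in use: ' . round(memory_get_usage(TRUE) / 1048576, 1) . ' MB');
        }
    }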
      

    7. Consider Compressing Log Files

    If appropriate, compress log files to reduce upload time and storage space.

    • Compress Before Upload: gzip each log file locally and upload the compressed copy (for example under the same key with a .gz suffix), as in the sketch below.
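
      A sketch using PHP’s zlib functions (paths and the .gz key are placeholders):

      $localFilePath = '/path/to/logs/user_123.log';
      $gzPath        = $localFilePath . '.gz';

      // Stream-compress the log file so large files don't need to fit in memory.
      $in  = fopen($localFilePath, 'rb');
      $out = gzopen($gzPath, 'wb9'); // 9 = maximum compression

      while (!feof($in)) {
          gzwrite($out, fread($in, 512 * 1024));
      }

      fclose($in);
      gzclose($out);

      // Upload the compressed copy instead of the original, e.g.:
      // $this->s3uploader->uploadFile($gzPath, 'logs/123/user_123.log.gz');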
