
I have to insert 29,000 rows into 2 DBs, TimescaleDB and InfluxDB, on my local machine (Ubuntu 20.04, 8 GB RAM).

When I insert into InfluxDB, it is quite fast. I do it with a Golang service that chunks the data into pieces of 10,000 rows and inserts them into InfluxDB. It takes 256 ms.

But inserting into TimescaleDB is quite different. First, I use Laravel 8 / PHP 8. Maybe not the fastest, but I'm stuck with it.

Then I insert chunks of 100 rows with the upsert() function.

This takes 22 minutes for 29,000 rows. I use valet-linux, which uses an nginx server, PHP 8, and Postgres / TimescaleDB 12.

I tried to replicate my production settings; in my .env I have set

APP_ENV=production
APP_DEBUG=false

And I executed php artisan optimize

It still takes 22 min.

In production, I don't have this issue, because I only insert 5,648 rows, and it takes 123 ms for InfluxDB and 1 s for TimescaleDB. The only difference, I guess, is that in production I still run PHP 7.4.

For both local and production, the TimescaleDB config is the default one.

I understand Laravel is slow for bulk inserts and that Go is much faster for this task.

Why is there such a difference, and what should I do to make it quicker?

EDIT: Added the Laravel code that inserts into TimescaleDB:

public function store(Request $request)
    {
        ini_set('max_execution_time', -1);
        $body = $request->getContent();
        $meter = json_decode($body, true);
        $chunkSize = 100;
        $measures = $meter['Measures'];
        $chunkedRaws = collect($measures)->chunk($chunkSize);
        $ok = 0;
        $nok = 0;
        foreach ($chunkedRaws as $raws) {
            $raws = $raws->map(function ($raw) use ($meter) {
                if (!isset($raw['s'])) {
                    $source = "Pamela";
                } else {
                    $source = $raw['s'];
                }
                return new \App\Models\Raw([
                    'operation_id' => $meter['OperationID'],
                    'meter_id' => $meter['ID'],
                    'conso_prod' => $meter['ConsoProd'],
                    'timestep' => $meter['Timestep'],
                    'unit' => $meter['Unit'],
                    'source' => $source,
                    'time' => Carbon::parse($raw['t'])->toIso8601ZuluString(),
                    'delta' => $raw['d'],
                ]);
            })->toArray();
            try {
                DB::disableQueryLog();
                $raws = collect($raws)->unique(function ($item) { // eliminate duplicates
                    return $item['time'] . $item['operation_id'] . $item['meter_id'] . $item['conso_prod'];
                })->toArray();
                DB::table('raws')->upsert($raws, ['time', 'operation_id', 'meter_id', 'conso_prod'], ['delta']);
//                DB::table('raws')->insert($raws);
                $ok += count($raws);
            } catch (PDOException $e) {
                $nok += count($raws);
                Log::warning($e->getMessage());
            }
        }
        return response()->json(['message' => $ok . " Raws has been inserted,n " . $nok . " Failed"], 200);
    }

2 Answers


  1. My guess is that most of the performance difference comes from your use of Laravel (and upserts), which makes the comparison very apples-to-oranges.

    If you look at bulk inserts using a standard benchmarking tool (TSBS), you'll see that Timescale outperforms Influx, significantly so when your cardinality is higher.

    Curious: why are you stuck with Laravel 8 / PHP8 for TimescaleDB, but can use a Go runner for InfluxDB?

    That said, a few suggestions:

    • Even when using PHP, one suggestion is to benchmark the actual insert performance, as opposed to the overhead of the PHP marshalling (which can be significant). One way to do that is to wrap the actual insert/SQL call in a timer, while keeping your ORM marshalling out of that timing. That at least helps you better understand whether the bottleneck is TimescaleDB or Laravel; see the sketch after this list.

    • Try doing the same with a plain INSERT vs. an UPSERT.

    • Less familiar with Laravel myself, but make sure you are using a DB connection pool across inserts, so the ORM isn't establishing a new connection for each insert. I'm guessing that overhead could be part of the problem here.
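
    To separate the two, here is a minimal timing sketch, assuming the same $raws batch and raws table as in the question; only the database call sits inside the timer, so the PHP/Collection marshalling above it is excluded from the measurement:

        $start = microtime(true);

        DB::table('raws')->upsert(
            $raws,
            ['time', 'operation_id', 'meter_id', 'conso_prod'],
            ['delta']
        );

        // Swap the upsert() above for a plain insert() to see how much time goes
        // into resolving ON CONFLICT versus the insert itself:
        // DB::table('raws')->insert($raws);

        $elapsedMs = (microtime(true) - $start) * 1000;
        Log::info(sprintf('%d rows written in %.1f ms', count($raws), $elapsedMs));

    If the timed calls themselves are fast and the 22 minutes is mostly spent outside them, the bottleneck is on the PHP side rather than in TimescaleDB.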

  2. I agree with Mike that this is more of an apples-to-oranges comparison for a number of reasons. From your description, it "feels" like there must be some kind of blocking going on with the transactions, really outside of TimescaleDB itself. I also don't have experience with Laravel, but I'm certain TimescaleDB/Postgres isn't your issue at this scale. I've recently been doing lots of ingest benchmarking with TimescaleDB using TSBS (written in Golang), and even on a 2 vCPU / 8 GB instance I consistently achieve 75K-85K rows/second on a table that has 10 values (columns), batched at 10,000 rows, using 24 "workers" (connections).

    I assume you’re using the laravel-upsert plugin (https://github.com/staudenmeir/laravel-upsert) which appears to convert your queries to INSERT INTO... ON CONFLICT DO UPDATE SET.... While this in and of itself isn’t a specific issue (generally), you could easily get multiple connections/transactions attempting to select and then update rows in the same page (taking exclusive locks), causing blocking, and duking it out. This can be a particular problem if you have no time order to your data and multiple batches executing at once (multiple threads/connections).
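
    One mitigation, sketched here under the assumption that the payload still carries the 't' timestamps shown in the question, is to give the rows a deterministic order before chunking, so concurrent upserts tend to take their page locks in the same order:

        // Sort by timestamp before splitting into chunks of $chunkSize rows.
        $chunkedRaws = collect($measures)
            ->sortBy('t')
            ->chunk($chunkSize);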

    Do you see any evidence of something like that in your Postgres logs?
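
    If you would rather check for it live while the import is running, something like this should list any sessions currently stuck waiting on a lock (assuming you can query pg_stat_activity); you can also set log_lock_waits = on in postgresql.conf so that long lock waits show up in the Postgres logs:

        // List backend sessions currently waiting on a lock (Postgres 12).
        $waiting = DB::select("
            SELECT pid, wait_event_type, wait_event, state, query
            FROM pg_stat_activity
            WHERE wait_event_type = 'Lock'
        ");
        Log::info('sessions waiting on locks: ' . count($waiting));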
