I am very new to InfluxDB. Initially I installed version 1.8, but later upgraded to v2.0.
I am treating this as an out-of-the-box setup for now. I set up ingestion using the https://github.com/influxdata/influxdb-client-php client library for PHP, with a batch size of 5000 and a timeout of 30 seconds.
I have created 2 buckets, each with a 24-hour retention period: one for 15-minute interval data and one for 60-minute interval data. The insertion rate is approximately 21 million points per hour.
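For a sense of the write pressure these numbers imply, here is a quick back-of-the-envelope calculation (the 21 million/hour and 5000-point batch figures are from the description above):

```python
# Rough sustained write rate for the workload described above:
# ~21 million points/hour, written in batches of 5000 points.
points_per_hour = 21_000_000
batch_size = 5_000

points_per_second = points_per_hour / 3600
batches_per_second = points_per_second / batch_size

print(round(points_per_second))      # ~5833 points per second
print(round(batches_per_second, 2))  # ~1.17 batches per second
```

So the server has to absorb roughly one 5000-point batch every second, continuously, which is a steady but not extreme ingest rate for InfluxDB on adequate hardware.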
No other queries are running on the server for now.
I have not taken cardinality into account yet; I was going down the implement-first, optimize-later path and expected ingestion to run slowly, not to crash.
Below is an htop snapshot from the VM showing InfluxDB's resource utilization. It continuously used a large amount of RAM and was killed by the OOM killer after about 6 hours of runtime.
2 Answers
What’s your defined schema?
You should check your series cardinality first to reduce resource usage, given the volume of data you are inserting. InfluxDB uses TSI (the Time Series Index), and it pulls frequently accessed index data into memory.
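In InfluxDB 2.x you can inspect a bucket's cardinality directly with the built-in Flux function `influxdb.cardinality()` (the bucket name below is a placeholder for one of your two buckets):

```flux
import "influxdata/influxdb"

// Count the number of series written to the bucket in the last 24 hours
influxdb.cardinality(bucket: "data-15m", start: -24h)
```

Run this in the Data Explorer or via `influx query` and compare the result against what your VM's RAM can reasonably hold.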
The series cardinality can be estimated by multiplying the number of distinct values of each tag key in a measurement, summed across your measurements.
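As a concrete example, here is that estimate for a hypothetical schema (the tag names and counts below are purely illustrative, not taken from your setup):

```python
# Series cardinality grows multiplicatively with the number of
# distinct values of each tag key. Hypothetical schema:
measurements = 2   # e.g. one measurement per bucket/interval type
hosts = 1000       # distinct values of a "host" tag
sensors = 20       # distinct values of a "sensor" tag

series_cardinality = measurements * hosts * sensors
print(series_cardinality)  # 40000 series
```

Note how a single extra tag with many distinct values (a request ID, a timestamp-like string) multiplies this number and can push it into the millions.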
If you have unbounded tag values or measurement names, you will get runaway series cardinality. By choosing an appropriate schema, or by limiting the number of distinct tag and measurement values, you can reduce the resources needed.
I'd recommend considering alternatives while you're still new to InfluxDB. High cardinality is a very common issue in the world of time series databases, and InfluxDB shows mediocre results here. It looks like the library you mentioned uses the InfluxDB line protocol, so you could try VictoriaMetrics instead. Take a look at the articles about ingestion and the high-cardinality benchmarks to see why I'm recommending the switch.