skip to Main Content

Environment:

  • Hardware:

    • 8Core, 32GB as a guest OS in VMWare
    • 1TB for data in VSAN
  • Software

    • OS: CentOS 7 64bit
    • JDK version: 11.0.7
    • PostgreSQL version: 13
    • Timescaledb version: 2.3.0
    • PostgreSQL JDBC version: 42.2.18

I have already run timescaledb-tune to optimize the postgresql.conf

My data is very huge. The old table contains 832818510 records of 7 days. I followed timescaledb’s instruction of data migration

My migration steps are:

CREATE TABLE test_lp (LIKE lp_interval INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
SELECT create_hypertable('test_lp', 'read_time', chunk_time_interval => INTERVAL '1 hour');
INSERT INTO test_lp select * from lp_interval

The table schema is:

create table test_lp
(
    meterno           varchar(11) not null,
    reading_id        varchar(60) not null,
    read_time         timestamp   not null,
    version           bigint,
    ami_record_num    bigint,
    flags             bigint,
    fail_code         bigint,
    value             double precision,
    validation_status varchar(255),
    custno            varchar(11) not null,
    insert_date       timestamp   not null,
    constraint test_lp_pkey
        primary key (custno, meterno, reading_id, read_time)
);

alter table test_lp
    owner to appuser;

create index test_lp_read_time_idx
    on test_lp (read_time desc);

The overall average insert speed is about 50000 records/second. looks good.

Then, I started to run my simple Java program at the same machine as the timescaledb. The Java program used one connection and inserted more data into the new table test_lp. the Java program did one commit for every 1000 inserted record.

After computing the insert speed, the java’s average insert speed is only 530 records/second around.

I also tried cleaning table `test_lp’ and re-run the Java program. The program’s insert speed is still as slow as above

Why is the Java’s insert speed is so slow? Did I miss something ?

Below is my postgresql.conf. I use show all in psql

allow_system_table_mods =   off
application_name    =   PostgreSQL JDBC Driver
archive_cleanup_command =   
archive_command =   (disabled)
archive_mode    =   off
archive_timeout =   0
array_nulls =   on
authentication_timeout  =   1min
autovacuum  =   on
autovacuum_analyze_scale_factor =   0.1
autovacuum_analyze_threshold    =   50
autovacuum_freeze_max_age   =   200000000
autovacuum_max_workers  =   10
autovacuum_multixact_freeze_max_age =   400000000
autovacuum_naptime  =   10s
autovacuum_vacuum_cost_delay    =   2ms
autovacuum_vacuum_cost_limit    =   -1
autovacuum_vacuum_insert_scale_factor   =   0.2
autovacuum_vacuum_insert_threshold  =   1000
autovacuum_vacuum_scale_factor  =   0.2
autovacuum_vacuum_threshold =   50
autovacuum_work_mem =   -1
backend_flush_after =   0
backslash_quote =   safe_encoding
backtrace_functions =   
bgwriter_delay  =   200ms
bgwriter_flush_after    =   0
bgwriter_lru_maxpages   =   100
bgwriter_lru_multiplier =   2
block_size  =   8192
bonjour =   off
bonjour_name    =   
bytea_output    =   hex
check_function_bodies   =   on
checkpoint_completion_target    =   0.9
checkpoint_flush_after  =   256kB
checkpoint_timeout  =   15min
checkpoint_warning  =   30s
client_encoding =   UTF8
client_min_messages =   notice
cluster_name    =   
commit_delay    =   0
commit_siblings =   5
constraint_exclusion    =   partition
cpu_index_tuple_cost    =   0.005
cpu_operator_cost   =   0.0025
cpu_tuple_cost  =   0.01
cursor_tuple_fraction   =   0.1
data_checksums  =   off
data_directory_mode =   0700
data_sync_retry =   off
DateStyle   =   ISO, YMD
db_user_namespace   =   off
deadlock_timeout    =   1s
debug_assertions    =   off
debug_pretty_print  =   on
debug_print_parse   =   off
debug_print_plan    =   off
debug_print_rewritten   =   off
default_statistics_target   =   500
default_table_access_method =   heap
default_tablespace  =   
default_text_search_config  =   pg_catalog.simple
default_transaction_deferrable  =   off
default_transaction_isolation   =   read committed
default_transaction_read_only   =   off
dynamic_shared_memory_type  =   posix
effective_cache_size    =   22GB
effective_io_concurrency    =   200
enable_bitmapscan   =   on
enable_gathermerge  =   on
enable_hashagg  =   on
enable_hashjoin =   on
enable_incremental_sort =   on
enable_indexonlyscan    =   on
enable_indexscan    =   on
enable_material =   on
enable_mergejoin    =   on
enable_nestloop =   on
enable_parallel_append  =   on
enable_parallel_hash    =   on
enable_partition_pruning    =   on
enable_partitionwise_aggregate  =   on
enable_partitionwise_join   =   on
enable_seqscan  =   on
enable_sort =   on
enable_tidscan  =   on
escape_string_warning   =   on
event_source    =   PostgreSQL
exit_on_error   =   off
extra_float_digits  =   3
force_parallel_mode =   off
from_collapse_limit =   8
fsync   =   on
full_page_writes    =   on
geqo    =   on
geqo_effort =   5
geqo_generations    =   0
geqo_pool_size  =   0
geqo_seed   =   0
geqo_selection_bias =   2
geqo_threshold  =   12
gin_fuzzy_search_limit  =   0
gin_pending_list_limit  =   4MB
hash_mem_multiplier =   1
hot_standby =   on
hot_standby_feedback    =   off
huge_pages  =   try
idle_in_transaction_session_timeout =   0
ignore_checksum_failure =   off
ignore_invalid_pages    =   off
ignore_system_indexes   =   off
integer_datetimes   =   on
IntervalStyle   =   postgres
jit =   on
jit_above_cost  =   100000
jit_debugging_support   =   off
jit_dump_bitcode    =   off
jit_expressions =   on
jit_inline_above_cost   =   500000
jit_optimize_above_cost =   500000
jit_profiling_support   =   off
jit_tuple_deforming =   on
join_collapse_limit =   8
krb_caseins_users   =   off
lc_collate  =   zh_TW.UTF-8
lc_ctype    =   zh_TW.UTF-8
lc_messages =   zh_TW.UTF-8
lc_monetary =   zh_TW.UTF-8
lc_numeric  =   zh_TW.UTF-8
lc_time =   zh_TW.UTF-8
listen_addresses    =   *
lo_compat_privileges    =   off
local_preload_libraries =   
lock_timeout    =   0
log_autovacuum_min_duration =   -1
log_checkpoints =   off
log_connections =   off
log_destination =   stderr
log_disconnections  =   off
log_duration    =   off
log_error_verbosity =   default
log_executor_stats  =   off
log_file_mode   =   0600
log_hostname    =   off
log_line_prefix =   %m [%p]
log_lock_waits  =   off
log_min_duration_sample =   -1
log_min_duration_statement  =   -1
log_min_error_statement =   error
log_min_messages    =   warning
log_parameter_max_length    =   -1
log_parameter_max_length_on_error   =   0
log_parser_stats    =   off
log_planner_stats   =   off
log_replication_commands    =   off
log_rotation_age    =   1d
log_rotation_size   =   0
log_statement   =   none
log_statement_sample_rate   =   1
log_statement_stats =   off
log_temp_files  =   -1
log_timezone    =   Asia/Taipei
log_transaction_sample_rate =   0
log_truncate_on_rotation    =   on
logging_collector   =   on
logical_decoding_work_mem   =   64MB
maintenance_io_concurrency  =   10
maintenance_work_mem    =   420MB
max_connections =   100
max_files_per_process   =   1000
max_function_args   =   100
max_identifier_length   =   63
max_index_keys  =   32
max_locks_per_transaction   =   256
max_logical_replication_workers =   4
max_parallel_maintenance_workers    =   4
max_parallel_workers    =   8
max_parallel_workers_per_gather =   4
max_pred_locks_per_page =   2
max_pred_locks_per_relation =   -2
max_pred_locks_per_transaction  =   64
max_prepared_transactions   =   0
max_replication_slots   =   10
max_slot_wal_keep_size  =   -1
max_stack_depth =   2MB
max_standby_archive_delay   =   30s
max_standby_streaming_delay =   30s
max_sync_workers_per_subscription   =   2
max_wal_senders =   0
max_wal_size    =   32GB
max_worker_processes    =   19
min_parallel_index_scan_size    =   512kB
min_parallel_table_scan_size    =   8MB
min_wal_size    =   16GB
old_snapshot_threshold  =   -1
operator_precedence_warning =   off
parallel_leader_participation   =   on
parallel_setup_cost =   1000
parallel_tuple_cost =   0.1
password_encryption =   scram-sha-256
pg_stat_statements.max  =   5000
pg_stat_statements.save =   on
pg_stat_statements.track    =   top
pg_stat_statements.track_planning   =   off
pg_stat_statements.track_utility    =   on
plan_cache_mode =   auto
port    =   5432
post_auth_delay =   0
pre_auth_delay  =   0
primary_slot_name   =   
promote_trigger_file    =   
quote_all_identifiers   =   off
random_page_cost    =   1.1
recovery_end_command    =   
recovery_min_apply_delay    =   0
recovery_target =   
recovery_target_action  =   pause
recovery_target_inclusive   =   on
recovery_target_lsn =   
recovery_target_name    =   
recovery_target_time    =   
recovery_target_timeline    =   latest
recovery_target_xid =   
restart_after_crash =   on
restore_command =   
row_security    =   on
search_path =   public
segment_size    =   1GB
seq_page_cost   =   1
server_encoding =   UTF8
server_version  =   13.3
server_version_num  =   130003
session_replication_role    =   origin
shared_buffers  =   8GB
shared_memory_type  =   mmap
ssl =   off
ssl_ca_file =   
ssl_cert_file   =   server.crt
ssl_crl_file    =   
ssl_key_file    =   server.key
ssl_library =   OpenSSL
ssl_passphrase_command_supports_reload  =   off
ssl_prefer_server_ciphers   =   on
standard_conforming_strings =   on
statement_timeout   =   0
superuser_reserved_connections  =   3
synchronize_seqscans    =   on
synchronous_commit  =   off
synchronous_standby_names   =   
syslog_facility =   local0
syslog_ident    =   postgres
syslog_sequence_numbers =   on
syslog_split_messages   =   on
tcp_keepalives_count    =   9
tcp_keepalives_idle =   7200
tcp_keepalives_interval =   75
tcp_user_timeout    =   0
temp_buffers    =   8MB
temp_file_limit =   -1
temp_tablespaces    =   
timescaledb.disable_load    =   off
timescaledb.enable_2pc  =   on
timescaledb.enable_async_append =   on
timescaledb.enable_cagg_reorder_groupby =   on
timescaledb.enable_chunk_append =   on
timescaledb.enable_client_ddl_on_data_nodes =   off
timescaledb.enable_connection_binary_data   =   on
timescaledb.enable_constraint_aware_append  =   on
timescaledb.enable_constraint_exclusion =   on
timescaledb.enable_optimizations    =   on
timescaledb.enable_ordered_append   =   on
timescaledb.enable_parallel_chunk_append    =   on
timescaledb.enable_per_data_node_queries    =   on
timescaledb.enable_qual_propagation =   on
timescaledb.enable_remote_explain   =   off
timescaledb.enable_runtime_exclusion    =   on
timescaledb.enable_skipscan =   on
timescaledb.enable_transparent_decompression    =   on
timescaledb.last_tuned  =   2021-07-05T14:31:04+08:00
timescaledb.last_tuned_version  =   0.11.0
timescaledb.license =   timescale
timescaledb.max_background_workers  =   8
timescaledb.max_cached_chunks_per_hypertable    =   100
timescaledb.max_insert_batch_size   =   1000
timescaledb.max_open_chunks_per_insert  =   1342
timescaledb.remote_data_fetcher =   cursor
timescaledb.restoring   =   off
timescaledb.telemetry_level =   basic
timescaledb_telemetry.cloud =   
TimeZone    =   UTC
timezone_abbreviations  =   Default
trace_notify    =   off
trace_recovery_messages =   log
trace_sort  =   off
track_activities    =   on
track_activity_query_size   =   1kB
track_commit_timestamp  =   off
track_counts    =   on
track_functions =   pl
track_io_timing =   on
transaction_deferrable  =   off
transaction_isolation   =   read committed
transaction_read_only   =   off
transform_null_equals   =   off
unix_socket_group   =   
unix_socket_permissions =   0777
update_process_title    =   on
vacuum_cleanup_index_scale_factor   =   0.1
vacuum_cost_delay   =   0
vacuum_cost_limit   =   200
vacuum_cost_page_dirty  =   20
vacuum_cost_page_hit    =   1
vacuum_cost_page_miss   =   10
vacuum_defer_cleanup_age    =   0
vacuum_freeze_min_age   =   50000000
vacuum_freeze_table_age =   150000000
vacuum_multixact_freeze_min_age =   5000000
vacuum_multixact_freeze_table_age   =   150000000
wal_block_size  =   8192
wal_buffers =   16MB
wal_compression =   on
wal_consistency_checking    =   
wal_init_zero   =   on
wal_keep_size   =   0
wal_level   =   replica
wal_log_hints   =   off
wal_receiver_create_temp_slot   =   off
wal_receiver_status_interval    =   10s
wal_receiver_timeout    =   1min
wal_recycle =   on
wal_retrieve_retry_interval =   5s
wal_segment_size    =   16MB
wal_sender_timeout  =   1min
wal_skip_threshold  =   2MB
wal_sync_method =   fdatasync
wal_writer_delay    =   200ms
wal_writer_flush_after  =   1MB
work_mem    =   32MB
xmlbinary   =   base64
xmloption   =   content
zero_damaged_pages  =   off

2

Answers


  1. So, as I remember, the Java JDBC driver has a special flag you need to add to the connection string to the database in order to make it rewrite inserts so they’re multi-valued inserts rather than batched single valued inserts. That change can have a 10-20x speedup (at least) for inserts into Timescale/Postgres.

    The parameter is called rewriteBatchedInserts and you can either just add rewriteBatchedInserts=true to the connection string or there are ways some of the drivers let you add it more programatically. I think https://vladmihalcea.com/postgresql-multi-row-insert-rewritebatchedinserts-property/ has some more info.

    It’s a bit of a weird sort of thing where you almost just make an incantation and suddenly everything works…hopefully that helps!

    Login or Signup to reply.
  2. Is the data loader running in the same network? Did you consider network latency?

    I’d recommend you to try the tsbs benchmark tool in the machine to understand if the issue is with your Java client or with the machine: https://github.com/timescale/tsbs

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search