
I hope this finds you all right.

So, I had Varnish 6.x LTS on CentOS 7 on a server with 64 GB of RAM, working practically perfectly as configured, and I upgraded my setup to a server with 128 GB of RAM and CentOS 8.
With this setup, I often get Varnish restarts due to OOM, even though the configuration is as follows:

ExecStart=/usr/sbin/varnishd \
          -a :6081 \
          -f /etc/varnish/default.vcl \
          -l 200m \
          -s malloc,38G \
          -s Transient=malloc,8G \
          -p thread_pool_min=300 \
          -p thread_pool_max=500 \
          -p workspace_client=256k
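
As a sanity check on those flags, here is a rough back-of-the-envelope of the memory this configuration should pin down (a sketch only; it ignores jemalloc's own overhead and per-object bookkeeping, which is where the usual ~50% extra goes):

```python
GiB = 2**30
MiB = 2**20
KiB = 2**10

cache     = 38 * GiB   # -s malloc,38G
transient = 8 * GiB    # -s Transient=malloc,8G
shm_log   = 200 * MiB  # -l 200m (shared memory log)

# thread_pool_max is per pool; varnishstat reports MAIN.pools = 2
max_threads = 2 * 500
workspaces  = max_threads * 256 * KiB  # -p workspace_client=256k

total = cache + transient + shm_log + workspaces
print(f"budgeted: {total / GiB:.1f} GiB")  # ~46.4 GiB
```

Even with the usual ~50% rule of thumb on top, that lands around 70 GiB, so a 128 GB box should have plenty of headroom, which points away from the flags themselves.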

On the old server I had a similar setup with

-s malloc,26G

and it worked fine.
The Transient parameter is only an attempt I made a few days back to cap the transient cache and see if that helps, but no luck with that: everything still indicates it will soon go OOM.
I have read a lot of articles about tuning this, and I already budget for ~50% or more overhead, but in this case memory usage just seems to grow without any limit.
I could really use a second pair of eyes on what I'm missing here, or at least a pointer on where to look next.

My current varnishstat:

MGT.uptime              269509         1.00 Management process uptime
MGT.child_start              1         0.00 Child process started
MGT.child_exit               0         0.00 Child process normal exit
MGT.child_stop               0         0.00 Child process unexpected exit
MGT.child_died               0         0.00 Child process died (signal)
MGT.child_dump               0         0.00 Child process core dumped
MGT.child_panic              0         0.00 Child process panic
MAIN.summs             7869381        29.20 stat summ operations
MAIN.uptime             269510         1.00 Child process uptime
MAIN.sess_conn          184459         0.68 Sessions accepted
MAIN.sess_drop               0         0.00 Sessions dropped
MAIN.sess_fail               0         0.00 Session accept failures
MAIN.sess_fail_econnaborted            0         0.00 Session accept failures: connection aborted
MAIN.sess_fail_eintr                   0         0.00 Session accept failures: interrupted system call
MAIN.sess_fail_emfile                  0         0.00 Session accept failures: too many open files
MAIN.sess_fail_ebadf                   0         0.00 Session accept failures: bad file descriptor
MAIN.sess_fail_enomem                  0         0.00 Session accept failures: not enough memory
MAIN.sess_fail_other                   0         0.00 Session accept failures: other
MAIN.client_req_400                    0         0.00 Client requests received, subject to 400 errors
MAIN.client_req_417                    0         0.00 Client requests received, subject to 417 errors
MAIN.client_req                  2373462         8.81 Good client requests received
MAIN.cache_hit                    500010         1.86 Cache hits
MAIN.cache_hit_grace              106009         0.39 Cache grace hits
MAIN.cache_hitpass                     0         0.00 Cache hits for pass.
MAIN.cache_hitmiss                     0         0.00 Cache hits for miss.
MAIN.cache_miss                  1611440         5.98 Cache misses
MAIN.backend_conn                  12617         0.05 Backend conn. success
MAIN.backend_unhealthy                 0         0.00 Backend conn. not attempted
MAIN.backend_busy                      0         0.00 Backend conn. too many
MAIN.backend_fail                      0         0.00 Backend conn. failures
MAIN.backend_reuse               1966574         7.30 Backend conn. reuses
MAIN.backend_recycle             1976920         7.34 Backend conn. recycles
MAIN.backend_retry                     1         0.00 Backend conn. retry
MAIN.fetch_head                        0         0.00 Fetch no body (HEAD)
MAIN.fetch_length                 475423         1.76 Fetch with Length
MAIN.fetch_chunked               1497899         5.56 Fetch chunked
MAIN.fetch_eof                         0         0.00 Fetch EOF
MAIN.fetch_bad                         0         0.00 Fetch bad T-E
MAIN.fetch_none                     1325         0.00 Fetch no body
MAIN.fetch_1xx                         0         0.00 Fetch no body (1xx)
MAIN.fetch_204                         0         0.00 Fetch no body (204)
MAIN.fetch_304                      3073         0.01 Fetch no body (304)
MAIN.fetch_failed                   1462         0.01 Fetch failed (all causes)
MAIN.fetch_no_thread                   0         0.00 Fetch failed (no thread)
MAIN.pools                             2          .   Number of thread pools
MAIN.threads                         600          .   Total number of threads
MAIN.threads_limited                   0         0.00 Threads hit max
MAIN.threads_created                 923         0.00 Threads created
MAIN.threads_destroyed               323         0.00 Threads destroyed
MAIN.threads_failed                    0         0.00 Thread creation failed
MAIN.thread_queue_len                  0          .   Length of session queue
MAIN.busy_sleep                      878         0.00 Number of requests sent to sleep on busy objhdr
MAIN.busy_wakeup                     878         0.00 Number of requests woken after sleep on busy objhdr
MAIN.busy_killed                       0         0.00 Number of requests killed after sleep on busy objhdr
MAIN.sess_queued                     134         0.00 Sessions queued for thread
MAIN.sess_dropped                      0         0.00 Sessions dropped for thread
MAIN.req_dropped                       0         0.00 Requests dropped
MAIN.n_object                     218508          .   object structs made
MAIN.n_vampireobject                   0          .   unresurrected objects
MAIN.n_objectcore                 218665          .   objectcore structs made
MAIN.n_objecthead                 218773          .   objecthead structs made
MAIN.n_backend                         1          .   Number of backends
MAIN.n_expired                   1392690         5.17 Number of expired objects
MAIN.n_lru_nuked                       0         0.00 Number of LRU nuked objects
MAIN.n_lru_moved                  449086         1.67 Number of LRU moved objects
MAIN.n_lru_limited                     0         0.00 Reached nuke_limit
MAIN.losthdr                           0         0.00 HTTP header overflows
MAIN.s_sess                       184459         0.68 Total sessions seen
MAIN.s_pipe                            0         0.00 Total pipe sessions seen
MAIN.s_pass                       262012         0.97 Total pass-ed requests seen
MAIN.s_fetch                     1873452         6.95 Total backend fetches initiated
MAIN.s_synth                         490         0.00 Total synthetic responses made
MAIN.s_req_hdrbytes           1975945691      7331.62 Request header bytes
MAIN.s_req_bodybytes             4279860        15.88 Request body bytes
MAIN.s_resp_hdrbytes          1272508062      4721.56 Response header bytes
MAIN.s_resp_bodybytes       1536158038935   5699818.33 Response body bytes
MAIN.s_pipe_hdrbytes                   0         0.00 Pipe request header bytes
MAIN.s_pipe_in                         0         0.00 Piped bytes from client
MAIN.s_pipe_out                        0         0.00 Piped bytes to client
MAIN.sess_closed                     356         0.00 Session Closed
MAIN.sess_closed_err               86861         0.32 Session Closed with error
MAIN.sess_readahead                    0         0.00 Session Read Ahead
MAIN.sess_herd                   1856881         6.89 Session herd
MAIN.sc_rem_close                  97588         0.36 Session OK  REM_CLOSE
MAIN.sc_req_close                      0         0.00 Session OK  REQ_CLOSE
MAIN.sc_req_http10                     0         0.00 Session Err REQ_HTTP10
MAIN.sc_rx_bad                         0         0.00 Session Err RX_BAD
MAIN.sc_rx_body                        0         0.00 Session Err RX_BODY
MAIN.sc_rx_junk                        0         0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow                    0         0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout                 86861         0.32 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe                        0         0.00 Session OK  TX_PIPE
MAIN.sc_tx_error                       0         0.00 Session Err TX_ERROR
MAIN.sc_tx_eof                         0         0.00 Session OK  TX_EOF
MAIN.sc_resp_close                     0         0.00 Session OK  RESP_CLOSE
MAIN.sc_overload                       0         0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow                  0         0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short                    0         0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20                     0         0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure                    0         0.00 Session Err VCL_FAILURE
MAIN.client_resp_500                   0         0.00 Delivery failed due to insufficient workspace.
MAIN.ws_backend_overflow               0         0.00 workspace_backend overflows
MAIN.ws_client_overflow                0         0.00 workspace_client overflows
MAIN.ws_thread_overflow                0         0.00 workspace_thread overflows
MAIN.ws_session_overflow               0         0.00 workspace_session overflows
MAIN.shm_records               317750020      1178.99 SHM records
MAIN.shm_writes                 11262489        41.79 SHM writes
MAIN.shm_flushes                  393170         1.46 SHM flushes due to overflow
MAIN.shm_cont                     189298         0.70 SHM MTX contention
MAIN.shm_cycles                       60         0.00 SHM cycles through buffer
MAIN.backend_req                 1979183         7.34 Backend requests made
MAIN.n_vcl                             1          .   Number of loaded VCLs in total
MAIN.n_vcl_avail                       1          .   Number of VCLs available
MAIN.n_vcl_discard                     0          .   Number of discarded VCLs
MAIN.vcl_fail                          0         0.00 VCL failures
MAIN.bans                              1          .   Count of bans
MAIN.bans_completed                    1          .   Number of bans marked 'completed'
MAIN.bans_obj                          0          .   Number of bans using obj.*
MAIN.bans_req                          0          .   Number of bans using req.*
MAIN.bans_added                        1         0.00 Bans added
MAIN.bans_deleted                      0         0.00 Bans deleted
MAIN.bans_tested                       0         0.00 Bans tested against objects (lookup)
MAIN.bans_obj_killed                   0         0.00 Objects killed by bans (lookup)
MAIN.bans_lurker_tested                0         0.00 Bans tested against objects (lurker)
MAIN.bans_tests_tested                 0         0.00 Ban tests tested against objects (lookup)
MAIN.bans_lurker_tests_tested            0         0.00 Ban tests tested against objects (lurker)
MAIN.bans_lurker_obj_killed              0         0.00 Objects killed by bans (lurker)
MAIN.bans_lurker_obj_killed_cutoff            0         0.00 Objects killed by bans for cutoff (lurker)
MAIN.bans_dups                                0         0.00 Bans superseded by other bans
MAIN.bans_lurker_contention                   0         0.00 Lurker gave way for lookup
MAIN.bans_persisted_bytes                    16          .   Bytes used by the persisted ban lists
MAIN.bans_persisted_fragmentation             0          .   Extra bytes in persisted ban lists due to fragmentation
MAIN.n_purges                                 0         0.00 Number of purge operations executed
MAIN.n_obj_purged                             0         0.00 Number of purged objects
MAIN.exp_mailed                         1822034         6.76 Number of objects mailed to expiry thread
MAIN.exp_received                       1822034         6.76 Number of objects received by expiry thread
MAIN.hcb_nolock                         2111450         7.83 HCB Lookups without lock
MAIN.hcb_lock                           1599004         5.93 HCB Lookups with lock
MAIN.hcb_insert                         1599003         5.93 HCB Inserts
MAIN.esi_errors                               0         0.00 ESI parse errors (unlock)
MAIN.esi_warnings                             0         0.00 ESI parse warnings (unlock)
MAIN.vmods                                    2          .   Loaded VMODs
MAIN.n_gzip                                   0         0.00 Gzip operations
MAIN.n_gunzip                           1963548         7.29 Gunzip operations
MAIN.n_test_gunzip                      1604680         5.95 Test gunzip operations
LCK.backend.creat                             2         0.00 Created locks
LCK.backend.destroy                           0         0.00 Destroyed locks
LCK.backend.locks                       3958375        14.69 Lock Operations
LCK.ban.creat                                 1         0.00 Created locks
LCK.ban.destroy                               0         0.00 Destroyed locks
LCK.ban.locks                           3507518        13.01 Lock Operations
LCK.busyobj.creat                       1998678         7.42 Created locks
LCK.busyobj.destroy                     1998580         7.42 Destroyed locks
LCK.busyobj.locks                     139797131       518.71 Lock Operations
LCK.cli.creat                                 1         0.00 Created locks
LCK.cli.destroy                               0         0.00 Destroyed locks
LCK.cli.locks                             99103         0.37 Lock Operations
LCK.exp.creat                                 1         0.00 Created locks
LCK.exp.destroy                               0         0.00 Destroyed locks
LCK.exp.locks                          11454148        42.50 Lock Operations
LCK.hcb.creat                                 1         0.00 Created locks
LCK.hcb.destroy                               0         0.00 Destroyed locks
LCK.hcb.locks                           2982034        11.06 Lock Operations
LCK.lru.creat                                 2         0.00 Created locks
LCK.lru.destroy                               0         0.00 Destroyed locks
LCK.lru.locks                           3663810        13.59 Lock Operations
LCK.mempool.creat                             5         0.00 Created locks
LCK.mempool.destroy                           0         0.00 Destroyed locks
LCK.mempool.locks                      10186086        37.79 Lock Operations
LCK.objhdr.creat                        1599316         5.93 Created locks
LCK.objhdr.destroy                      1380547         5.12 Destroyed locks
LCK.objhdr.locks                       27912994       103.57 Lock Operations
LCK.pipestat.creat                            1         0.00 Created locks
LCK.pipestat.destroy                          0         0.00 Destroyed locks
LCK.pipestat.locks                            0         0.00 Lock Operations
LCK.sess.creat                           184447         0.68 Created locks
LCK.sess.destroy                         184449         0.68 Destroyed locks
LCK.sess.locks                          4161689        15.44 Lock Operations
LCK.tcp_pool.creat                            2         0.00 Created locks
LCK.tcp_pool.destroy                          0         0.00 Destroyed locks
LCK.tcp_pool.locks                      7901859        29.32 Lock Operations
LCK.vbe.creat                                 1         0.00 Created locks
LCK.vbe.destroy                               0         0.00 Destroyed locks
LCK.vbe.locks                             99094         0.37 Lock Operations
LCK.vcapace.creat                             1         0.00 Created locks
LCK.vcapace.destroy                           0         0.00 Destroyed locks
LCK.vcapace.locks                             0         0.00 Lock Operations
LCK.vcl.creat                                 1         0.00 Created locks
LCK.vcl.destroy                               0         0.00 Destroyed locks
LCK.vcl.locks                           4010919        14.88 Lock Operations
LCK.vxid.creat                                1         0.00 Created locks
LCK.vxid.destroy                              0         0.00 Destroyed locks
LCK.vxid.locks                              768         0.00 Lock Operations
LCK.waiter.creat                              2         0.00 Created locks
LCK.waiter.destroy                            0         0.00 Destroyed locks
LCK.waiter.locks                       11657249        43.25 Lock Operations
LCK.wq.creat                                  3         0.00 Created locks
LCK.wq.destroy                                0         0.00 Destroyed locks
LCK.wq.locks                           12078010        44.81 Lock Operations
LCK.wstat.creat                               1         0.00 Created locks
LCK.wstat.destroy                             0         0.00 Destroyed locks
LCK.wstat.locks                         7148985        26.53 Lock Operations
MEMPOOL.busyobj.live                          8          .   In use
MEMPOOL.busyobj.pool                         10          .   In Pool
MEMPOOL.busyobj.sz_wanted                 65536          .   Size requested
MEMPOOL.busyobj.sz_actual                 65504          .   Size allocated
MEMPOOL.busyobj.allocs                  1979190         7.34 Allocations
MEMPOOL.busyobj.frees                   1979182         7.34 Frees
MEMPOOL.busyobj.recycle                 1976206         7.33 Recycled from pool
MEMPOOL.busyobj.timeout                  176828         0.66 Timed out from pool
MEMPOOL.busyobj.toosmall                      0         0.00 Too small to recycle
MEMPOOL.busyobj.surplus                     137         0.00 Too many for pool
MEMPOOL.busyobj.randry                     2984         0.01 Pool ran dry
MEMPOOL.req0.live                             4          .   In use
MEMPOOL.req0.pool                            11          .   In Pool
MEMPOOL.req0.sz_wanted                   262144          .   Size requested
MEMPOOL.req0.sz_actual                   262112          .   Size allocated
MEMPOOL.req0.allocs                      979824         3.64 Allocations
MEMPOOL.req0.frees                       979820         3.64 Frees
MEMPOOL.req0.recycle                     979057         3.63 Recycled from pool
MEMPOOL.req0.timeout                     147071         0.55 Timed out from pool
MEMPOOL.req0.toosmall                         0         0.00 Too small to recycle
MEMPOOL.req0.surplus                          0         0.00 Too many for pool
MEMPOOL.req0.randry                         767         0.00 Pool ran dry
MEMPOOL.sess0.live                            6          .   In use
MEMPOOL.sess0.pool                           10          .   In Pool
MEMPOOL.sess0.sz_wanted                     512          .   Size requested
MEMPOOL.sess0.sz_actual                     480          .   Size allocated
MEMPOOL.sess0.allocs                      92389         0.34 Allocations
MEMPOOL.sess0.frees                       92383         0.34 Frees
MEMPOOL.sess0.recycle                     91719         0.34 Recycled from pool
MEMPOOL.sess0.timeout                     65581         0.24 Timed out from pool
MEMPOOL.sess0.toosmall                        0         0.00 Too small to recycle
MEMPOOL.sess0.surplus                         0         0.00 Too many for pool
MEMPOOL.sess0.randry                        670         0.00 Pool ran dry
LCK.sma.creat                                 2         0.00 Created locks
LCK.sma.destroy                               0         0.00 Destroyed locks
LCK.sma.locks                          36735852       136.31 Lock Operations
SMA.s0.c_req                           16689946        61.93 Allocator requests
SMA.s0.c_fail                                 0         0.00 Allocator failures
SMA.s0.c_bytes                     231951766672    860642.52 Bytes allocated
SMA.s0.c_freed                     203375575362    754612.35 Bytes freed
SMA.s0.g_alloc                          2052078          .   Allocations outstanding
SMA.s0.g_bytes                      28576191310          .   Bytes outstanding
SMA.s0.g_space                      12225998002          .   Bytes available
SMA.Transient.c_req                     2704021        10.03 Allocator requests
SMA.Transient.c_fail                          0         0.00 Allocator failures
SMA.Transient.c_bytes               75855969615    281458.83 Bytes allocated
SMA.Transient.c_freed               75855969615    281458.83 Bytes freed
SMA.Transient.g_alloc                         0          .   Allocations outstanding
SMA.Transient.g_bytes                         0          .   Bytes outstanding
SMA.Transient.g_space                8589934592          .   Bytes available
MEMPOOL.req1.live                             5          .   In use
MEMPOOL.req1.pool                             9          .   In Pool
MEMPOOL.req1.sz_wanted                   262144          .   Size requested
MEMPOOL.req1.sz_actual                   262112          .   Size allocated
MEMPOOL.req1.allocs                      974654         3.62 Allocations
MEMPOOL.req1.frees                       974649         3.62 Frees
MEMPOOL.req1.recycle                     973872         3.61 Recycled from pool
MEMPOOL.req1.timeout                     147049         0.55 Timed out from pool
MEMPOOL.req1.toosmall                         0         0.00 Too small to recycle
MEMPOOL.req1.surplus                          0         0.00 Too many for pool
MEMPOOL.req1.randry                         782         0.00 Pool ran dry
MEMPOOL.sess1.live                            5          .   In use
MEMPOOL.sess1.pool                            9          .   In Pool
MEMPOOL.sess1.sz_wanted                     512          .   Size requested
MEMPOOL.sess1.sz_actual                     480          .   Size allocated
MEMPOOL.sess1.allocs                      92071         0.34 Allocations
MEMPOOL.sess1.frees                       92066         0.34 Frees
MEMPOOL.sess1.recycle                     91430         0.34 Recycled from pool
MEMPOOL.sess1.timeout                     65371         0.24 Timed out from pool
MEMPOOL.sess1.toosmall                        0         0.00 Too small to recycle
MEMPOOL.sess1.surplus                         0         0.00 Too many for pool
MEMPOOL.sess1.randry                        641         0.00 Pool ran dry
VBE.boot.lcn.happy                            0          .   Happy health probes
VBE.boot.lcn.bereq_hdrbytes          1817898758      6745.20 Request header bytes
VBE.boot.lcn.bereq_bodybytes            4279860        15.88 Request body bytes
VBE.boot.lcn.beresp_hdrbytes          948671121      3519.98 Response header bytes
VBE.boot.lcn.beresp_bodybytes      281279816120   1043671.17 Response body bytes
VBE.boot.lcn.pipe_hdrbytes                    0         0.00 Pipe request header bytes
VBE.boot.lcn.pipe_out                         0         0.00 Piped bytes to backend
VBE.boot.lcn.pipe_in                          0         0.00 Piped bytes from backend
VBE.boot.lcn.conn                             8          .   Concurrent connections to backend
VBE.boot.lcn.req                        1979191         7.34 Backend requests sent
VBE.boot.lcn.unhealthy                        0         0.00 Fetches not attempted due to backend being unhealthy
VBE.boot.lcn.busy                             0         0.00 Fetches not attempted due to backend being busy
VBE.boot.lcn.fail                             0         0.00 Connections failed
VBE.boot.lcn.fail_eacces                      0         0.00 Connections failed with EACCES or EPERM
VBE.boot.lcn.fail_eaddrnotavail               0         0.00 Connections failed with EADDRNOTAVAIL
VBE.boot.lcn.fail_econnrefused                0         0.00 Connections failed with ECONNREFUSED
VBE.boot.lcn.fail_enetunreach                 0         0.00 Connections failed with ENETUNREACH
VBE.boot.lcn.fail_etimedout                   0         0.00 Connections failed ETIMEDOUT
VBE.boot.lcn.fail_other                       0         0.00 Connections failed for other reason
VBE.boot.lcn.helddown                         0         0.00 Connection opens not attempted
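
One thing worth noticing in the dump above: the SMA.s0 gauges show the cache itself is respecting its limit. A quick check with the numbers copied verbatim from the output (only the interpretation is mine):

```python
# Figures taken from the SMA.s0 lines in the varnishstat output above.
g_bytes = 28_576_191_310  # SMA.s0.g_bytes: bytes outstanding
g_space = 12_225_998_002  # SMA.s0.g_space: bytes available
g_alloc = 2_052_078       # SMA.s0.g_alloc: allocations outstanding

# Outstanding + available equals exactly the configured -s malloc,38G,
# so Varnish's own storage accounting is staying within bounds.
assert g_bytes + g_space == 38 * 2**30

print(f"avg outstanding allocation: {g_bytes / g_alloc / 1024:.1f} KiB")
```

If the stevedore stays within its limit while the process RSS keeps growing, the growth is happening outside Varnish's accounting, i.e. in the allocator (fragmentation or retained pages).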

The installed jemalloc is jemalloc-5.2.1-2.el8.x86_64.

Initially I installed Varnish from CentOS 8's AppStream repo; later I switched to the Varnish community repo, but no luck there either.
I should mention that, as far as I could tell, the Varnish installation didn't pull in any jemalloc package; I installed the one listed above manually.

Besides blaming myself, I'm starting to suspect that this setup may simply be buggy on CentOS 8.

2 Answers


  1. Chosen as BEST ANSWER

    Finally, it seems that jemalloc 5.2 (used on CentOS 8) still has some memory leaks, or at least a significantly larger memory footprint than the 3.6 version I was using on CentOS 7.

    This post explains it quite nicely: https://github.com/jemalloc/jemalloc/issues/1152

    Apparently there was a bug fix related to this in v5.2.1 (https://github.com/jemalloc/jemalloc/releases), but it doesn't seem to have helped on Linux, or at least there is still a huge difference compared to v3.6.x.

    So what I did was compile jemalloc v4.4 myself on CentOS 8, after installing Varnish from its own community repository, and it seems to be OK so far.

    I should note that the Varnish package that comes with CentOS 8 (from AppStream) seems to be missing jemalloc completely, and it ignores the storage limits set with malloc.
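
A quick way to check which allocator a given varnishd build actually uses (paths assume the stock package locations; adjust as needed):

```shell
# Libraries varnishd will load at runtime:
ldd /usr/sbin/varnishd | grep -i jemalloc

# For a running instance, what it actually has mapped:
grep -i jemalloc "/proc/$(pgrep -o varnishd)/maps"
```

If neither command prints anything, that build falls back to the system glibc malloc, which would match the behaviour described above.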


  2. I had a similar problem to yours, but with more constrained memory: a server with 4 GB of RAM, with Varnish set up to use 2 GB (-s malloc,2000m).

    The old server ran Debian Stretch, and I upgraded to a new server with Debian Bullseye, running Varnish 6.5.1 and libjemalloc2 5.2.1. While Varnish could run on the old server for more than a year without any memory problems, on the new server memory came under heavy pressure after just 5 days, with Varnish eating 3.5 GB of RAM until the oom_killer decided to kill it.

    From what I've read, the problem may be memory fragmentation.

    I tuned jemalloc's configuration for Varnish and things seem to have improved a lot. After 5 days, Varnish's memory usage is at 2.6 GB, and it has been stable for the last 3 days.

    What I did was set a MALLOC_CONF environment variable for the varnish process with the following settings:

    thp:never,narenas:2,dirty_decay_ms:5000,muzzy_decay_ms:5000

    The dirty/muzzy_decay_ms settings probably don't affect the result much here, but disabling Transparent Huge Pages and reducing the number of arenas to match the number of CPUs seems to hit the nail on the head. Some hints can be found in jemalloc's official TUNING file.

    To add that environment variable to Varnish, I edited the systemd unit with sudo systemctl edit varnish and added:

    [Service]
    Environment="MALLOC_CONF=thp:never,narenas:2"
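
    After a `sudo systemctl daemon-reload && sudo systemctl restart varnish`, it is worth confirming the variable actually reached the process. The checks below are generic (`pgrep -o` picks the oldest, i.e. management, process; the cache child inherits its environment):

```shell
# What systemd thinks it passes to the unit:
systemctl show varnish --property=Environment

# What the running process actually inherited:
tr '\0' '\n' < "/proc/$(pgrep -o varnishd)/environ" | grep MALLOC_CONF
```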
    

    This may be an alternative worth exploring if you don’t want to downgrade jemalloc.
