
We are running SolrCloud (Solr 8.3.1, NRT replicas) with a ZooKeeper ensemble, 3 nodes
each, on CentOS VMs.

Each Solr node has 66 GB RAM, a 15 GB heap, and 4 CPUs.
Record count: 3.3 million. Average document size is 350 KB.

Everything works fine until something disturbs the cluster, such as load or network latency issues. The number of threads in TIMED_WAITING then increases to 7000+ and stays there until Solr is restarted.

Server 1:
7722 threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f")

Server 2:
4046 threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3")

Server 3:
4210 threads are in TIMED_WAITING
("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0")
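
For anyone wanting to reproduce counts like these, here is a minimal Python sketch that groups threads by state and lock. It assumes the thread dump has already been flattened into a list of dicts with `state` and `lock` keys; that shape, the function name `summarize_threads`, and the sample data are all hypothetical, so the parsing step would need adapting to whatever your dump actually looks like.

```python
from collections import Counter


def summarize_threads(threads):
    """Count threads per (state, lock) pair.

    `threads` is ASSUMED to be a list of dicts with a "state" key and an
    optional "lock" key; this is an illustrative shape, not Solr's exact output.
    """
    counts = Counter()
    for thread in threads:
        state = thread.get("state", "UNKNOWN")
        # Keep the full lock string (including the @hash) so that threads
        # parked on the very same lock instance are grouped together.
        lock = thread.get("lock") or ""
        counts[(state, lock)] += 1
    return counts


# Hypothetical sample data, shaped like the figures quoted above.
LOCK = "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f"
sample = [
    {"name": "t1", "state": "TIMED_WAITING", "lock": LOCK},
    {"name": "t2", "state": "TIMED_WAITING", "lock": LOCK},
    {"name": "t3", "state": "RUNNABLE"},
]

summary = summarize_threads(sample)
for (state, lock), n in summary.most_common():
    print(n, state, lock)
```

If thousands of threads share one lock instance, as in the numbers above, they are probably all parked on the same queue or pool, which narrows down where to look in the stack traces.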

How can I increase the 3000 to something bigger? Will setting net.ipv4.tcp_tw_reuse=1 help? What is the drawback? Please help.

2 Answers


  1. Validate system time/NTP sync during the error window; it might be one of the root causes. Also, watch for explicit commits from clients.

  2. One possible workaround is to switch to HTTP/1.1 (Solr option -Dsolr.http1=true).
