I’m running Cassandra 4.0.8 on CentOS Stream release 8, using Java 11.0.17.
When I run cleanup after a bootstrap, some of my nodes crash. The crash seems to occur while a node is streaming and compacting a lot of data.
I believe I am using the latest Java 11 VM and the latest Cassandra 4 version.
Would anyone have any suggestions on how to fix this problem?
I cannot reproduce the problem on demand; it happens occasionally when there is a lot of work to do.
Thanks for your help,
Jean
Here is an extract of the core dump:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fe388431daa, pid=3006650, tid=3008039
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.17+10) (build 11.0.17+10-LTS-269)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.17+10-LTS-269, mixed mode, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::updateBytesCRC32
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /home/cassandra/core.3006650)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
--------------- S U M M A R Y ------------
Command Line: -ea -da:net.openhft... -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -Djdk.attach.allowAttachSelf=true --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-exports=java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.server=ALL-UNNAMED --add-exports=java.sql/java.sql=ALL-UNNAMED --add-opens=java.base/java.lang.module=ALL-UNNAMED --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.reflect=ALL-UNNAMED --add-opens=java.base/jdk.internal.math=ALL-UNNAMED --add-opens=java.base/jdk.internal.module=ALL-UNNAMED --add-opens=java.base/jdk.internal.util.jar=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -Xlog:gc=info,heap*=trace,age*=debug,safepoint=info,promotion*=trace:file=/home/cassandra/apache-cassandra/logs/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760 -Xms7961M -Xmx7961M -Xmn400M -XX:+UseCondCardMark -XX:CompileCommandFile=/home/cassandra/apache-cassandra/conf/hotspot_compiler -javaagent:/home/cassandra/apache-cassandra/lib/jamm-0.3.2.jar -Dcassandra.jmx.remote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Djava.library.path=/home/cassandra/apache-cassandra/lib/sigar-bin -Dcassandra.libjemalloc=/usr/lib64/libjemalloc.so.2 -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/home/cassandra/apache-cassandra/logs -Dcassandra.storagedir=/home/cassandra/apache-cassandra/data org.apache.cassandra.service.CassandraDaemon
Host: Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz, 4 cores, 31G, CentOS Stream release 8
Time: Tue Mar 14 15:01:44 2023 CET elapsed time: 2430.690820 seconds (0d 0h 40m 30s)
--------------- T H R E A D ---------------
Current thread (0x00007f2b75aaee00): JavaThread "CompactionExecutor:8" daemon [_thread_in_Java, id=3008039, stack(0x00007f25d5ddf000,0x00007f25d5e20000)]
Stack: [0x00007f25d5ddf000,0x00007f25d5e20000], sp=0x00007f25d5e1c6f0, free space=245k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v ~StubRoutines::updateBytesCRC32
siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f4556a2d000
Register to memory mapping:
RAX=0x00000000d8464a00 is an unknown value
RBX=0x000000061f4c1c90 is an oop: java.nio.DirectByteBufferR
{0x000000061f4c1c90} - klass: 'java/nio/DirectByteBufferR'
RCX=0x00007fe3a138e9c0: <offset 0x000000000146f9c0> in /usr/jdk-11.0.17/lib/server/libjvm.so at 0x00007fe39ff1f000
RDX=0x0000000000000082 is an unknown value
RSP=0x00007f25d5e1c6f0 is pointing into the stack for thread: 0x00007f2b75aaee00
RBP=0x00007f25d5e1c6f0 is pointing into the stack for thread: 0x00007f2b75aaee00
RSI=0x00007f4556a2cfd0 points into unknown readable memory: 0x4f1f18b0cedc8a28 | 28 8a dc ce b0 18 1f 4f
RDI=0x00000000d18d1b56 is an unknown value
R8 =0x0000000029edee07 is an unknown value
R9 =0x0000000000000a19 is an unknown value
2 Answers
@Madhur Ahuja, the way I solved it was based on all the answers I received. I bumped up the memory and enabled the G1 garbage collector. I am also using the latest version, which as of today is 4.1.3. Since then, I have not seen this error.
I defined:
I also modified jvm11-server.options to use the G1 garbage collector and disable CMS, using this:
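As a hedged sketch of what such a change can look like, assuming the stock layout of conf/jvm11-server.options in Cassandra 4.x: the CMS flags (the same ones visible in the crash log's command line) are commented out and the G1 flags are enabled. The pause-time target of 500 ms is an assumed example value, not a figure from this thread.

```
### CMS Settings (commented out to disable CMS)
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
#-XX:+CMSClassUnloadingEnabled

### G1 Settings (enabled)
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
# Pause-time target in milliseconds; 500 is an assumed example value
-XX:MaxGCPauseMillis=500
```

Only one collector should be selected at a time; the JVM refuses to start if both -XX:+UseConcMarkSweepGC and -XX:+UseG1GC are set.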
Hope it helps you.
Echoing what LHWizard said, in looking through your question, it looks like your heap size is slightly less than 8GB. That means you’re getting the default, computed size, which probably isn’t enough.
As you appear to be using the CMS garbage collector, the heap sizing (specified in the conf/jvm-server.options file) should look something like this:
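A minimal sketch of such a sizing, assuming a 16 GB heap on the 31 GB host from the question. The 16 GB figure is an illustrative assumption; only the 6 GB new generation comes from this answer's text, and 6 GB is 37.5% of 16 GB.

```
# conf/jvm-server.options (illustrative values, CMS only)
# Initial and maximum heap set to the same size to avoid resizing
-Xms16G
-Xmx16G
# New generation size, within 25% to 50% of the heap
-Xmn6G
```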
Xmx is the max heap size. Xms is the initial heap size, but as resizing the heap gives a performance hit, with Cassandra you typically want to set max and initial heap sizes the same. Also with Cassandra and CMS, you should target a new gen size of 25% to 50% of the heap, which is why I have Xmn6GB.
Note that you should explore the config options in the jvm11-server.options for the G1 garbage collector instead. The CMS collector was deprecated in Java 9 and removed in Java 14. If you do decide to go that route, you only need to worry about Xmx and Xms (you shouldn’t set the new gen size with G1GC).
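For the G1 route, the heap section of conf/jvm-server.options might then reduce to something like the sketch below. The 16 GB value is an illustrative assumption, not a figure from this thread.

```
# conf/jvm-server.options (illustrative values, G1)
-Xms16G
-Xmx16G
# -Xmn is intentionally left unset: G1 sizes the young generation itself
#-Xmn6G
```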