I have a Java application providing online video play service. When peak traffic arrives (for example, users click the push window in mobile phone), the GC overhead is very heavy, I can see info about “promotion failed” and “concurrent mode failure” in GC log, and the GC is very time consuming, almost takes 10 min to finish one GC cycle, which makes the application cannot response for a long time.
I know the problem should be the bad code design for memory usage, can anybody help to figure out how the resolve the GC problem? thanks.
Here is my server environment:
cat /etc/redhat-release
CentOS release 6.8 (Final)
free -h
total used free shared buffers cached
Mem: 31G 31G 302M 16K 260M 6.5G
-/+ buffers/cache: 24G 7.1G
Swap: 2.0G 564M 1.4G
java -version
java version "1.7.0_131"
OpenJDK Runtime Environment (rhel-2.6.9.0.el6_8-x86_64 u131-b00)
OpenJDK 64-Bit Server VM (build 24.131-b00, mixed mode)
Here is my original JVM settings:
55634 XmlConfiguration -Xmx22528m -Xms22528m -verbose:gc -XX:NewRatio=3
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-Xloggc:/data/logs/jetty/gc.log -XX:GCLogFileSize=10M -XX:NumberOfGCLogFiles=10
-XX:+UseGCLogFileRotation -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+DisableExplicitGC -XX:+ScavengeBeforeFullGC
-XX:+CMSScavengeBeforeRemark -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-Djetty.home=/usr/local/jetty-distribution-8.1.16.v20140903
I tried some solution but the problem still there.
1) Adjust the JVM parameters:
-XX:CMSInitiatingOccupancyFraction=50, to make CMS GC earlier to provide more free memory to application thread.
2) Adjust the JVM parameters: -XX:NewRatio=2, to make young generation bigger.
3) Adjust the JVM parameters: -XX:PermSize=512m, -XX:MaxPermSize=512m, to make persistence generation bigger.
4) Set rate limiter to single server, it can protect the underline application.
Here is the GC log:
2019-08-17T11:03:01.119+0800: 48511.609: [GC2019-08-17T11:03:01.119+0800: 48511.610: [ParNew: 5132248K->576704K(5190464K), 0.5585610 secs] 13334008K->9118076K(22491968K), 0.5594890 secs] [Times: user=3.56
sys=0.01, real=0.56 secs]
2019-08-17T11:03:03.773+0800: 48514.264: [GC2019-08-17T11:03:03.774+0800: 48514.264: [ParNew: 5190464K->576704K(5190464K), 0.6915700 secs] 13731836K->9578991K(22491968K), 0.6925000 secs] [Times: user=4.10
sys=0.01, real=0.69 secs]
2019-08-17T11:03:07.237+0800: 48517.728: [GC2019-08-17T11:03:07.238+0800: 48517.728: [ParNew: 5190464K->396073K(5190464K), 0.5246610 secs] 14192751K->9710333K(22491968K), 0.5255170 secs] [Times: user=2.92
sys=0.02, real=0.53 secs]
2019-08-17T11:03:10.301+0800: 48520.791: [GC2019-08-17T11:03:10.301+0800: 48520.791: [ParNew: 5009786K->576704K(5190464K), 0.2799780 secs] 14324046K->9906749K(22491968K), 0.2808270 secs] [Times: user=2.09
sys=0.01, real=0.28 secs]
2019-08-17T11:03:12.171+0800: 48522.662: [GC2019-08-17T11:03:12.172+0800: 48522.662: [ParNew: 5190464K->517429K(5190464K), 0.7336400 secs] 14520509K->10324876K(22491968K), 0.7346510 secs] [Times: user=4.0
5 sys=0.03, real=0.74 secs]
2019-08-17T11:03:14.893+0800: 48525.383: [GC2019-08-17T11:03:14.893+0800: 48525.384: [ParNew: 5131189K->504298K(5190464K), 0.2094840 secs] 14938636K->10311745K(22491968K), 0.2103470 secs] [Times: user=1.6
4 sys=0.00, real=0.21 secs]
2019-08-17T11:03:18.874+0800: 48529.365: [GC2019-08-17T11:03:18.875+0800: 48529.365: [ParNew: 5117791K->576704K(5190464K), 0.6094130 secs] 14925238K->10624093K(22491968K), 0.6103270 secs] [Times: user=4.0
6 sys=0.01, real=0.61 secs]
2019-08-17T11:03:21.653+0800: 48532.143: [GC2019-08-17T11:03:21.653+0800: 48532.144: [ParNew: 5190439K->576704K(5190464K), 0.7770410 secs] 15237828K->11148654K(22491968K), 0.7779820 secs] [Times: user=4.1
4 sys=0.03, real=0.78 secs]
2019-08-17T11:03:24.691+0800: 48535.182: [GC2019-08-17T11:03:24.692+0800: 48535.182: [ParNew: 5190464K->576704K(5190464K), 0.6323050 secs] 15762414K->11476700K(22491968K), 0.6331560 secs] [Times: user=4.2
2 sys=0.06, real=0.64 secs]
2019-08-17T11:03:27.652+0800: 48538.142: [GC2019-08-17T11:03:27.652+0800: 48538.142: [ParNew: 5183665K->571966K(5190464K), 0.7238850 secs] 16083661K->11972084K(22491968K), 0.7247810 secs] [Times: user=4.1
7 sys=0.12, real=0.73 secs]
2019-08-17T11:03:30.631+0800: 48541.121: [GC2019-08-17T11:03:30.631+0800: 48541.121: [ParNew: 5185726K->576704K(5190464K), 0.6396980 secs] 16585844K->12346323K(22491968K), 0.6405600 secs] [Times: user=3.6
8 sys=0.09, real=0.64 secs]
2019-08-17T11:03:33.075+0800: 48543.566: [GC2019-08-17T11:03:33.076+0800: 48543.566: [ParNew: 5190464K->576704K(5190464K), 1.5363980 secs] 16960083K->12790823K(22491968K), 1.5372590 secs] [Times: user=4.0
7 sys=0.75, real=1.53 secs]
2019-08-17T11:03:34.619+0800: 48545.109: [GC [1 CMS-initial-mark: 12214119K(17301504K)] 12795454K(22491968K), 0.6847160 secs] [Times: user=0.68 sys=0.00, real=0.68 secs]
2019-08-17T11:03:35.305+0800: 48545.796: [CMS-concurrent-mark-start]
2019-08-17T11:03:37.124+0800: 48547.614: [GC2019-08-17T11:03:37.124+0800: 48547.614: [ParNew: 5190464K->576704K(5190464K), 2.1277390 secs] 17404583K->13245164K(22491968K), 2.1285870 secs] [Times: user=5.0
0 sys=1.60, real=2.13 secs]
2019-08-17T11:03:40.637+0800: 48551.128: [GC2019-08-17T11:03:40.638+0800: 48551.128: [ParNew: 5190464K->576704K(5190464K), 2.2917500 secs] 17858924K->13825008K(22491968K), 2.2926540 secs] [Times: user=5.9
6 sys=1.64, real=2.29 secs]
2019-08-17T11:03:44.240+0800: 48554.730: [GC2019-08-17T11:03:44.240+0800: 48554.731: [ParNew: 5190464K->576704K(5190464K), 1.5847800 secs] 18438768K->14442978K(22491968K), 1.5857620 secs] [Times: user=5.8
3 sys=0.96, real=1.58 secs]
2019-08-17T11:03:47.129+0800: 48557.619: [GC2019-08-17T11:03:47.129+0800: 48557.620: [ParNew: 5190464K->576704K(5190464K), 2.6088650 secs] 19056738K->14989342K(22491968K), 2.6097680 secs] [Times: user=5.5
9 sys=2.00, real=2.61 secs]
2019-08-17T11:03:51.153+0800: 48561.643: [GC2019-08-17T11:03:51.153+0800: 48561.644: [ParNew: 5190464K->576704K(5190464K), 3.4004900 secs] 19603102K->15553978K(22491968K), 3.4015160 secs] [Times: user=7.1
7 sys=2.46, real=3.40 secs]
2019-08-17T11:03:55.936+0800: 48566.426: [GC2019-08-17T11:03:55.936+0800: 48566.426: [ParNew: 5190464K->576704K(5190464K), 1.9874080 secs] 20167738K->16393777K(22491968K), 1.9883410 secs] [Times: user=7.0
7 sys=1.27, real=1.99 secs]
2019-08-17T11:03:59.169+0800: 48569.659: [GC2019-08-17T11:03:59.169+0800: 48569.660: [ParNew (promotion failed): 5190464K->5190464K(5190464K), 151.9715540 secs]2019-08-17T11:06:31.141+0800: 4
8721.631: [CMS2019-08-17T11:06:41.022+0800: 48731.513: [CMS-concurrent-mark: 19.582/185.717 secs] [Times: user=322.79 sys=60.12, real=185.68 secs]
(concurrent mode failure): 16268450K->5994133K(17301504K), 39.5874300 secs] 21007537K->5994133K(22491968K), [CMS Perm : 106999K->106688K(169984K)], 191.5598280 secs] [Times: user=242.58 sys=45.41, real=1
91.53 secs]
2019-08-17T11:07:12.062+0800: 48762.552: [GC2019-08-17T11:07:12.062+0800: 48762.552: [ParNew: 4613760K->576704K(5190464K), 1.3077710 secs] 10607893K->7468992K(22491968K), 1.3085280 secs] [Times: user=9.17
sys=0.02, real=1.31 secs]
2019-08-17T11:07:14.615+0800: 48765.105: [GC2019-08-17T11:07:14.615+0800: 48765.105: [ParNew: 5190464K->576704K(5190464K), 0.8007980 secs] 12082752K->8128668K(22491968K), 0.8016420 secs] [Times: user=5.38
sys=0.00, real=0.80 secs]
2019-08-17T11:07:16.460+0800: 48766.950: [GC2019-08-17T11:07:16.460+0800: 48766.950: [ParNew: 5190464K->576704K(5190464K), 1.4912750 secs] 12742428K->9224109K(22491968K), 1.4921230 secs] [Times: user=10.0
3 sys=0.00, real=1.49 secs]
2019-08-17T11:07:19.175+0800: 48769.665: [GC2019-08-17T11:07:19.175+0800: 48769.666: [ParNew: 5190464K->576704K(5190464K), 0.8212860 secs] 13837869K->9797270K(22491968K), 0.8221130 secs] [Times: user=5.63
sys=0.00, real=0.82 secs]
2019-08-17T11:07:21.087+0800: 48771.578: [GC2019-08-17T11:07:21.088+0800: 48771.578: [ParNew: 5190464K->576704K(5190464K), 0.8776180 secs] 14411030K->10567781K(22491968K), 0.8785350 secs] [Times: user=5.7
3 sys=0.01, real=0.88 secs]
2019-08-17T11:07:23.122+0800: 48773.613: [GC2019-08-17T11:07:23.123+0800: 48773.613: [ParNew: 5190464K->576704K(5190464K), 0.8694820 secs] 15181541K->11179123K(22491968K), 0.8703220 secs] [Times: user=5.7
7 sys=0.01, real=0.88 secs]
2019-08-17T11:07:25.040+0800: 48775.531: [GC2019-08-17T11:07:25.041+0800: 48775.531: [ParNew: 5190464K->576704K(5190464K), 0.5756330 secs] 15792883K->11513169K(22491968K), 0.5764910 secs] [Times: user=3.8
1 sys=0.02, real=0.58 secs]
2019-08-17T11:07:26.778+0800: 48777.269: [GC2019-08-17T11:07:26.779+0800: 48777.269: [ParNew: 5190464K->576704K(5190464K), 0.5531170 secs] 16126929K->11900488K(22491968K), 0.5542050 secs] [Times: user=3.6
4 sys=0.01, real=0.56 secs]
2019-08-17T11:07:29.753+0800: 48780.243: [GC2019-08-17T11:07:29.753+0800: 48780.243: [ParNew: 5190464K->484832K(5190464K), 0.4668920 secs] 16514248K->12147612K(22491968K), 0.4677250 secs] [Times: user=3.0
1 sys=0.00, real=0.46 secs]
2019-08-17T11:07:38.895+0800: 48789.385: [GC2019-08-17T11:07:38.895+0800: 48789.385: [ParNew: 5098592K->576704K(5190464K), 0.2818390 secs] 16761372K->12241212K(22491968K), 0.2826330 secs] [Times: user=2.2
4 sys=0.00, real=0.28 secs]
2019-08-17T11:07:46.308+0800: 48796.799: [GC2019-08-17T11:07:46.309+0800: 48796.799: [ParNew: 5190464K->576704K(5190464K), 0.4371320 secs] 16854972K->12504773K(22491968K), 0.4380620 secs] [Times: user=2.9
7 sys=0.01, real=0.44 secs]
2019-08-17T11:07:50.904+0800: 48801.394: [GC2019-08-17T11:07:50.904+0800: 48801.395: [ParNew: 5190089K->436781K(5190464K), 0.3699670 secs] 17118159K->12646070K(22491968K), 0.3708110 secs] [Times: user=2.4
5 sys=0.01, real=0.38 secs]
2019-08-17T11:07:51.277+0800: 48801.767: [GC [1 CMS-initial-mark: 12209288K(17301504K)] 12646582K(22491968K), 0.2209210 secs] [Times: user=0.22 sys=0.00, real=0.22 secs]
2019-08-17T11:07:51.498+0800: 48801.989: [CMS-concurrent-mark-start]
2019-08-17T11:07:55.619+0800: 48806.109: [CMS-concurrent-mark: 4.119/4.121 secs] [Times: user=14.62 sys=0.38, real=4.12 secs]
2019-08-17T11:07:55.619+0800: 48806.110: [CMS-concurrent-preclean-start]
2019-08-17T11:07:55.739+0800: 48806.229: [CMS-concurrent-preclean: 0.108/0.120 secs] [Times: user=0.34 sys=0.02, real=0.12 secs]
2019-08-17T11:07:55.739+0800: 48806.230: [CMS-concurrent-abortable-preclean-start]
2019-08-17T11:07:57.305+0800: 48807.795: [GC2019-08-17T11:07:57.305+0800: 48807.795: [ParNew: 5050541K->230975K(5190464K), 0.1115260 secs] 17259830K->12440264K(22491968K), 0.1124190 secs] [Times: user=0.8
4 sys=0.01, real=0.12 secs]
CMS: abort preclean due to time 2019-08-17T11:08:01.862+0800: 48812.353: [CMS-concurrent-abortable-preclean: 5.958/6.123 secs] [Times: user=14.89 sys=0.63, real=6.12 secs]
2019-08-17T11:08:01.864+0800: 48812.355: [GC[YG occupancy: 2379557 K (5190464 K)]2019-08-17T11:08:01.865+0800: 48812.355: [GC2019-08-17T11:08:01.865+0800: 48812.355: [ParNew: 2379557K->306806K(5190464K),
0.1407880 secs] 14588846K->12516095K(22491968K), 0.1417330 secs] [Times: user=1.03 sys=0.01, real=0.15 secs]
2019-08-17T11:08:02.007+0800: 48812.497: [Rescan (parallel) , 0.0565410 secs]2019-08-17T11:08:02.063+0800: 48812.554: [weak refs processing, 0.0011670 secs]2019-08-17T11:08:02.064+0800: 48812.555: [scrub
string table, 0.0062860 secs] [1 CMS-remark: 12209288K(17301504K)] 12516095K(22491968K), 0.2064110 secs] [Times: user=1.43 sys=0.01, real=0.21 secs]
2019-08-17T11:08:02.071+0800: 48812.562: [CMS-concurrent-sweep-start]
2019-08-17T11:08:07.810+0800: 48818.301: [GC2019-08-17T11:08:07.811+0800: 48818.301: [ParNew: 4920566K->429676K(5190464K), 0.1617750 secs] 13061507K->8570616K(22491968K), 0.1628060 secs] [Times: user=1.23
sys=0.01, real=0.16 secs]
2019-08-17T11:08:12.512+0800: 48823.002: [GC2019-08-17T11:08:12.513+0800: 48823.003: [ParNew: 5043436K->454510K(5190464K), 0.2457460 secs] 9206556K->4675336K(22491968K), 0.2466240 secs] [Times: user=1.75
sys=0.01, real=0.25 secs]
2019-08-17T11:08:13.447+0800: 48823.937: [CMS-concurrent-sweep: 10.944/11.376 secs] [Times: user=44.77 sys=1.97, real=11.37 secs]
2019-08-17T11:08:13.447+0800: 48823.938: [CMS-concurrent-reset-start]
2019-08-17T11:08:13.513+0800: 48824.003: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.31 sys=0.07, real=0.07 secs]
2019-08-17T11:08:17.937+0800: 48828.427: [GC2019-08-17T11:08:17.937+0800: 48828.428: [ParNew: 5068270K->472234K(5190464K), 0.2318510 secs] 8860461K->4341829K(22491968K), 0.2327560 secs] [Times: user=1.63
sys=0.00, real=0.24 secs]
2019-08-17T11:08:25.183+0800: 48835.673: [GC2019-08-17T11:08:25.184+0800: 48835.674: [ParNew: 5085994K->561958K(5190464K), 0.1959590 secs] 8955589K->4485967K(22491968K), 0.1969300 secs] [Times: user=1.46
sys=0.00, real=0.20 secs]
2019-08-17T11:08:32.218+0800: 48842.708: [GC2019-08-17T11:08:32.218+0800: 48842.709: [ParNew: 5175718K->488604K(5190464K), 0.2410410 secs] 9099727K->4502228K(22491968K), 0.2419920 secs] [Times: user=1.71
sys=0.00, real=0.24 secs]
2019-08-17T11:08:39.239+0800: 48849.729: [GC2019-08-17T11:08:39.240+0800: 48849.730: [ParNew: 5102364K->477861K(5190464K), 0.2332310 secs] 9115988K->4571114K(22491968K), 0.2343110 secs] [Times: user=1.63
sys=0.00, real=0.24 secs]
2019-08-17T11:08:46.610+0800: 48857.100: [GC2019-08-17T11:08:46.610+0800: 48857.101: [ParNew: 5091621K->429776K(5190464K), 0.2107400 secs] 9184874K->4611486K(22491968K), 0.2116540 secs] [Times: user=1.51
sys=0.01, real=0.21 secs]
2019-08-17T11:08:54.909+0800: 48865.399: [GC2019-08-17T11:08:54.909+0800: 48865.400: [ParNew: 5043536K->549821K(5190464K), 0.1887970 secs] 9225246K->4731531K(22491968K), 0.1900180 secs] [Times: user=1.43
sys=0.00, real=0.19 secs]
2019-08-17T11:09:03.947+0800: 48874.437: [GC2019-08-17T11:09:03.947+0800: 48874.438: [ParNew: 5163581K->486679K(5190464K), 0.2226900 secs] 9345291K->4724142K(22491968K), 0.2236710 secs] [Times: user=1.61
sys=0.01, real=0.22 secs]
2019-08-17T11:09:12.631+0800: 48883.122: [GC2019-08-17T11:09:12.632+0800: 48883.122: [ParNew: 5100439K->390273K(5190464K), 0.1972480 secs] 9337902K->4690593K(22491968K), 0.1983700 secs] [Times: user=1.41
sys=0.00, real=0.20 secs]
2019-08-17T11:09:20.105+0800: 48890.595: [GC2019-08-17T11:09:20.105+0800: 48890.596: [ParNew: 5004004K->344026K(5190464K), 0.2136360 secs] 9304325K->4735593K(22491968K), 0.2147120 secs] [Times: user=1.47
sys=0.00, real=0.22 secs]
2
Answers
I finally figure out the root cause. The code makes a lot of objects (more than 10GB) when peak traffic arrives, which makes the young GC frequently happens and the CMS GC fallback to SerialOld GC, it causes a long time STW.
Looking at your GC log, it seems like CMS is not right choice for your app.
Consider trying other GC algorithm:
If you want to stick with CMS …
Your promotion rate is very high (100+ MiB/s)
CMS need much large old space size to handle such throughput and keep fragmentation low. Though, it seems like your server is already at memory limit.