We are seeing an intermittent issue in production. The CPU randomly gets pegged at 50% (on a 2-core machine) and never comes back down; the only option is to restart the server.
This is how the CPU usage appears in Dynatrace:
This is how the thread dump looks when analyzed through Dynatrace:
Through my research, it appears there was a JDK defect: calling java.util.zip.Deflater.finish() prematurely hangs the application, and the application spins, consuming one CPU.
https://bugs.openjdk.java.net/browse/JDK-8060193
It only happens randomly, on some requests where multiple filters are involved.
I was able to reproduce this using the test class from the above JIRA issue on a CentOS VM running JDK 1.8.0_201.
That was surprising, because according to the docs and the ticket, this had already been fixed.
On further research, I found a similar defect opened again against the JDK:
https://bugs.openjdk.java.net/browse/JDK-8193682
Now the team is not willing to work on it unless someone can reproduce it.
Since it happens randomly in production, I am not sure how to reproduce it. The test class from https://bugs.openjdk.java.net/browse/JDK-8060193 still shows the problem. Is that even a valid test case?
If it is valid, then there will be problems every time we send compressed data.
- Our runtime JRE is JDK 1.8.
- Compression is done at Tomcat, not at the load balancer (see the sketch after this list).
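For reference, compression done with java.util.zip always runs through a Deflater (GZIPOutputStream and ZipOutputStream both extend DeflaterOutputStream), so any compressed response produced on the Java side exercises the code path the tickets above describe. A minimal standalone illustration (not our actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipExample {
    public static void main(String[] args) throws IOException {
        byte[] body = "some response body".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        // GZIPOutputStream extends DeflaterOutputStream, so write() and finish()
        // run the same "while (!def.needsInput()) / while (!def.finished())" loops
        // that the JDK tickets describe.
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(body);
            gzip.finish(); // loops internally until the Deflater reports finished()
        }
        System.out.println("compressed " + body.length + " -> " + compressed.size() + " bytes");
    }
}
```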
Any pointers as to why this is happening and how we can solve it?
Update:
One of the libraries we are using was throwing an exception:
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xfd)
LastName, First’Name
As we can see, this is not a regular apostrophe. You can end up with this character by copy-pasting from Word, which auto-corrects a regular apostrophe into this funky curly one.
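For illustration, the difference shows up when you dump the code points and UTF-8 bytes of the two characters (a small standalone sketch, not our actual parsing code):

```java
import java.nio.charset.StandardCharsets;

public class ApostropheCheck {
    public static void main(String[] args) {
        String plain = "'";      // U+0027 APOSTROPHE
        String curly = "\u2019"; // U+2019 RIGHT SINGLE QUOTATION MARK (Word's "smart" quote)
        for (String s : new String[] { plain, curly }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                hex.append(String.format("%02x ", b & 0xff));
            }
            System.out.printf("U+%04X -> UTF-8 bytes: %s%n", (int) s.charAt(0), hex.toString().trim());
        }
        // Output:
        // U+0027 -> UTF-8 bytes: 27
        // U+2019 -> UTF-8 bytes: e2 80 99
    }
}
```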
Our reproducer did throw an error, but the CPU was not getting stuck. I think it only happens under high volume and traffic.
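Since the issue seems to need high volume, one way to hunt for it would be to keep many concurrent compressed responses in flight against a test server. A rough, hypothetical sketch (the URL, thread count, and request count are made up):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LoadReproducer {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint that returns a compressed (gzip) response.
        String target = args.length > 0 ? args[0] : "http://localhost:8080/app/resource";
        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 10_000; i++) {
            pool.submit(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
                    conn.setRequestProperty("Accept-Encoding", "gzip");
                    try (InputStream in = conn.getInputStream()) {
                        byte[] buf = new byte[8192];
                        while (in.read(buf) != -1) {
                            // drain the response
                        }
                    }
                } catch (Exception e) {
                    // Failures are expected under load; the goal is only to keep
                    // many concurrent compressed responses in flight.
                    System.err.println(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```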
2 Answers
I want to post an update on this problem, which has bugged us for years. We had an initiative underway to migrate static content to a CDN. After the CDN was implemented and all static resources were served from a different server, the ZipStream problem was resolved. Although the research showed that the problem was more about dynamic content, not static, I am not sure how the problem got solved. Maybe someone reading this answer can explain to me how it got fixed.
EDIT 4 Oct 2022
It seems that the problem has been fixed and applied to OpenJDK 11 and 17: https://bugs.openjdk.org/browse/JDK-8193682
Original answer
As I said in a comment before, we are facing this problem when we try to generate Zip files that are written to the OutputStream of the HttpServletResponse through a ZipOutputStream.

The reason for the cores running at 100% is three (under certain conditions) infinite loops: in ZipOutputStream (closeEntry()) and in DeflaterOutputStream (write() and finish()).

These infinite loops look like this:
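Roughly, the loops in question look like this in the JDK 8 sources (simplified excerpts, not a verbatim copy):

```java
// DeflaterOutputStream.write(byte[], int, int), simplified:
def.setInput(b, off, len);
while (!def.needsInput()) {
    deflate();   // never terminates if the Deflater never asks for more input
}

// DeflaterOutputStream.finish() and ZipOutputStream.closeEntry(), simplified:
def.finish();
while (!def.finished()) {
    deflate();   // never terminates if the Deflater never reports finished()
}
```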
Where def is a java.util.zip.Deflater.

If I understand right, this is the problem described in JDK-8193682. There is a workaround class there which overrides the deflate method of ZipOutputStream.

I am going to try to use a class based on that workaround, which accepts a timeout to be checked in the deflate method (a rough sketch of the idea is below). I hope not to produce resource leaks with this approach.

Related question: Thread locking when flushing jsp file
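For illustration only, such a subclass could look roughly like this; the class name, field names, and timeout behaviour are my own assumptions, not the code attached to the JDK ticket:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipOutputStream;

/**
 * ZipOutputStream whose deflate() refuses to spin forever: if the Deflater
 * produces no output for longer than the configured timeout, the stream
 * aborts with an IOException instead of looping in closeEntry()/finish().
 * Sketch only; not the workaround class from JDK-8193682.
 */
public class TimeoutZipOutputStream extends ZipOutputStream {

    private final long timeoutMillis;
    private long lastProgress = System.currentTimeMillis();

    public TimeoutZipOutputStream(OutputStream out, long timeoutMillis) {
        super(out);
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    protected void deflate() throws IOException {
        // Same work as DeflaterOutputStream.deflate(), plus a no-progress check.
        int len = def.deflate(buf, 0, buf.length);
        if (len > 0) {
            out.write(buf, 0, len);
            lastProgress = System.currentTimeMillis();
        } else if (System.currentTimeMillis() - lastProgress > timeoutMillis) {
            throw new IOException("Deflater made no progress for " + timeoutMillis + " ms");
        }
    }
}
```

Since all three loops (closeEntry(), write(), finish()) funnel through deflate(), a no-progress timeout there bounds how long any of them can spin, at the cost of an aborted response instead of a hung thread.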