Given that `byteString` is:

```
-----------------------------149742642616556
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: text/plain
test
-----------------------------149742642616556--
```
Then this code (not optimized):
```java
Pattern pattern = Pattern.compile(BOUNDARY_PATTERN); // "(?m)\A-+\d+$"
Matcher matcher = pattern.matcher(byteString);
String boundary = null;
while (matcher.find()) {
    boundary = matcher.group();
    contentType = "multipart/form-data; boundary=" + boundary;
}
LOG.info("Content Type = " + contentType);

@SuppressWarnings("deprecation")
org.apache.commons.fileupload.MultipartStream multipartStream =
        new org.apache.commons.fileupload.MultipartStream(
                new ByteArrayInputStream(byteString.getBytes()), boundary.getBytes());
ByteArrayOutputStream bos = new ByteArrayOutputStream();
multipartStream.readBodyData(bos); // throws the error below
byte[] byteBody = bos.toByteArray();
```
Throws this error:

```
org.apache.commons.fileupload.MultipartStream$MalformedStreamException: Stream ended unexpectedly
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:1005)
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:903)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:100)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:70)
    at org.apache.commons.fileupload.MultipartStream.readBodyData(MultipartStream.java:593)
```
What could possibly be wrong here? I would appreciate any help.
2 Answers
The issue seems to be due to a bad end of line and the way the boundary is retrieved. According to an RFC 2046 quote taken from an SO answer:
The problem comes down to exactly two points: the line-ending type and the two hyphens preceding the boundary parameter value.
Line endings
Since your code doesn't show the exact value of `byteString`, I tried both LF (`\n`) and CRLF (`\r\n`) line endings to see what happens. The issue is reproduced when a bad line ending, i.e. not CRLF, comes right before the last boundary.
It looks like `MultipartStream` fails to parse the beginning of the boundary, because it doesn't find the expected line ending (CRLF) at the end of the previous line. So if you used LF terminators, you should replace them with CRLF ones.
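If the content is built in code (for example in a test), a minimal way to normalize the terminators before handing the bytes to `MultipartStream` is to rewrite every line ending as CRLF. This is a stdlib-only sketch; the class and method names are hypothetical, and it assumes the whole body is text held in a `String`. It would also rewrite LF bytes inside a binary file part, so it only suits text content like the `test.txt` example.

```java
public class LineEndings {
    // Hypothetical helper: rewrite every line terminator as CRLF.
    // "\r?\n" consumes an existing CRLF as a whole, so CRLF input is not doubled.
    static String toCrlf(String content) {
        return content.replaceAll("\r?\n", "\r\n");
    }

    public static void main(String[] args) {
        String lf = "--abc\nContent-Type: text/plain\n\ntest\n--abc--\n";
        String crlf = toCrlf(lf);
        // prints true
        System.out.println(crlf.equals("--abc\r\nContent-Type: text/plain\r\n\r\ntest\r\n--abc--\r\n"));
        // prints true: content that already uses CRLF is left unchanged
        System.out.println(toCrlf(crlf).equals(crlf));
    }
}
```

Using `\r?\n` rather than plain `\n` keeps the helper idempotent, so it is safe to apply even when you are not sure which terminators the input already uses.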
Boundary format
The RFC states that a boundary delimiter line is two hyphens, followed by the boundary parameter value, followed by CRLF. Your regexp doesn't capture only the boundary parameter value; it also includes the two leading hyphens. So I replaced this part:
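As an illustrative sketch (not necessarily the exact replacement used in this answer), the two hyphens can be excluded with a capture group, so only the boundary parameter is kept. The pattern, class, and method names are hypothetical, and the sketch assumes the content starts with the first delimiter line, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryParam {
    // Hypothetical pattern: the delimiter line is "--" followed by the boundary
    // parameter. Only group 1 (everything after the two leading hyphens) is kept.
    // Assumes the content starts with the first delimiter line.
    private static final Pattern DELIMITER_LINE = Pattern.compile("(?m)\\A--(-*\\d+)$");

    static String boundaryParameter(String content) {
        Matcher matcher = DELIMITER_LINE.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String param = "---------------------------149742642616556";
        String content = "--" + param + "\r\n"
                + "Content-Type: text/plain\r\n\r\n"
                + "test\r\n"
                + "--" + param + "--\r\n";
        String boundary = boundaryParameter(content);
        System.out.println(boundary.equals(param)); // prints true
        // The header value carries the parameter, without the delimiter's two hyphens.
        System.out.println("multipart/form-data; boundary=" + boundary);
    }
}
```

The resulting value is what `boundary.getBytes()` should contain when it is passed to the `MultipartStream` constructor in the question.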
Working code
Runnable as an MCVE
The code you’ll find below can be run in a console without Tomcat. Only commons-fileupload-1.3.3-bin.tar.gz and commons-io-2.6-bin.tar.gz are needed.
To view what's parsed by the `MultipartStream`, I temporarily replaced `bos` with `System.out` in the `readBodyData()` call (as noted in the comments).
To compile:
To run:
The code itself
Output:
After some debugging, I found that `MultipartStream` adds `\r\n--` as a prefix to the boundary. Because I didn't have a newline at the beginning of the content, I got the `MultipartStream.MalformedStreamException("Stream ended unexpectedly")` exception: the boundary couldn't be found. Maybe it's because of an older `commons-fileupload` version, or because I was reading the multipart content from an HTTP PUT request sent by `curl`.
tl;dr
Add a newline (`\r\n`) at the beginning of your content if nothing else has helped so far.
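To make the mechanism concrete, here is a plain-`String` sketch of a search for the prefixed delimiter (`\r\n--` followed by the boundary parameter). It only mirrors the prefix described above; it is not `MultipartStream`'s actual implementation, and the class and method names are hypothetical. It shows that a delimiter line at offset 0 is invisible to such a search until a CRLF is prepended.

```java
public class DelimiterSearch {
    // Searches for CRLF + "--" + boundary parameter, the prefixed form described above.
    // Plain String search for illustration only.
    static int firstDelimiter(String content, String boundaryParam) {
        return content.indexOf("\r\n--" + boundaryParam);
    }

    // The tl;dr fix: prepend a CRLF so the first delimiter line is preceded by one.
    static String withLeadingCrlf(String content) {
        return "\r\n" + content;
    }

    public static void main(String[] args) {
        String param = "---------------------------149742642616556";
        String body = "--" + param + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"test.txt\"\r\n"
                + "Content-Type: text/plain\r\n"
                + "\r\n"
                + "test\r\n"
                + "--" + param + "--\r\n";
        // Greater than 0: only the closing delimiter is found, the opening one is skipped.
        System.out.println(firstDelimiter(body, param));
        // prints 0: with a leading CRLF, the opening delimiter is found.
        System.out.println(firstDelimiter(withLeadingCrlf(body), param));
    }
}
```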