I’m creating a download link to an item that is stored on AWS / S3.

As I am building out this link, I’ve confirmed that the filename is encoded in UTF-8, but when the user goes to download it, they are hit with this error any time the filename contains non-ASCII characters.

<Error>
<Code>InvalidArgument</Code>
<Message>Header value cannot be represented using ISO-8859-1.</Message>
<ArgumentName>response-content-disposition</ArgumentName>
<ArgumentValue>attachment;filename="今日も僕は 用もないのに.mp3"</ArgumentValue>
<RequestId>12345</RequestId>
<HostId>12345</HostId>
</Error>

//first attempt - whatever encoding it IS, convert it to utf-8
$encoded_file = mb_convert_encoding($original_filename, "UTF-8", mb_detect_encoding($original_filename));

//second attempt - force filename to use html entities
$encoded_file = mb_convert_encoding($original_filename,'HTML-ENTITIES','UTF-8');

$obj_data['ResponseContentDisposition'] = 'attachment;filename="' . $encoded_file . '"';
$cmd = $s3->getCommand('GetObject', $obj_data);
$presign_url_request = $s3->createPresignedRequest($cmd, AWS_PRESIGNED_URL_EXPIRATION);

Forcing the attachment;filename to use HTML entities works, but it’s really ugly. If I am converting the filename to UTF-8, why is AWS telling me that the header value cannot be represented using ISO-8859-1?

2 Answers


  1. Unfortunately, HTTP message headers are more restricted than the HTTP message body.

    UTF-8 is supported in the message body but not in headers (for historical and technical reasons). PHP’s urlencode() function is worth a try for header values, but I’m not sure it will improve things.

    Allowed characters in HTTP header values
    https://stackoverflow.com/a/75998796/8199678

  2. HTTP headers are forbidden from containing anything other than ISO-8859-1, and strings of any other incompatible encoding must be encoded in conformance to established specs.

    In this case, it is RFC 6266, which defines an extended filename* parameter. Its value follows the RFC 5987 encoding (since superseded by RFC 8187): the charset name, two single quotes enclosing an empty language tag, then the percent-encoded bytes of the string.

    function rfc6266_encode($string, $encoding) {
        // rawurlencode() percent-encodes every byte except A-Z a-z 0-9 - _ . ~,
        // so spaces, quotes, and all non-ASCII bytes are escaped.
        return sprintf("%s''%s", $encoding, rawurlencode($string));
    }
    
    var_dump(rfc6266_encode('今日も僕は 用もないのに.mp3', 'UTF-8'));
    

    Output:

    string(113) "UTF-8''%E4%BB%8A%E6%97%A5%E3%82%82%E5%83%95%E3%81%AF%20%E7%94%A8%E3%82%82%E3%81%AA%E3%81%84%E3%81%AE%E3%81%AB.mp3"
    

    And you would use it in your code like:

    $obj_data['ResponseContentDisposition'] = 'attachment; filename*=' . rfc6266_encode($original_filename, $original_filename_encoding);
    

    That said, do not rely on mb_detect_encoding(), as it only makes a guess as to what the encoding might be. String encoding is metadata that must be captured alongside the data itself and preserved.
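
Some older clients do not understand the extended filename* parameter, so a header value often carries a plain ASCII filename alongside it. The following is a minimal sketch of that pattern, not code from either answer. The helper name content_disposition_attachment() is hypothetical, and it assumes the filename is already valid UTF-8 and that the mbstring-free functions preg_replace() and rawurlencode() are sufficient. Because the resulting value contains only ASCII bytes, it should pass the ISO-8859-1 check that produced the error in the question.

```php
<?php
// Hypothetical helper: builds a Content-Disposition value with an ASCII-only
// fallback (filename=) plus an RFC 6266 extended parameter (filename*=).
function content_disposition_attachment(string $utf8_filename): string
{
    // Assumption: $utf8_filename is valid UTF-8. With the /u flag,
    // preg_replace() returns null on invalid UTF-8, so fall back to a
    // generic name in that case.
    // Every character outside printable ASCII, plus '"' and '\', becomes '_'
    // so the quoted-string stays well-formed.
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/u', '_', $utf8_filename) ?? 'download';

    // Extended value: charset, empty language tag, percent-encoded bytes.
    $extended = "UTF-8''" . rawurlencode($utf8_filename);

    return sprintf('attachment; filename="%s"; filename*=%s', $fallback, $extended);
}

echo content_disposition_attachment('résumé 2024.pdf'), "\n";
// attachment; filename="r_sum_ 2024.pdf"; filename*=UTF-8''r%C3%A9sum%C3%A9%202024.pdf
```

With the variables from the question, the result would be assigned as $obj_data['ResponseContentDisposition'] = content_disposition_attachment($original_filename); before calling getCommand(). Clients that support filename* prefer it over filename, so the underscore fallback is only seen by clients that lack that support.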
