I am trying to download files with chrome.downloads.download(...). The filename is given externally, so I don’t know which characters it contains. If it contains invalid characters, download throws Error: Invalid filename.
- Is there a regex in JavaScript that replaces all and only the invalid starting/middle/ending Unicode characters with _ in the filename?
- Or is there documentation listing the rules for a filename in Chrome?
- Is there a way to make Chrome replace invalid characters in my filename, instead of throwing an error?
Chrome disallows more characters than common filesystems (e.g. NTFS), and I am not sure how Chrome defines an "invalid character". My current regex attempt is
var regex = /^\.|\.$|[\x00-\x1f\\/:*?"<>|\r\n\u200D]/g;
filename.replaceAll(regex, '_');
But it only covers a few of the invalid Unicode characters.
I avoid the <a> method of downloading (i.e. creating an <a> element with href and download attributes, then clicking on it), because I would like to create subdirectories in the downloads folder.
2 Answers
Regex matching all and only invalid Unicode characters / filenames
If the filename contains invalid Unicode characters, or matches a reserved keyword in NTFS, then chrome.downloads.download will throw Error: Invalid filename.

At any position (start/middle/end), the following Unicode characters are invalid:
- \p{Cc} (\u{0} is allowed in the middle)
- : ? " * < > | ~ (NTFS reserved characters), and \ / (NTFS and Chrome treat these as path separators rather than as characters in the filename, so the Invalid filename error will NOT occur)
- \p{Cf}
- \p{Cn}

The zero-width joiner \u{200D} is commonly used to combine emojis into a new emoji. However, this character is invalid as well, because its category is "Format characters" (\p{Cf}).

The regex is
/[:?"*<>|~/\\\u{1}-\u{1f}\u{7f}\u{80}-\u{9f}\p{Cf}\p{Cn}]/gu
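As a minimal sketch of how this regex behaves, the snippet below replaces each match with _. The names INVALID_ANYWHERE and replaceInvalidAnywhere are illustrative only, and the forward slash is written as \/ purely for readability.

```javascript
// Characters reported above as invalid at any position in a filename.
const INVALID_ANYWHERE = /[:?"*<>|~\/\\\u{1}-\u{1f}\u{7f}\u{80}-\u{9f}\p{Cf}\p{Cn}]/gu;

// Hypothetical helper: replace every invalid character with "_".
function replaceInvalidAnywhere(filename) {
  return filename.replace(INVALID_ANYWHERE, '_');
}

// A family emoji is three emojis joined by two zero-width joiners (U+200D).
const family = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}';

console.log(replaceInvalidAnywhere('a:b?c.txt')); // a_b_c.txt
console.log(replaceInvalidAnywhere(family + '.png') === '\u{1F468}_\u{1F469}_\u{1F467}.png'); // true
```

Note that replacing the joiners splits a composite emoji into its component emojis, so the sanitized name will look different from the original.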
At the start/end of the filename, the following Unicode characters are invalid:
- \u{0}
- \p{Zl}
- \p{Zp}
- \p{Zs}
- . (rule by NTFS)

The regex is
/^[.\u{0}\p{Zl}\p{Zp}\p{Zs}]|[.\u{0}\p{Zl}\p{Zp}\p{Zs}]$/gu
Reserved keywords in NTFS: Filenames of CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9 (case-insensitive), with or without a file extension, are all invalid.

The regex is
/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?=\.|$)/gui
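A short sketch of the reserved-name check as a predicate. The name isReservedName is hypothetical. The g flag is dropped here as a design choice, because RegExp.prototype.test on a global regex keeps state in lastIndex between calls.

```javascript
// Reserved names, case-insensitive, with or without an extension.
const RESERVED_NAME = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?=\.|$)/iu;

// Hypothetical helper: true if the filename is one of the reserved names.
function isReservedName(filename) {
  return RESERVED_NAME.test(filename);
}

console.log(isReservedName('nul.txt'));     // true
console.log(isReservedName('COM1.tar.gz')); // true
console.log(isReservedName('console.log')); // false
console.log(isReservedName('COM10'));       // false
```

The lookahead (?=\.|$) is what keeps ordinary names such as console.log from matching: the reserved word must be followed by a period or by the end of the string.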
Other categories of Unicode characters, including private-use \p{Co} and surrogates \p{Cs}, are allowed at any position.

Final regex (for JavaScript):
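One way the three rules could be combined is sketched below. This is an illustration under stated assumptions, not necessarily the exact regex the answer ended with. The function names are hypothetical, and prefixing a reserved name with _ is an assumed way of neutralizing it. Because / and \ are replaced as well, a path with subdirectories is split on / first and each segment is sanitized separately.

```javascript
// The three rules described above. All names here are illustrative.
const INVALID_ANYWHERE = /[:?"*<>|~\/\\\u{1}-\u{1f}\u{7f}\u{80}-\u{9f}\p{Cf}\p{Cn}]/gu;
const INVALID_AT_EDGE = /^[.\u{0}\p{Zl}\p{Zp}\p{Zs}]|[.\u{0}\p{Zl}\p{Zp}\p{Zs}]$/gu;
const RESERVED_NAME = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?=\.|$)/iu;

// Sanitize a single path segment (a file or directory name).
function sanitizeSegment(segment) {
  return segment
    .replace(INVALID_ANYWHERE, '_')
    .replace(INVALID_AT_EDGE, '_')
    // Assumption: a leading "_" is enough to stop the name from being reserved.
    .replace(RESERVED_NAME, '_$1');
}

// Sanitize a relative path such as "sub/dir/file.txt", keeping "/" as separator.
function sanitizePath(path) {
  return path.split('/').map(sanitizeSegment).join('/');
}

console.log(sanitizeSegment('report: final?.txt')); // report_ final_.txt
console.log(sanitizeSegment('con.txt'));            // _con.txt
console.log(sanitizePath('logs/2024: q1/aux'));     // logs/2024_ q1/_aux
```

The result of sanitizePath could then be passed as the filename option of chrome.downloads.download. Empty segments are left empty, so the caller would still need to handle an empty name.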
Note: These characters/filenames are also invalid in NTFS, but sometimes NTFS just deletes the character instead of displaying an error.
How to Determine the invalid Unicode characters
I wrote a script to try each Unicode character, and ran it on Google Chrome 126.0.6478.116 on Windows 11 23H2 22631.3737.
What the script does:
- Run serviceWorker.postMessage('') in the SW console to start/stop the for loop.
- Call chrome.downloads.download, passing the Unicode character as the filename and the URL created.
- Catch Error: Invalid filename. The character is invalid if the error is caught, otherwise valid.
- Store the results in chrome.storage.local.
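The core of the loop described above can be sketched as follows. This is not the answer's script: the function findInvalidCharacters and its parameters are hypothetical, the start/stop messaging and the chrome.storage.local step are left out, and it assumes the promise form of chrome.downloads.download rejects with an error whose message contains "Invalid filename". The download function is passed in as an argument so that the sketch can also run outside an extension with a stand-in.

```javascript
// Try each code point as a one-character filename and collect the rejected ones.
// `download` stands in for (options) => chrome.downloads.download(options).
async function findInvalidCharacters(download, url, codePoints) {
  const invalid = [];
  for (const codePoint of codePoints) {
    const char = String.fromCodePoint(codePoint);
    try {
      await download({ url, filename: char });
    } catch (error) {
      // Assumption: an invalid name is reported as "Invalid filename".
      if (String(error.message).includes('Invalid filename')) {
        invalid.push(codePoint);
      } else {
        throw error; // Unrelated failure: do not misclassify the character.
      }
    }
  }
  return invalid;
}

// Stand-in for the real API, for running outside an extension: rejects "?" only.
async function fakeDownload({ filename }) {
  if (filename === '?') throw new Error('Invalid filename');
  return 1;
}

findInvalidCharacters(fakeDownload, 'data:,x', [0x41, 0x3f, 0x42])
  .then((invalid) => console.log(invalid)); // [ 63 ]
```

Inside a service worker, the first argument would be a wrapper around chrome.downloads.download. Every valid character then produces a real download, so a test like this fills the downloads folder quickly.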
This finds the invalid Unicode characters at the start/end. Within that set, some characters are also invalid in the middle of a filename. So we run the script again, replacing filename: char with a filename that has the character in the middle.
background.js:
manifest.json:
I’m not sure about the rules for a filename in Chrome. I found this developer documentation style guide, but a simpler solution would be to accept only alphanumeric characters, the minus sign and the period, with the regex:
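A minimal sketch of such an allow-list, assuming ASCII letters and digits are what "alphanumeric" means here. The names NOT_ALLOWED and toSafeFilename are illustrative.

```javascript
// Anything that is not an ASCII letter, a digit, a period or a minus sign.
const NOT_ALLOWED = /[^A-Za-z0-9.-]/g;

// Hypothetical helper: replace every disallowed character with "_".
function toSafeFilename(filename) {
  return filename.replace(NOT_ALLOWED, '_');
}

console.log(toSafeFilename('my file (1).txt')); // my_file__1_.txt
```

This leaves a leading or trailing period in place, so a name such as .hidden would still need separate handling.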
If there are many invalid characters, perhaps it is more convenient to delete them rather than replace them with an underscore character:
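A sketch of the deleting variant, using the same assumed allow-list (ASCII letters, digits, period, minus sign). The helper name stripUnsafeCharacters is illustrative.

```javascript
// Anything that is not an ASCII letter, a digit, a period or a minus sign.
const NOT_ALLOWED = /[^A-Za-z0-9.-]/g;

// Hypothetical helper: drop every disallowed character instead of replacing it.
function stripUnsafeCharacters(filename) {
  return filename.replace(NOT_ALLOWED, '');
}

console.log(stripUnsafeCharacters('my file (1).txt')); // myfile1.txt
```

Deleting can map different inputs to the same name, or to an empty string when nothing is allowed, so the caller may want a fallback name.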
If you want to keep accented characters in filenames (although this is not recommended), you can exempt them from replacement by adding them inside the square brackets of the regex; see this answer.
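As an illustration, the sketch below adds the accented letters of the Latin-1 Supplement block (U+00C0 to U+00FF, skipping the multiplication and division signs) to the allow-list. That range is an assumption chosen for this example and may differ from the one in the linked answer. The names are hypothetical.

```javascript
// Allow ASCII letters, digits, period, minus sign, and Latin-1 accented letters.
const NOT_ALLOWED_ACCENTS =
  /[^A-Za-z0-9.\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF-]/g;

// Hypothetical helper: replace everything outside the allow-list with "_".
function toSafeFilenameKeepingAccents(filename) {
  return filename.replace(NOT_ALLOWED_ACCENTS, '_');
}

const input = 'r\u00E9sum\u00E9 2024.pdf'; // "résumé 2024.pdf"
console.log(toSafeFilenameKeepingAccents(input) === 'r\u00E9sum\u00E9_2024.pdf'); // true
```

Only precomposed letters are covered. A name in decomposed form (a base letter followed by a combining accent) would lose its accents unless it is first passed through filename.normalize('NFC').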