In .NET we have the ability to compare strings using an ordinal comparison while ignoring case (`StringComparison.OrdinalIgnoreCase`). This is a best practice for string comparison, especially when multiple cultures could be involved.
I’m looking for the exact equivalent in JavaScript. There are a ton of answers about JS string comparison but I couldn’t find much about ordinal comparison, never mind doing that AND ignoring case. I did find this question about comparing strings in an ordinal manner without ignoring case, but it’s not clear how I would do so while ignoring the case, so I think this is a fundamentally different question (hence my new question here).
How do you compare strings in JavaScript in an ordinal manner while ignoring case, just like in .NET?
Answers
There are two ways this may be achieved in JavaScript:

1. The `<` and `>` operators (they perform ordinal comparison), after first applying `.toLowerCase` or `.toUpperCase` to both strings for case-insensitivity.
2. The `string.charCodeAt(index)` method. This returns the UTF-16 code unit at the given index. Because the first 128 code points are shared between UTF-16 and ASCII, this shouldn’t be a problem for plain ASCII text; beyond that range you are comparing UTF-16 code units, which won’t match an 8-bit "Extended ASCII" code page.

This may be implemented like this:
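A minimal sketch of the `charCodeAt` approach, assuming case-insensitivity comes from lower-casing both strings first (the function name `compareByCharCode` is hypothetical). It returns -1, 0, or 1 so it can be used as a sort comparator:

```javascript
// Hypothetical helper: ordinal, case-insensitive comparison by UTF-16 code unit.
// Assumption: case-insensitivity comes from lower-casing both strings first.
function compareByCharCode(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const len = Math.min(x.length, y.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt(i) returns the UTF-16 code unit (0 to 0xFFFF) at index i
    const cx = x.charCodeAt(i);
    const cy = y.charCodeAt(i);
    if (cx !== cy) {
      return cx < cy ? -1 : 1;
    }
  }
  // Every shared position matched, so the shorter string sorts first
  if (x.length === y.length) return 0;
  return x.length < y.length ? -1 : 1;
}

console.log(compareByCharCode("apple", "APPLE"));  // 0
console.log(compareByCharCode("Apple", "banana")); // -1
console.log(compareByCharCode("b", "A"));          // 1
console.log(compareByCharCode("app", "Apple"));    // -1
```

Because it walks UTF-16 code units, this gives the same ordering as applying `<` to the lower-cased strings; the explicit loop only matters if you want to change what is compared at each step.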
It’s hard to prove a negative (one can always overlook something), but as far as I can tell there’s no direct equivalent operation built into the standard runtime. Straight-up ordinal comparison is done by `<` and `>`, as you note in the question. Other comparisons are done via `localeCompare`, but while it has options to ignore case, it works with locale-specific rules, which you’ve said you don’t want. (I did wonder if there’s a locale specifier for "ordinal," but if there is, I haven’t found it.)

Given that, the closest we can get is to convert both sides to the same capitalization and compare the result with `<` and `>`. The specification defines the `toLowerCase`/`toUpperCase` operations in terms of the locale-insensitive case mappings in the Unicode Character Database. So something along these lines:
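A minimal sketch of that approach, returning -1, 0, or 1 so it can be passed to `Array.prototype.sort`. The name `compareOrdinalIgnoreCase` is hypothetical, chosen only to echo .NET’s `StringComparison.OrdinalIgnoreCase`; it is not guaranteed to match .NET’s results for every character:

```javascript
// Hypothetical helper: lower-case both sides, then compare with === and <,
// which work on UTF-16 code units rather than locale rules.
function compareOrdinalIgnoreCase(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  if (x === y) {
    return 0; // matching strings need only this one comparison
  }
  return x < y ? -1 : 1;
}

// Usage: a case-insensitive, locale-independent sort.
// Array.prototype.sort is stable, so "Apple" stays ahead of "apple".
const words = ["banana", "Apple", "cherry", "apple"];
words.sort(compareOrdinalIgnoreCase);
console.log(words); // [ 'Apple', 'apple', 'banana', 'cherry' ]
```

Checking `===` before `<` means strings that match after lower-casing cost a single comparison; strings that differ cost two.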
While not falling prey to premature optimization, I’ll note that in the rare case where performance is a particular concern, the above is slightly biased toward assuming the strings match (since then it just does `===` but not `<` as well). If you had an edge case where performance were critical, you might choose the order of comparisons to suit the data you’re comparing.

Regardless of the order of comparisons, the above does (of course) have to completely convert both strings before it starts. Unless the strings being compared are extraordinarily long, I wouldn’t expect you to get any benefit out of avoiding that (in favor of doing the loop yourself, converting each character as you go, and short-circuiting as soon as you find the answer). But that would be an option if you found a use case for it. You’d have to decide whether to compare code units or to do the slightly more complicated thing of comparing code points. I’d probably go with code points as they’re more meaningful, but it depends on your use case. The `<` and `>` operators work at the code unit level, but they don’t have to worry about correct case mappings.

Just for what it’s worth, there are at least two ways to do it by code points: using iterators, and using `codePointAt`. Iterating a string yields one code point at a time, so a surrogate pair arrives as a single character. With `codePointAt`, note that the index you pass is the index in code units, which is why you have to move the indexes on by the length of the character that was found (which may be multiple code units). These are somewhat off-the-cuff and you’d want to audit them before using them, though I’ve tested them with some basic inputs:
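A sketch of both code point variants, written from the description above rather than taken from any original code; the function names are hypothetical. To keep it short, it lower-cases each whole string up front instead of converting character by character, so it shows the code point walk but not the short-circuited conversion:

```javascript
// Hypothetical helpers: case-insensitive comparison by Unicode code point.
// Both lower-case the whole string up front (not character by character).

// Variant 1: string iterators yield one code point at a time, so a
// surrogate pair arrives as a single two-code-unit string.
function compareByIterators(a, b) {
  const itA = a.toLowerCase()[Symbol.iterator]();
  const itB = b.toLowerCase()[Symbol.iterator]();
  for (;;) {
    const ra = itA.next();
    const rb = itB.next();
    if (ra.done || rb.done) {
      // At least one string is exhausted: equal if both are, else shorter first
      if (ra.done && rb.done) return 0;
      return ra.done ? -1 : 1;
    }
    const cpA = ra.value.codePointAt(0);
    const cpB = rb.value.codePointAt(0);
    if (cpA !== cpB) return cpA < cpB ? -1 : 1;
  }
}

// Variant 2: codePointAt takes an index in code units, so the index must
// advance by 2 after an astral code point (stored as a surrogate pair).
function compareByCodePointAt(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  let i = 0; // one index serves both strings: it only advances past equal code points
  while (i < x.length && i < y.length) {
    const cpX = x.codePointAt(i);
    const cpY = y.codePointAt(i);
    if (cpX !== cpY) return cpX < cpY ? -1 : 1;
    i += cpX > 0xffff ? 2 : 1;
  }
  if (x.length === y.length) return 0;
  return x.length < y.length ? -1 : 1;
}

// U+FFFD vs U+1F600: code unit order (what < uses) and code point order disagree
console.log("\uFFFD" < "\u{1F600}");                      // false
console.log(compareByIterators("\uFFFD", "\u{1F600}"));   // -1
console.log(compareByCodePointAt("\uFFFD", "\u{1F600}")); // -1
console.log(compareByIterators("Hello", "hELLO"));        // 0
console.log(compareByCodePointAt("app", "Apple"));        // -1
```

The U+FFFD case is where the two levels differ: `<` sees the emoji’s first code unit (0xD83D) and puts it first, while a code point comparison puts U+FFFD first.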
I’ve used `toLowerCase` in the above (as opposed to `toUpperCase`) because I’ve found characters that, after being converted from lower case to upper case and then back again, are not the same as they started out; but I haven’t found the opposite to be true (converting from upper case to lower case and back again). It’s an edge case, but you have to pick one or the other, so I picked the one that seemed safer. I checked this by converting every code point back and forth.

That only checks each code point in isolation, though. In some languages, combinations of code points can be meaningful, but I haven’t tried to allow for that in that simple test. (But I did have it work at the code point level rather than just the code unit level.)