I have the string "28th and 8th. I will be working on this task" and I want to superscript each "th". I wrote a function, but it only works if the last 3 characters of the first date and the second date are different. If they are the same, my function goes wrong.
var exampletString = "28th and 8th. I will be working on this task"
var convertedAttrString = NSMutableAttributedString(string: exampletString)
func applySuperscriptAttributes(substring: String, originalString: String, convertedString: NSMutableAttributedString) {
    if let subRange = originalString.range(of: substring) {
        let convertedRange = NSRange(subRange, in: originalString)
        convertedString.setAttributes([.font: UIFont.systemFont(ofSize: 10, weight: .regular)], range: NSRange(location: convertedRange.location + convertedRange.length - 2, length: 2))
    }
}
applySuperscriptAttributes(substring: "28th", originalString: exampletString, convertedString: convertedAttrString)
applySuperscriptAttributes(substring: "8th", originalString: exampletString, convertedString: convertedAttrString)
// The result is "28th and 8th ..." with the "th" of "8th" not superscripted
2 Answers
You are using the method String.range(of:). That method finds the FIRST occurrence of the substring in the target string. The string "8th" appears as part of the first string, "28th", which has already been converted to superscript.
You will need to write code that intelligently parses your string by words, looking for the word "8th", not a string of characters "8th" that could be in the middle of another word.
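To make the failure concrete, here is a minimal sketch using the string from the question (the variable names are only illustrative). It shows where the search for "8th" actually lands:

```swift
import Foundation

let text = "28th and 8th. I will be working on this task"

// range(of:) returns the first match, which sits inside "28th",
// not the standalone "8th" that starts at offset 9.
if let range = text.range(of: "8th") {
    let nsRange = NSRange(range, in: text)
    print(nsRange.location, nsRange.length)  // prints "1 3"
}
```

So the second call in the question ends up re-styling the "th" of "28th".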
Edit:
There are actually lots of gotchas, edge cases, and tricky situations to deal with.
The first, which hung you up, is that you want to make sure your search string doesn’t appear in the middle of a word.
@Bram’s answer using regular expressions is one way to solve this problem, or at least part of this problem.
The regular expression bit \d+ stands for "one or more digits." Then Bram inserted your suffix, "th", after that. So that regex will match a sequence of one or more digits followed by a suffix. Because the \d+ part matches any number of digits, it matches 8, 28, or 20000008. It doesn’t get fooled by numbers whose last digit is an 8, like your code does.
However, Bram’s regular expression does not make sure that your number plus suffix is not in the middle of a larger word. If you had a sentence "Words words x8thz words words 8th", it would happily match the "8th" in the middle of the "word" "x8thz".
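A small sketch of that behaviour, assuming the pattern is simply \d+th (Bram's actual pattern may differ):

```swift
import Foundation

let sentence = "Words words x8thz words words 8th"

// Raw string literal so the backslash reaches the regex engine unescaped.
let regex = try! NSRegularExpression(pattern: #"\d+th"#)
let fullRange = NSRange(sentence.startIndex..., in: sentence)
let matches = regex.matches(in: sentence, range: fullRange)

for match in matches {
    let found = (sentence as NSString).substring(with: match.range)
    print(found, "at", match.range.location)
}
// Two matches: the "8th" inside "x8thz" (offset 13) and the real "8th" (offset 30).
```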
You could easily add to the regular expression so that it only detected 8th and other "digits followed by a suffix" cases if they were enclosed by white space, but then what if the very first or last part of your string is a match? The string "5th of his name" has a 5th in it, and it has white space after it, but not before it.
Even trickier is punctuation. Is punctuation part of a word? Is it white space? The answer is "it’s complicated." Example: Is ' a single quote, wrapping a word or phrase, or is it an apostrophe that is part of the word?
Composing a regular expression to handle all of those cases is a bit tricky.
Note: Bram provided a solution to this in his regular expression answer.
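One common way to handle the start-of-string and end-of-string cases is the \b word-boundary anchor, which matches between a word character and a non-word character, or at either end of the string. The sketch below assumes that approach; the helper name is hypothetical, and \b still uses a simple notion of "word", so it does not settle the apostrophe question.

```swift
import Foundation

// Hypothetical helper: ranges of whole-word "digits + th" tokens.
func wholeWordOrdinalRanges(in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: #"\b\d+th\b"#) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { $0.range }
}

// Matches at the very start of the string, with no white space before it.
print(wholeWordOrdinalRanges(in: "5th of his name").map { $0.location })  // [0]

// Skips the "8th" buried in "x8thz" and finds only the final one.
print(wholeWordOrdinalRanges(in: "Words words x8thz words words 8th").map { $0.location })  // [30]
```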
The Foundation framework has a (rather old) string parsing function, enumerateSubstrings(in:options:using:), that lets you step through substrings of a larger string. It has an options parameter that lets you enumerate your string various ways, including .byWords. The .byWords option is quite smart, and handles very complex cases for figuring out what is part of a word and what is whitespace/punctuation.
However, it is a function of the old Foundation class NSString, and is written in Objective-C. That makes it a bit of a pain to use. You have to cast your String to an NSString to use it. Worse, it takes a closure that handles each substring that’s enumerated, and one of the parameters to that closure is an Objective-C boolean passed by reference. You have to set that parameter to true to get the enumeration to stop. Swift maps that parameter to an unsafe mutable pointer to a bool, or UnsafeMutablePointer<ObjCBool>. Those are a pain to deal with.
Here is some sample code that finds occurrences of "8th" in a sentence using enumerateSubstrings(in:options:using:):
The String extension nsRangeOfWord(_:printWords:):
Attempts to find the first occurrence of your word in the target String, as a separate word. If it finds your word, it returns the NSRange in the string where it is found.
It is not fooled by either "2008th" or "8thing".
It is also smart enough to treat punctuation as a word delimiter, and detect the difference between ' when it’s used as quotes vs when it’s used as an apostrophe. In the fragment "The dog said ‘The…" the dog’s quote is enclosed in single quotes. The bit 'The is the word "the" with a preceding single quote. That ' is not part of the word. However, in the word isn't, the ' is an apostrophe that IS part of the word.
However, unlike the regex function, this code doesn’t automatically find any string of digits followed by a suffix like "th". It would take more work to do that.
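As a rough illustration of the word-by-word approach described above, and not the answer's original code, a minimal sketch might look like this. The name nsRangeOfWholeWord(_:) is hypothetical, and it uses the NSString form of the call with the by-reference stop flag.

```swift
import Foundation

extension String {
    /// Hypothetical helper: NSRange of the first whole word equal to `word`, or nil.
    func nsRangeOfWholeWord(_ word: String) -> NSRange? {
        let nsString = self as NSString
        var found: NSRange?
        nsString.enumerateSubstrings(
            in: NSRange(location: 0, length: nsString.length),
            options: .byWords
        ) { substring, substringRange, _, stop in
            if substring == word {
                found = substringRange
                stop.pointee = true  // ObjCBool passed by reference; ends the enumeration
            }
        }
        return found
    }
}

let text = "28th and 8th. I will be working on this task"
if let range = text.nsRangeOfWholeWord("8th") {
    print(range.location, range.length)
}
```

Because each enumerated word is compared as a whole, "28th" is never mistaken for "8th".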
A more modern option built into iOS/macOS is the Natural Language framework. It is much more powerful than either RegEx or the NSString function enumerateSubstrings(in:options:using:), and includes the ability to break up natural language into words, or "tokenize" the text.
Natural Language code very similar to the nsRangeOfWord() function above looks like this:
It yields the same results as the code based on enumerateSubstrings(in:options:using:).
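A minimal sketch of the tokenizer-based search, again not the answer's original code. The helper name rangeOfWordToken(_:) is hypothetical; NLTokenizer and enumerateTokens(in:using:) are the framework's own API.

```swift
import Foundation
import NaturalLanguage

extension String {
    /// Hypothetical helper: range of the first word token equal to `word`, or nil.
    func rangeOfWordToken(_ word: String) -> Range<String.Index>? {
        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = self
        var found: Range<String.Index>?
        tokenizer.enumerateTokens(in: startIndex..<endIndex) { tokenRange, _ in
            if self[tokenRange] == word {
                found = tokenRange
                return false  // stop enumerating
            }
            return true  // keep going
        }
        return found
    }
}

let text = "28th and 8th. I will be working on this task"
if let range = text.rangeOfWordToken("8th") {
    // Attributed-string APIs want an NSRange, so convert the String range.
    let nsRange = NSRange(range, in: text)
    print(nsRange.location, nsRange.length)
}
```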
The Natural Language tokenizer returns ranges as String ranges (type Range<String.Index>). That is the "Swifty" way to deal with ranges in String objects, and for purely Swift code that uses Strings, it is preferred. However, your code to build and modify NSMutableAttributedString uses old Objective-C Foundation functions that want NSRanges. I therefore created an extension to String that will convert a String range to an NSRange.
Edit #2:
I’m stuck in Covid isolation and have time on my hands, so I decided to write a Playground that pulls all this together (starting from Bram’s RegEx code) and creates superscripts. Below is the code:
And the result looks like this:
(It also makes the superscripted text red to make it stand out, as in Bram’s code. You can remove that part if you want.)
You can use regex to solve this issue.
It also has support for additional 1st, 2nd, 3rd suffixes:
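A minimal sketch of the regex approach, under a few assumptions: the pattern, the helper name superscriptOrdinals(in:baseFontSize:), and the font-size and baseline ratios are all illustrative choices and may differ from the code in this answer. The capture group isolates the suffix so that only those characters are restyled.

```swift
import UIKit

/// Hypothetical helper: shrinks and raises the suffix of ordinals such as 1st, 2nd, 3rd, 28th.
func superscriptOrdinals(in text: String, baseFontSize: CGFloat = 17) -> NSAttributedString {
    let result = NSMutableAttributedString(
        string: text,
        attributes: [.font: UIFont.systemFont(ofSize: baseFontSize)]
    )
    // Group 1 captures only the suffix; \b keeps matches on word boundaries.
    guard let regex = try? NSRegularExpression(pattern: #"\b\d+(st|nd|rd|th)\b"#) else {
        return result
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    for match in regex.matches(in: text, range: fullRange) {
        // The 0.6 and 0.35 ratios are assumptions; tune them to taste.
        result.addAttributes([
            .font: UIFont.systemFont(ofSize: baseFontSize * 0.6),
            .baselineOffset: baseFontSize * 0.35
        ], range: match.range(at: 1))
    }
    return result
}

let styled = superscriptOrdinals(in: "28th and 8th. I will be working on this task")
```

Unlike a smaller font alone, the .baselineOffset attribute is what actually lifts the suffix above the baseline.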