Given a string such as:
"The user foo_bar has a Twitter account: https://twitter.com/foo_bar"
To be sent in Markdown mode through the Telegram Bot API, it should be formatted as:
"The user foo\_bar has a Twitter account: [https://twitter.com/foo_bar]"
(Adding [] around the URL can be done with a regex.)
Is it possible to write a function in Python that escapes certain characters, such as _ or *, in a text, but only when these characters are not contained within a URL?
Here is an example without checking character location:
import re

original_text = 'The user foo_bar has a Twitter account: https://twitter.com/foo_bar'
# Wrap every URL in square brackets.
formatting_url = re.sub(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', r'[\g<0>]', original_text)
# Escape * and _ everywhere, including inside the URL (this is the problem).
escaping_char = formatting_url.replace('*', '\\*').replace('_', '\\_')
print(escaping_char)
Output:
The user foo\_bar has a Twitter account: [https://twitter.com/foo\_bar]
Here the _ in the URL is also escaped, which is not what I want.
2 Answers
First, add brackets around the URL using a regex. Then iterate over each character of the string and add an escape character before every _ or * that lies outside a URL. You can set a flag when you see [ and clear it when you see ] to know whether you are inside a URL.
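The approach described above could be sketched roughly as follows. This is only an illustration: the function name escape_outside_brackets, the simplified URL pattern, and the choice to escape just _ and * are assumptions, not code from the original answer.

```python
import re

# Simplified URL pattern used for illustration only (an assumption,
# not the pattern from the question).
URL_PATTERN = re.compile(r'https?://[^\s\[\]]+')
ESCAPE_CHARS = '_*'


def escape_outside_brackets(text: str) -> str:
    """Wrap URLs in [] and escape _ and * everywhere except inside them."""
    bracketed = URL_PATTERN.sub(r'[\g<0>]', text)
    result = []
    inside_url = False  # flag: are we currently between [ and ]?
    for ch in bracketed:
        if ch == '[':
            inside_url = True
        elif ch == ']':
            inside_url = False
        elif ch in ESCAPE_CHARS and not inside_url:
            result.append('\\')  # prepend a backslash outside URLs
        result.append(ch)
    return ''.join(result)


print(escape_outside_brackets(
    'The user foo_bar has a Twitter account: https://twitter.com/foo_bar'))
# The user foo\_bar has a Twitter account: [https://twitter.com/foo_bar]
```

Note that this sketch treats every [ and ] as a URL delimiter, so text that already contains literal square brackets would need extra handling.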
If you are using python-telegram-bot, it provides a helper for escaping text (for both Markdown versions 1 and 2). If you need to include a link, I suggest formatting it separately and then concatenating the strings.
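A minimal sketch of the "format separately and concatenate" idea, using only the standard library so it runs without extra dependencies. The helper escape_text is a hypothetical stand-in for the library's escaping function (in recent python-telegram-bot releases that should be telegram.helpers.escape_markdown, but check the version you have installed); format_message and the URL pattern are likewise assumptions for illustration.

```python
import re

# With one capturing group, re.split keeps the URLs in the result list.
URL_SPLIT = re.compile(r'(https?://\S+)')


def escape_text(text: str) -> str:
    """Hypothetical stand-in for a library helper: escapes the
    characters that are special in legacy Markdown (_ * ` [)."""
    return re.sub(r'([_*`\[])', r'\\\1', text)


def format_message(text: str) -> str:
    """Escape the plain-text parts, bracket the URLs, then concatenate."""
    parts = URL_SPLIT.split(text)
    formatted = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indices hold the captured URLs: leave them unescaped.
            formatted.append(f'[{part}]')
        else:
            # Even indices hold the text between URLs: escape it.
            formatted.append(escape_text(part))
    return ''.join(formatted)


print(format_message(
    'The user foo_bar has a Twitter account: https://twitter.com/foo_bar'))
# The user foo\_bar has a Twitter account: [https://twitter.com/foo_bar]
```

Because the URLs are never passed through the escaping step, no flag or character-by-character scan is needed.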