I have a string containing substrings like these:
\ud83d\ude80
\ud83c\udfb0
\ud83d\udd25
All of them start with \ud83 (Telegram emoji) and are followed by 7 characters that differ each time,
so I am trying to remove them with
text = re.sub(r'\ud83\w{7}', '', text, flags=re.MULTILINE)
with no success. What am I doing wrong? Thanks!
2 Answers
I think that, if you're trying to remove everything after your Telegram emoji code, \w won't catch the \ character.
Try a character class that accepts both, which tells the regex to look for 7 characters, each of which can be either alphanumeric or a \.
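A minimal sketch of that idea, assuming the text really contains the literal backslash-u sequences (12 visible characters per emoji) rather than decoded characters. The sample string and the exact pattern are illustrative assumptions, not taken from the question:

```python
import re

# Assumption: the escapes are literal text, i.e. a backslash followed by "ud83d" etc.
# The raw string keeps every backslash as an ordinary character.
text = r"launch \ud83d\ude80 casino \ud83c\udfb0 fire \ud83d\udd25 done"

# \\ud83    matches the literal five-character prefix: backslash, u, d, 8, 3
# [\w\\]{7} matches the next 7 characters, each a word character or a backslash
pattern = re.compile(r"\\ud83[\w\\]{7}")

cleaned = pattern.sub("", text)
print(cleaned)
```

The surrounding spaces are left in place, so the cleaned text keeps a double space where each sequence used to be.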
You are not dealing with 12 characters here. These seem to be only 2 Unicode characters, which are not printable by Python and are therefore displayed in their escaped form.
You could create the character class [\ud83d\ud83c] manually (adding every allowed starting character), or you could find a way to do this programmatically.
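A rough sketch of the character-class approach, assuming the string holds the two code units of each emoji as separate characters (a surrogate pair). Pairing the class with the range \udc00-\udfff for the second character is an assumption added here for illustration, as is the sample string:

```python
import re

# Assumption: each emoji is stored as two separate surrogate characters,
# so "\ud83d\ude80" has length 2, not 12.
text = "launch \ud83d\ude80 casino \ud83c\udfb0 fire \ud83d\udd25 done"

# First character: one of the allowed starting characters, listed by hand.
# Second character: anything in the low-surrogate range (assumed, see above).
pattern = re.compile("[\ud83c\ud83d][\udc00-\udfff]")

cleaned = pattern.sub("", text)
print(cleaned)
```

Only the cleaned string is printed because writing lone surrogates to a UTF-8 terminal raises UnicodeEncodeError. If the emoji had instead been decoded into single characters such as U+1F680, this pattern would not match and a different class would be needed.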