
I have a string containing sequences like these:

\ud83d\ude80
\ud83c\udfb0
\ud83d\udd25

All of these substrings start with

\ud83

(Telegram emoji) and are followed by 7 characters that differ from one emoji to the next.

So I am trying to remove them with

text = re.sub(r'\\ud83\w{7}', '', text, flags=re.MULTILINE)

but with no success. What am I doing wrong? Thanks!

2 Answers


  1. I think that, if you're trying to remove everything after your Telegram emoji code, \w won't catch the \ character.

    Try

    text = re.sub(r'\\ud83[\w\\]{7}', '', text, flags=re.MULTILINE)
    

    which tells the regex to look for 7 characters, each of which can be either a word character (alphanumeric or underscore) or a \.

  2. You are not dealing with 12 characters here. These appear to be only 2 Unicode characters (a surrogate pair), which Python cannot print and therefore displays in their escaped form.

    re.sub(r"[\ud83d\ud83c]\S", "", text)
    

    You could build the character class [\ud83d\ud83c] manually (adding every allowed starting character), or find a way to generate it programmatically.
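
Which answer applies depends on what the string really holds: literal backslash escapes (12 characters per emoji) or actual surrogate code units (2 characters per emoji). The following is a minimal sketch covering both readings, not a definitive fix. The names ESCAPED_PAIR, SURROGATE_PAIR, and strip_emoji are hypothetical, and the patterns assume every emoji is a high/low surrogate pair like the three samples in the question.

```python
import re

# Case 1: the text contains the escapes literally, i.e. a real backslash
# followed by "ud83" and one hex digit, then a second "\uXXXX" escape in
# the low-surrogate range (dc00-dfff).
ESCAPED_PAIR = re.compile(r"\\ud83[0-9a-f]\\ud[c-f][0-9a-f]{2}", re.IGNORECASE)

# Case 2: the text contains actual surrogate code units (two characters per
# emoji), which Python only displays in escaped form.
SURROGATE_PAIR = re.compile(r"[\ud800-\udbff][\udc00-\udfff]")


def strip_emoji(text: str) -> str:
    """Remove literal-escape pairs first, then real surrogate pairs."""
    text = ESCAPED_PAIR.sub("", text)
    return SURROGATE_PAIR.sub("", text)


# Raw string: each emoji is 12 visible characters, backslashes included.
literal = r"a\ud83d\ude80b\ud83c\udfb0c\ud83d\udd25d"
# Normal string: each emoji is 2 surrogate characters.
surrogate = "a\ud83d\ude80b\ud83c\udfb0c\ud83d\udd25d"

print(strip_emoji(literal))    # abcd
print(strip_emoji(surrogate))  # abcd
```

Checking len(text) on a short sample is a quick way to tell the two cases apart before choosing a pattern.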
