I want to replace ‘ and ’ with ' in HTML using Python. I tried multiple ways but failed

VinaySavla
July 13, 2023
182 views
1 vote
2 Answers

I wrote code that converts a Word document to HTML using pypandoc, because I want the images included as well. The problem is that my docx file contains the characters ‘ and ’, which turn into something different in the HTML when it is sent as a mail body. I want ‘ and ’ replaced with ', a normal apostrophe.

Check the attached images so that the difference is clear enough.

source

expected result

I tried a few approaches, shown in the code below. The commented-out lines are attempts that also failed.

# Read the HTML file
with open(html_file, 'r') as file:
    html_data = file.read()
            
    # Replace all occurrences of ‘ and ’ with '
    # print("called")
    html_data = re.sub("‘", "'", html_data)
    html_data = re.sub("’", "'", html_data)
    # html_data = re.sub(r'’', "'", html_data)
    # html_data =  re.sub(r'‘', "'", html_data)
    # html_data = re.sub(r'“', '"', html_data)
    # html_data = re.sub(r'”', '"', html_data)
    # html_data = html_data.replace("‘", "'")
    # html_data = html_data.replace("’", "'")
    # html_data = html_data.replace('“', "'")
    # html_data = html_data.replace("”", "'")

For example, my Word document contains the phrase "i’d like to", which should be converted to "i'd like to".

Answers

- GaloTorresSevilla
- July 12, 2023 at 8:43 pm
- 0 votes
I think you need to escape the apostrophe so it does not conflict with the string delimiters:
```
s = 'i’d like to'
m = s.replace('’', '\'')
print(m)
```
Output:
```
i'd like to
```
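If several kinds of curly quotes need replacing, a single translation table may be tidier than chained `replace` calls. This is only a sketch: `QUOTE_MAP` and `straighten_quotes` are names made up for illustration, and the set of characters is an assumption based on the ones mentioned in the question.

```python
# Hypothetical helper: map each curly quote to its plain ASCII counterpart.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # ‘ left single quotation mark
    "\u2019": "'",   # ’ right single quotation mark
    "\u201c": '"',   # “ left double quotation mark
    "\u201d": '"',   # ” right double quotation mark
})


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(QUOTE_MAP)


print(straighten_quotes("i\u2019d like to"))  # i'd like to
```

`str.translate` walks the string once, so the order of replacements does not matter.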

- SVSavla
- July 13, 2023 at 4:01 pm
- 0 votes
```
import re

# Read the HTML file
with open(html_file, 'r') as file:
    html_data = file.read()

# Replace all occurrences of ‘ and ’ with ', including their garbled forms
html_data = re.sub("‘", "'", html_data)
html_data = re.sub("’", "'", html_data)
html_data = re.sub("â€˜", "'", html_data)
html_data = re.sub("â€™", "'", html_data)
```
Try this, it works. In the HTML, ‘ sometimes shows up as â€˜ and ’ as â€™ (this is what the UTF-8 bytes of those characters look like when decoded as Windows-1252), so your code does not find anything to replace.
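The garbled sequences usually mean the file was read with the platform default encoding instead of UTF-8. Assuming the HTML file is UTF-8 encoded (pandoc's output normally is), passing `encoding="utf-8"` to `open` may avoid the problem at the source. The sketch below reproduces the garbling and then reads a temporary file the intended way; `read_and_fix` and `sample.html` are hypothetical names used only for this demonstration.

```python
import os
import tempfile

sample = "<p>i\u2019d like to</p>"

# Decoding the UTF-8 bytes as Windows-1252 reproduces the garbled text.
garbled = sample.encode("utf-8").decode("cp1252")


def read_and_fix(path):
    """Read an HTML file as UTF-8 and straighten single curly quotes."""
    with open(path, "r", encoding="utf-8") as f:
        html_data = f.read()
    return html_data.replace("\u2018", "'").replace("\u2019", "'")


# Write a throwaway UTF-8 file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    html_file = os.path.join(tmp, "sample.html")
    with open(html_file, "w", encoding="utf-8") as f:
        f.write(sample)
    fixed = read_and_fix(html_file)

print(fixed)  # <p>i'd like to</p>
```

With the encoding given explicitly, the extra replacements for â€˜ and â€™ should no longer be needed. If the result is written back to disk before mailing, the same `encoding="utf-8"` argument applies when opening the file for writing.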
