I have a Python script for text substitution, written by a friend, which works on his system (Ubuntu Focal).
The following is the script:
#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.
Usage:
    python replace_text.py /path/to/your/directory
Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""
import os
import sys


def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
        '': 'fi',
        '': 'fl',
        'ä': 'ā',
        'é': 'ī',
        'ü': 'ū',
        'å': 'ṛ',
        'è': 'ṝ',
        'ì': 'ṅ',
        'ñ': 'ṣ',
        'ï': 'ñ',
        'ö': 'ṭ',
        'ò': 'ḍ',
        'ë': 'ṇ',
        'ç': 'ś',
        'à': 'ṁ',
        'ù': 'ḥ',
        'ÿ': 'ḷ',
        'û': 'ḹ',
        'Ä': 'Ā',
        'É': 'Ī',
        'Ü': 'Ū',
        'Å': 'Ṛ',
        'È': 'Ṝ',
        'Ì': 'Ṅ',
        'Ñ': 'Ṣ',
        'Ï': 'Ñ',
        'Ö': 'Ṭ',
        'Ò': 'Ḍ',
        'Ë': 'Ṇ',
        'Ç': 'Ś',
        'À': 'Ṁ',
        'Ù': 'Ḥ',
        'ß': 'Ḷ',
        '“': '“',
        '”': '”',
        ' ': ' ',
        '‘': '‘',
        '–': '-',
        '’': '’',
        '—': '—',
        '•': '»',
        '…': '...',
    }
    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)
            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)
    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")
I use Devuan Daedalus, which is based on Debian 12 but without systemd.
Upon running this script on my machine, I get the following error:
~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
    replace_text_in_files(directory_path)
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
    file_content = file.read()
                   ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
He doesn’t have any clue about this, and I know nothing about Python. Hence I seek the help of those in this forum who are knowledgeable.
I took the suggestion by Ofer Sadan to open the file as a bytes file. But that gives me another error:
binary mode doesn't take an encoding argument
Please ask for additional information if:
- This question seems too vague/open-ended/generic.
- I haven’t provided sufficient information.
Thanks,
2 Answers
I had run the script on .doc and .docx files, which I guess aren't UTF-8 encoded. When I tried the script on plain text files, it worked flawlessly. Sorry if I have wasted your time.
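For anyone hitting the same error: .docx files are ZIP archives and legacy .doc files are OLE compound documents, so both start with fixed signature bytes rather than text. A minimal sketch for spotting them before attempting a substitution follows; the helper name `looks_like_office_file` and the sample file names are made up for illustration, and the check only looks at the first bytes, so it is a rough filter rather than a full format detector.

```python
import os
import tempfile

# Signature bytes at the start of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                          # .docx (and any other ZIP archive)
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # legacy .doc (OLE compound file)


def looks_like_office_file(path):
    """Hypothetical helper: True if the file starts with a ZIP or OLE signature."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    return head.startswith(ZIP_MAGIC) or head.startswith(OLE_MAGIC)


# Demonstration on throwaway files with made-up names and contents.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")
    docx_path = os.path.join(tmp, "report.docx")
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write("plain text")
    with open(docx_path, "wb") as handle:
        handle.write(ZIP_MAGIC + b"\x14\x00\x06\x00")
    text_is_office = looks_like_office_file(text_path)
    docx_is_office = looks_like_office_file(docx_path)

print(text_is_office, docx_is_office)
```

Files flagged this way need a document-aware tool; rewriting them as text would most likely corrupt them.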
Thanks for your contributions @frederic-laurencin and @elbashmubarmeg
Python complains because, while reading one of the files and decoding its bytes as UTF-8, it reaches a byte that is not valid UTF-8 (here 0xad, which cannot start a UTF-8 sequence). Are you sure this file is actually UTF-8 encoded?
https://www.charset.org/utf-8
Reading the file as binary will give you the actual bytes, but you want to substitute characters. You would then have to convert the bytes to a string with the UTF-8 codec, I guess, and you would end up with the same error.
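To illustrate the point, here is a small sketch (the function name `read_text_or_none` and the sample bytes are my own, not from the original script). Opening with 'rb' must be done without an `encoding` argument, which is what the "binary mode doesn't take an encoding argument" message is about, and decoding the bytes afterwards fails on exactly the same invalid byte.

```python
import os
import tempfile


def read_text_or_none(path):
    """Hypothetical helper: return the file's text if it is valid UTF-8, else None."""
    with open(path, "rb") as handle:  # binary mode: no encoding argument here
        raw = handle.read()
    try:
        return raw.decode("utf-8")  # the UTF-8 decoding that text mode would apply
    except UnicodeDecodeError:
        return None


# Demonstration on two throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    bad = os.path.join(tmp, "bad.bin")
    with open(good, "wb") as handle:
        handle.write("kåñëa".encode("utf-8"))
    with open(bad, "wb") as handle:
        handle.write(b"PK\x03\x04\xad\x00")  # 0xad is not a valid UTF-8 start byte
    good_text = read_text_or_none(good)
    bad_text = read_text_or_none(bad)

print(good_text)
print(bad_text)
```

So binary mode does not avoid the problem; it only moves the failure to the explicit `decode` call.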
I would be extra careful in your case (make a backup): you may be trying to tamper with an actual binary file. Are you sure the file you are touching is meant to be modified?
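One possible way to make the directory walk more forgiving is sketched below, assuming you want to leave non-UTF-8 files untouched and keep a copy of every file that gets changed. The function name `substitute_in_tree`, the `.bak` suffix, and the sample files are my own choices, and only three entries from the substitution table are used for the demonstration.

```python
import os
import shutil
import tempfile


def substitute_in_tree(directory, substitutions):
    """Apply substitutions to UTF-8 text files; return the paths that were skipped."""
    skipped = []
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith(".bak"):
                continue  # do not reprocess backups left by an earlier run
            file_path = os.path.join(root, file_name)
            try:
                with open(file_path, "r", encoding="utf-8") as handle:
                    content = handle.read()
            except UnicodeDecodeError:
                skipped.append(file_path)  # not UTF-8 text: leave it alone
                continue
            new_content = content
            for original, replacement in substitutions.items():
                new_content = new_content.replace(original, replacement)
            if new_content != content:
                shutil.copy2(file_path, file_path + ".bak")  # backup before writing
                with open(file_path, "w", encoding="utf-8") as handle:
                    handle.write(new_content)
    return skipped


# Demonstration on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")
    binary_path = os.path.join(tmp, "report.docx")
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write("kåñëa")
    with open(binary_path, "wb") as handle:
        handle.write(b"PK\x03\x04\xad\x00")
    skipped = substitute_in_tree(tmp, {"å": "ṛ", "ñ": "ṣ", "ë": "ṇ"})
    with open(text_path, "r", encoding="utf-8") as handle:
        converted = handle.read()
    backup_exists = os.path.exists(text_path + ".bak")

print(converted, skipped, backup_exists)
```

The returned list tells you which files were left alone, so you can check whether they really are binary documents.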