I have a Python script for text substitution, written by my friend, which works on his system (Ubuntu Focal).

The following is the script:

#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.

Usage:
python replace_text.py /path/to/your/directory

Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""

import os
import sys

def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
        '': 'fi',
        '': 'fl',
        'ä': 'ā',
        'é': 'ī',
        'ü': 'ū',
        'å': 'ṛ',
        'è': 'ṝ',
        'ì': 'ṅ',
        'ñ': 'ṣ',
        'ï': 'ñ',
        'ö': 'ṭ',
        'ò': 'ḍ',
        'ë': 'ṇ',
        'ç': 'ś',
        'à': 'ṁ',
        'ù': 'ḥ',
        'ÿ': 'ḷ',
        'û': 'ḹ',
        'Ä': 'Ā',
        'É': 'Ī',
        'Ü': 'Ū',
        'Å': 'Ṛ',
        'È': 'Ṝ',
        'Ì': 'Ṅ',
        'Ñ': 'Ṣ',
        'Ï': 'Ñ',
        'Ö': 'Ṭ',
        'Ò': 'Ḍ',
        'Ë': 'Ṇ',
        'Ç': 'Ś',
        'À': 'Ṁ',
        'Ù': 'Ḥ',
        'ß': 'Ḷ',
        '“': '“',
        '”': '”',
        ' ': ' ',
        '‘': '‘',
        '–': '-',
        '’': '’',
        '—': '—',
        '•': '»',
        '…': '...',
    }

    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)

            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)

    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")

I use Devuan Daedalus, which is based on Debian 12 but without systemd.
When I run this script on my machine, I get the following error:

~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
    replace_text_in_files(directory_path)
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
    file_content = file.read()
                   ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte

He doesn’t have any clue about this, and I know nothing about Python. Hence I seek the help of those who are knowledgeable in this forum.

I followed Ofer Sadan's suggestion to open the file in binary mode, but that gives me another error:

 binary mode doesn't take an encoding argument

Please ask for additional information if:

  1. This question seems too vague/open-ended/generic.
  2. I haven’t provided sufficient information.

Thanks,

2 Answers


  1. Chosen as BEST ANSWER

    I had run the script on .doc and .docx files, which I guess aren't UTF-8 encoded. When I tried the script on plain text files, it worked flawlessly.

    Sorry if I have wasted your time.

    Thanks for your contributions @frederic-laurencin and @elbashmubarmeg


  2. Python complains because, while reading one of the files, it decodes the bytes as UTF-8 characters and reaches a byte that is not valid UTF-8. Are you sure this file is actually UTF-8 encoded?
    https://www.charset.org/utf-8

    Reading the file in binary mode will give you the actual bytes, but you want to substitute characters. You would then have to convert the bytes to a string with the UTF-8 codec, I guess, and you would end up with the same error.

    I would be extra careful in your case (make a backup): you may be trying to tamper with an actual binary file. Are you sure the file you are touching is meant to be modified?
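
One way to act on both answers is to have the script skip anything that is not valid UTF-8, so that .doc, .docx and other binary files are left untouched instead of crashing the run. The following is only a minimal sketch under that assumption. The names `read_utf8_or_none` and `replace_text_in_utf8_files` are hypothetical and not part of the original script, and the substitution table is passed in as an argument rather than repeated here. Note that the file is opened with `'rb'` and no `encoding` argument (which is what avoids the "binary mode doesn't take an encoding argument" error), and the decoding is done explicitly afterwards.

```python
import os


def read_utf8_or_none(path):
    """Return the file's text if it is valid UTF-8, otherwise None."""
    # Binary mode takes no encoding argument; decode explicitly instead.
    with open(path, 'rb') as fh:
        raw = fh.read()
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Most likely a binary file (.doc, .docx, images, ...).
        return None


def replace_text_in_utf8_files(directory, substitutions):
    """Apply substitutions to UTF-8 files only; return the skipped paths."""
    skipped = []
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            text = read_utf8_or_none(file_path)
            if text is None:
                skipped.append(file_path)
                continue

            new_text = text
            for original, replacement in substitutions.items():
                new_text = new_text.replace(original, replacement)

            # Only rewrite files that actually changed.
            if new_text != text:
                # newline='' writes line endings back exactly as they were read.
                with open(file_path, 'w', encoding='utf-8', newline='') as fh:
                    fh.write(new_text)
    return skipped
```

Calling `replace_text_in_utf8_files(directory_path, substitutions)` and printing the returned list would show which files were left alone. A file that happens to be valid UTF-8 but is not meant to be edited would still be modified, so a backup remains a good idea.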
