skip to Main Content

Can someone point me in the right direction on this..

I have a string that contains sentences of words
e.g. ‘He was trying to learn a pythonic, or regex, way of solving his problem’

The string in question is quite large and I need to break it up into multiple lines, where each line can not exceed 64 characters.
BUT I cant just insert a line break every 64 characters. I need to ensure the break occurs at the closest character (from a set of characters) before the 64th character, to ensure the line does not exceed 64 characters.
e.g. I can only insert a line break after a space, comma or full stop

I also need the solution to be quite efficient as it is an action that will occur many, many times.

Using textwrap

I’m not sure textwrap is the way to go for my problem because I need to preserve the original line breaks in the input string.
Example:

long_str = """
123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
Line 4: In computer science AI research is defined as
"""
lines = textwrap.wrap(long_str, 60, break_long_words=False)
print('n'.join(lines))

What I want is this:

123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called 
machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed 
by humans and other animals. 
Line 4: In computer science AI research is defined as

But textwrap gives me this:

 123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called
machine intelligence,  Line 2: is intelligence demonstrated
by machines,  Line 3: in contrast to the natural
intelligence displayed by humans and other animals.  Line 4:
In computer science AI research is defined as

I suspect that Regex is probably the answer but I’m out of my depth trying to solve this with regex.

3

Answers


  1. It might help us answer your question if you could provide any code you have already attempted. That being said, I believe the following example code will preserve existing line breaks, wrap lines exceeding 64 characters and preserve the format of the rest of the string.

    import textwrap
    
    long_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " 
           "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. " 
           "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris" 
           "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in" 
           "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. " 
           "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui" 
           "officia deserunt mollit anim id est laborum."
    
    lines = textwrap.wrap(long_str, 64, break_long_words=False)
    
    print('n'.join(lines))
    

    The output from Python is:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
    eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
    enim ad minim veniam, quis nostrud exercitation ullamco
    laborisnisi ut aliquip ex ea commodo consequat. Duis aute irure
    dolor inreprehenderit in voluptate velit esse cillum dolore eu
    fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
    proident, sunt in culpa quiofficia deserunt mollit anim id est
    laborum.
    
    Login or Signup to reply.
  2. import textwrap
    
    def f1(foo): 
        return iter(foo.splitlines())
    
    long_str = """
    123456789 123456789 123456789 123456789 123456789 123456789
    Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
    Line 2: is intelligence demonstrated by machines, 
    Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
    Line 4: In computer science AI research is defined as
    """
    [print('n'.join(textwrap.wrap(l, 64, break_long_words=False))) for l in f1(long_str)]
    

    with the iterating over the lines of a string per this

    Login or Signup to reply.
  3. Split the long string into separate lines on the newline. Wrap each separate line “as usual”, then concatenate everything again to a single string.

    import textwrap
    
    long_str = """
    123456789 123456789 123456789 123456789 123456789 123456789
    Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
    Line 2: is intelligence demonstrated by machines, 
    Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
    Line 4: In computer science AI research is defined as
    """
    
    lines = []
    for line in long_str.split('n'):
        lines += textwrap.wrap(line, 60, break_long_words=False)
    print('n'.join(lines))
    

    As textwrap returns a list of strings, you don’t need to do anything else than keep on pasting them together, and join them at the end.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search