
I have a TSV file that contains multiple columns, but the rows are not aligned properly. One column (the "Examples" column in the example below) has line breaks inside its quoted values. I want all the example sentences within a pair of quotes to sit on one line in the same column. How can I fix that using Python (and JavaScript)?

example.tsv - actual output

ID   Data   Fruit   Examples
1   August   Apple   "I have an apple. (Ana) 
I give her an apple. (Tomas) 
There are apples (Lisa)"
2   July   Melon   "I have a melon. (Ana) 
I give him a melon. (Tomas) 
There are melons (Lisa)"
3   May   Lemon   "I have a lemon. (Ana) 
I give him a lemon. (Tomas) 
There are lemons (Lisa)"
...   
example.tsv - ideal output

ID   Data   Fruit   Examples
1   August   Apple   "I have an apple. (Ana) I give her an apple. (Tomas) There are apples (Lisa)"
2   July   Melon   "I have a melon. (Ana) I give him a melon. (Tomas) There are melons (Lisa)"
3   May   Lemon   "I have a lemon. (Ana) I give him a lemon. (Tomas) There are lemons (Lisa)"

EDITED:
Thank you all for the advice, and sorry for the confusion. I do need Python code at the moment, but I was going to do it with JavaScript too. This is what I’ve got so far in Python, using a regex, but it doesn’t merge the sentences together.

df = 'example.tsv'
import re
with open(df, 'r+', encoding='utf-8') as file:
    content = file.read()
    content_replaced = re.sub(r'[^\S\r\n]*[\n\r]\s*', " ", content)
    print(content)

3 Answers


  1. You could use a regular expression: remove every newline that is preceded by a quote plus one or more non-quote characters and followed by one or more non-quote characters plus a quote:

    const src = `ID   Data   Fruit   Examples
    1   August   Apple   "I have an apple. (Ana) 
    I give her an apple. (Tomas) 
    There are apples (Lisa)"
    2   July   Melon   "I have a melon. (Ana) 
    I give him a melon. (Tomas) 
    There are melons (Lisa)"
    3   May   Lemon   "I have a lemon. (Ana) 
    I give him a lemon. (Tomas) 
    There are lemons (Lisa)"`
    
    $pre.textContent = src.replace(/(?<="[^"]+)\n(?=[^"]+")/g, '');
    <pre id="$pre"></pre>
  2. This is difficult to answer; you need to include some code and tell us what you have already tried.

    That said, here’s an example of how you could do this with JavaScript, assuming you already have that column’s text in a variable.

    // 'text' defined like in your example above. 
    // With quote marks at the start and end and with line breaks as \n
    const text = '"I have an apple. (Ana)\nI give her an apple. (Tomas)\nThere are apples(Lisa)"';
    
    // To replace the linebreaks with a space character using regular expression:
    const regexText = text.replace( /\n/g, " ");
    
    // To replace the linebreaks with a space character using split/join
    const splitJoinText = text.split("\n").join(" ");
    
  3. That’s already a valid TSV file: because the newlines are inside quotes, each newline is part of a single cell and doesn’t start a new row. Nothing says a machine-readable format like TSV has to be pretty to look at. But if you do want to consolidate, and you are using Python, you could read the file with a CSV parser, change the cell, and write it back out:

    import csv
    
    with open("test.tsv", newline="") as infile, open("testout.tsv", "w", newline="") as outfile:
        reader = csv.reader(infile, delimiter="\t")
        writer = csv.writer(outfile, delimiter="\t")
        for row in reader:
            row[3] = " ".join(row[3].split())
            writer.writerow(row)
    

    Note that you’ve lost information – the newlines were delimiting the sentences.
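
To check the claim above that the quoted newlines belong to a single cell, and to keep the sentence boundaries rather than lose them, the data can be parsed from an in-memory string. This is only a minimal sketch, assuming tab-delimited input: the sample data, the `merge_examples` helper name and the ` | ` separator are illustrative choices, not part of the original answer.

```python
import csv
import io

# Hypothetical sample: one record whose last cell spans three physical lines.
SAMPLE = (
    "ID\tData\tFruit\tExamples\n"
    '1\tAugust\tApple\t"I have an apple. (Ana) \n'
    "I give her an apple. (Tomas) \n"
    'There are apples (Lisa)"\n'
)


def merge_examples(text, sep=" | "):
    """Parse TSV text and join the lines of each row's last cell with `sep`."""
    rows = []
    for row in csv.reader(io.StringIO(text, newline=""), delimiter="\t"):
        if not row:
            continue  # skip completely empty lines
        # splitlines() breaks on the embedded newlines; strip() drops the
        # trailing spaces left at the end of each sentence.
        parts = [line.strip() for line in row[-1].splitlines()]
        row[-1] = sep.join(part for part in parts if part)
        rows.append(row)
    return rows


rows = merge_examples(SAMPLE)
print(len(rows))    # number of logical rows, header included
print(rows[1][-1])  # the merged "Examples" cell of the first record
```

The reader returns the header plus one record here, even though the text has four physical lines. A visible separator is one way to keep each cell on one line without discarding where the sentences were split.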

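
The lookbehind pattern from the first answer does not carry over to Python directly, because the standard `re` module only accepts fixed-width lookbehinds. A rough Python sketch of the same idea is to match each quoted field as a whole and collapse the whitespace inside it. It assumes the quoted text never contains an escaped (doubled) quote, and `merge_quoted_lines` is a hypothetical helper name.

```python
import re


def merge_quoted_lines(content):
    """Collapse runs of whitespace (including newlines) inside "..." fields."""

    def collapse(match):
        # split() with no arguments splits on any whitespace run, so
        # joining with a single space removes the embedded line breaks.
        return " ".join(match.group(0).split())

    # [^"]* also matches newlines, so one match covers a whole quoted field.
    return re.sub(r'"[^"]*"', collapse, content)


# Hypothetical sample modelled on the question's data.
sample = (
    "ID\tData\tFruit\tExamples\n"
    '1\tAugust\tApple\t"I have an apple. (Ana) \n'
    "I give her an apple. (Tomas) \n"
    'There are apples (Lisa)"\n'
    '2\tJuly\tMelon\t"I have a melon. (Ana) \n'
    'There are melons (Lisa)"\n'
)
print(merge_quoted_lines(sample))
```

Text outside the quotes, including the tabs and the newlines between records, is left untouched, so the result can be written straight back to a `.tsv` file.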