I have a TSV file that contains multiple columns, but it doesn’t seem to be aligned properly. There are line breaks inside the quotes in one column (the "Examples" column in the example below). I want to put all the example sentences (within quotes) in the same column. How can I fix that using Python (and JavaScript)?
example.tsv - actual output
ID Data Fruit Examples
1 August Apple "I have an apple. (Ana)
I give her an apple. (Tomas)
There are apples (Lisa)"
2 July Melon "I have a melon. (Ana)
I give him a melon. (Tomas)
There are melons (Lisa)"
3 May Lemon "I have a lemon. (Ana)
I give him a lemon. (Tomas)
There are lemons (Lisa)"
...
example.tsv - ideal output
ID Data Fruit Examples
1 August Apple "I have an apple. (Ana) I give her an apple. (Tomas) There are apples (Lisa)"
2 July Melon "I have a melon. (Ana) I give him a melon. (Tomas) There are melons (Lisa)"
3 May Lemon "I have a lemon. (Ana) I give him a lemon. (Tomas) There are lemons (Lisa)"
EDITED:
Thank you all for the advice, and sorry for the confusion. I do need Python code at the moment, but I was going to do it with JavaScript too. This is what I’ve got so far in Python, using regex, but it doesn’t merge the sentences together:
df = 'example.tsv'
import re
with open(df, 'r+', encoding='utf-8') as file:
    content = file.read()
    content_replaced = re.sub(r'[^\S\r\n]*[\n\r]\s*', " ", content)
    print(content)
3 Answers
You could use a regular expression that replaces every newline preceded by a quote and non-quote characters, and followed by non-quote characters and a quote:
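A minimal Python sketch of that idea follows. Python's `re` module only supports fixed-width lookbehind, so instead of a lookaround pattern this version matches each double-quoted span and collapses the line breaks inside it in a callback. The function name `join_quoted_lines` and the sample text are made up for illustration, and the sketch assumes the quoted cells contain no escaped (doubled) quotes.

```python
import re

def join_quoted_lines(text):
    """Replace line breaks inside double-quoted spans with a single space."""
    def flatten(match):
        # Collapse each run of line breaks (plus surrounding whitespace) to one space.
        return re.sub(r"\s*[\r\n]+\s*", " ", match.group(0))
    # Only text between a pair of double quotes is passed to flatten();
    # newlines that separate rows sit outside the quotes and are left alone.
    return re.sub(r'"[^"]*"', flatten, text)

# Hypothetical sample mirroring the first row of example.tsv (tab-separated).
sample = (
    'ID\tData\tFruit\tExamples\n'
    '1\tAugust\tApple\t"I have an apple. (Ana)\n'
    'I give her an apple. (Tomas)\n'
    'There are apples (Lisa)"\n'
)
print(join_quoted_lines(sample))
```

Because the row-separating newlines are never inside a quoted span, the header and each record stay on their own line.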
This is difficult to answer; you need to include some code and tell us what you have already tried.
That said, here’s an example of how you could do this with Javascript, assuming you already have that column’s text as a variable.
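The same per-cell step can be sketched in Python, the language of the question's own attempt. This is only an illustration under the same assumption, namely that the column's text is already available as a string; the name `join_lines` is hypothetical.

```python
def join_lines(cell_text):
    """Join the lines of one cell into a single space-separated string."""
    # splitlines() handles \n, \r\n and \r; blank lines are skipped.
    parts = (line.strip() for line in cell_text.splitlines())
    return " ".join(part for part in parts if part)

# Hypothetical cell value taken from the first row of the example.
cell = "I have an apple. (Ana)\nI give her an apple. (Tomas)\nThere are apples (Lisa)"
print(join_lines(cell))
```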
That’s already a valid TSV file: since the newlines are inside quotes, each newline is part of a single cell and doesn’t start a new row. Nothing says a computer format like TSV has to be pretty to look at. But if you do want to consolidate, and you are using Python, you can read the file with a CSV parser, change the cell, and write it back out:
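A minimal sketch of that read, change, write approach using the standard `csv` module with a tab delimiter. The function name `flatten_tsv` and the output file name in the usage note are assumptions for illustration, not part of the question.

```python
import csv

def flatten_tsv(src_path, dst_path):
    """Copy a TSV file, replacing line breaks inside each cell with a space."""
    # newline="" lets the csv module handle embedded newlines itself.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        for row in reader:
            # A quoted multi-line cell arrives as one string containing "\n".
            writer.writerow([" ".join(cell.splitlines()) for cell in row])
```

It could be called as `flatten_tsv("example.tsv", "example_flat.tsv")`. With the default `csv.QUOTE_MINIMAL`, the writer only quotes fields that need it, so the quotes around the merged cell are dropped in the output; passing `quoting=csv.QUOTE_ALL` to `csv.writer` would quote every field instead.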
Note that you’ve lost information – the newlines were delimiting the sentences.