I have data twitter in a CSV file (that I’m mining with a Python API). I get around 1000 lines of data. Now I want to shorten the tweet data using the specific Indonesian words “macet” or “kecelakaan” (in English “traffic” or “accident”) and put the matching rows into a new separate CSV file, just like in Excel using find all
.
The sample data twitter is example1.csv and the new file which will be created after the search of the word “macet” or “kecelakaan” is example2.csv. But there is no result.
import re
import csv
with open('example1.csv', 'r') as csvFile:
reader = csv.reader(csvFile)
if re.search(r'macet', reader):
for row in reader:
myData = list(row)
print(row)
newFile = open('example2.csv', 'w')
with newFile:
writer = csv.writer(newFile)
writer.writerows(myData)
print("Writing complete")
I use spyder for environment Python 3.6.
The CSV file is already in the same folder with Spyder. Here is the screen capture image of my CSV twitter data
updated : Sample of csv file. OS using : Windows
3
Answers
There are a couple of problems with your code.
In your reading loop you are passing a
csv.reader
object tore.search
, but it doesn’t know how to search that object. You need to pass it text or byte strings.The line
converts
row
into a new list and saves it tomyData
, but it’s already a list, so no conversion is necessary. And that line replaces the previous contents ofmyData
, but you actually want to save all the matching rows. However, there’s no need to save the rows, you can just write them to the new file as you go.Anyway, here’s a repaired version of your code. From the screen shot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I’ve created a regex that searches for the whole words “macet” and “kecelakaan”, the “b” matches at word boundaries so we don’t get a match if “macet” or “kecelakaan” is part of a larger word.
I’ve just made a couple of improvements to that code. It now uses the
newline=''
arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores the case when looking for matching words.Not answering about Python. But if you have a Linux OS, you can do it in one command line :
-i is for ignore case, so it will also match “Macet”
how is it~?
this code visit rows one by one
and find cells that contain a word in word_list
and write the value list on the row