
I have Twitter data in a CSV file (which I’m mining with a Python API), around 1000 lines in total. Now I want to filter the tweets for the specific Indonesian words “macet” or “kecelakaan” (in English “traffic” or “accident”) and put the matching rows into a new, separate CSV file, just like using Find All in Excel.

The sample Twitter data is in example1.csv, and the new file to be created after searching for “macet” or “kecelakaan” is example2.csv. But my code below produces no result.

import re
import csv

with open('example1.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)

if re.search(r'macet', reader):
    for row in reader:
        myData = list(row)
        print(row)

newFile = open('example2.csv', 'w')
with newFile:
    writer = csv.writer(newFile)
    writer.writerows(myData)

print("Writing complete")

I use Spyder as my environment, with Python 3.6.

The CSV file is already in the same folder as Spyder. Here is a screen capture of my Twitter CSV data:

myCSVtwitterData

Update: sample of the CSV file. OS: Windows.

3 Answers


  1. There are a couple of problems with your code.

    In your reading loop you are passing a csv.reader object to re.search, but it doesn’t know how to search that object. You need to pass it text or byte strings.

    The line

    myData = list(row)
    

    converts row into a new list and saves it to myData, but it’s already a list, so no conversion is necessary. And that line replaces the previous contents of myData, but you actually want to save all the matching rows. However, there’s no need to save the rows, you can just write them to the new file as you go.
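The two problems described above can be reproduced in isolation. This is only a minimal sketch: the two-row input is made up and stands in for example1.csv, and `io.StringIO` is used so it runs without any file on disk.

```python
import csv
import io
import re

# Hypothetical two-row input standing in for example1.csv.
sample = io.StringIO("1,user_a,jalan macet\n2,user_b,ada kecelakaan\n")
reader = csv.reader(sample)

# Problem 1: re.search expects a string (or bytes), not a csv.reader object.
try:
    re.search(r'macet', reader)
    raised = False
except TypeError:
    raised = True

# Problem 2: assigning inside the loop overwrites the previous row,
# so only the last row is left once the loop finishes.
for row in reader:
    myData = list(row)

print(raised)
print(myData)
```

Here `raised` ends up `True`, and `myData` holds only the second row, which is why collecting rows this way loses every match except the last.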

    Anyway, here’s a repaired version of your code. From the screen shot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I’ve created a regex that searches for the whole words “macet” and “kecelakaan”. The “\b” matches at word boundaries, so we don’t get a match if “macet” or “kecelakaan” is part of a larger word.

    import re
    import csv
    
    # Make a case-insensitive regex to match the words "macet" or "kecelakaan"
    pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)
    
    with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
        reader = csv.reader(csvFile)
        writer = csv.writer(newFile)
    
        for row in reader:
            # Skip empty rows
            if not row:
                continue
            if pattern.search(row[2]):
                print(row)
                writer.writerow(row)
    
    print("Writing complete")
    

    I’ve just made a couple of improvements to that code. It now uses the newline='' arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores the case when looking for matching words.
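To see what the word-boundary, case-insensitive pattern from answer 1 accepts and rejects, here is a small check against a few strings. The sample tweets are invented for illustration and are not taken from the asker's data.

```python
import re

# Same idea as the answer's pattern: whole words only, case-insensitive.
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

# Made-up sample tweets (assumptions, not the asker's real data).
samples = [
    "Jalan Sudirman MACET parah",  # whole word, different case
    "Ada kecelakaan di tol",       # whole word
    "kemacetan di mana-mana",      # "macet" only inside a larger word
    "Lalu lintas lancar",          # neither word
]

matches = [bool(pattern.search(text)) for text in samples]
print(matches)  # [True, True, False, False]
```

The third string is rejected because "macet" sits inside "kemacetan", so there is no word boundary on either side of it. If such derived words should also count as matches, dropping the `\b` anchors would be one option.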

  2. This doesn’t answer the Python question, but if you have a Linux OS, you can do it in one command line:

    grep -i "macet" example1.csv > example2.csv
    

    -i ignores case, so it will also match “Macet”. To match either word, add -E and use the pattern "macet|kecelakaan".
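Since the asker is on Windows, where grep is usually not available, the same line-based, case-insensitive filter can be sketched in Python. The helper name `filter_lines` and the sample data are hypothetical; with real files you would pass the open input file instead of the `io.StringIO` object and write the kept lines to the output file.

```python
import io

def filter_lines(lines, words):
    """Yield lines containing any of the words, ignoring case (like grep -i)."""
    lowered = [w.lower() for w in words]
    for line in lines:
        if any(w in line.lower() for w in lowered):
            yield line

# Made-up stand-in for example1.csv.
source = io.StringIO("1,a,Jalan Macet\n2,b,lancar\n3,c,ada kecelakaan\n")
kept = list(filter_lines(source, ["macet", "kecelakaan"]))
print(kept)
```

Like grep, this works on raw lines and searches every column. It would split a record incorrectly if a quoted CSV field contained a line break, in which case a `csv.reader`-based approach is the safer choice.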

  3. How about this?
    This code visits the rows one by one,
    finds the cells that contain a word from word_list,
    and writes only those matching cells to the new file.

    import re
    import csv
    
    word_list = ['macet', 'kecelakaan']
    
    with open('example1.csv', 'r') as csvFile, open('example2.csv', 'w') as newFile:
    
        reader = csv.reader(csvFile)
        writer = csv.writer(newFile, lineterminator='\n')
    
        for row in reader:
            new_row = [content for content in row if any(map(lambda word: word in content, word_list))]
            if(new_row != []):
                print(new_row)
                writer.writerow(new_row)
    
    print("Writing complete")
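Answer 3 writes only the matching cells. If the whole row should be kept instead (as the question asks), a variant might look like the sketch below. The input rows are made up, and in-memory `io.StringIO` objects stand in for example1.csv and example2.csv so the snippet runs on its own.

```python
import csv
import io

word_list = ['macet', 'kecelakaan']

# Made-up input and an in-memory output, standing in for the two CSV files.
sample = io.StringIO("1,user_a,jalan macet total\n2,user_b,lalu lintas lancar\n")
output = io.StringIO()
writer = csv.writer(output, lineterminator='\n')

for row in csv.reader(sample):
    # Keep the entire row if any cell contains one of the words (case-insensitive).
    if any(word in cell.lower() for cell in row for word in word_list):
        writer.writerow(row)

result = output.getvalue()
print(result)
```

This uses plain substring matching, so it would also keep rows where a word appears inside a longer word such as "kemacetan".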
    