I have a report that is generated at the beginning of each month, in .csv format. Currently, the report contains a series of columns with assorted data; one of the columns is an ‘add_date’ field containing data in "YYYY-mm-dd HH:MM:SS" format.
My end goal is to filter this source CSV so that only the rows whose ‘add_date’ falls in the previous month remain. For example, if the script were run on February 1st 2021, only the rows with dates from January 2021 would remain in the output CSV file.
This is an example of the source CSV contents:
Name,Data1,add_date
jasmine,stuff ,2021-01-26 17:29:46
ariel,things,2021-01-26 17:48:04
ursula,foo,2016-11-02 19:32:09
belle,bar,2016-01-21 18:47:33
and this is the Python script I have so far:
#!/usr/bin/env python3

import csv

filtered_rows = []
with open('test123.csv', newline='') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',')
    for row in rowreader:
        if row["2021-01"] in csvfile.add_date:
            filtered_rows.append(row)
print(filtered_rows)
which I call with the following command:
./testscript.py > testfile.csv
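With the redirect above, everything the script prints becomes testfile.csv. print(filtered_rows) writes Python's list representation (brackets, quotes and commas between lists), not CSV lines. A minimal sketch of emitting real CSV instead, assuming the kept rows are lists of strings; rows_to_csv is a hypothetical helper name:

```python
import csv
import io
import sys


def rows_to_csv(rows):
    """Return `rows` (a list of lists of strings) formatted as CSV text."""
    buf = io.StringIO()
    # lineterminator="\n" replaces the csv module's default "\r\n" line endings
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    filtered_rows = [["Name", "Data1", "add_date"],
                     ["jasmine", "stuff ", "2021-01-26 17:29:46"]]
    # Written to stdout, so "./testscript.py > testfile.csv" produces a CSV file
    sys.stdout.write(rows_to_csv(filtered_rows))
```

The csv writer also quotes fields that contain commas, which a hand-built string join would not.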
Currently, when I run the above command I am greeted with the following error message:
Traceback (most recent call last):
  File "./testscript.py", line 9, in <module>
    if row["2021-01"] in csvfile.add_date:
TypeError: list indices must be integers or slices, not str
My current Python version is Python 3.6.4, running in CentOS Linux release 7.6.1810 (Core).
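The error comes from csv.reader returning each row as a plain list of strings, so it can only be indexed by position (row[2] for add_date), not by a string such as "2021-01"; csvfile is the file object and has no add_date attribute either. A minimal sketch using csv.DictReader, which keys each row by the header line, with the month still hard-coded as a "2021-01" prefix. It uses the sample data above; filter_by_prefix is a hypothetical name, and the code avoids features newer than Python 3.6:

```python
import csv
import io

SAMPLE = """Name,Data1,add_date
jasmine,stuff ,2021-01-26 17:29:46
ariel,things,2021-01-26 17:48:04
ursula,foo,2016-11-02 19:32:09
belle,bar,2016-01-21 18:47:33
"""


def filter_by_prefix(csvfile, prefix):
    """Return the rows whose add_date string starts with `prefix`, e.g. "2021-01"."""
    reader = csv.DictReader(csvfile)
    # DictReader uses the header line as keys, so row["add_date"] is the cell text
    return [row for row in reader if row["add_date"].startswith(prefix)]


if __name__ == "__main__":
    # With a real file: open('test123.csv', newline='') instead of io.StringIO
    for row in filter_by_prefix(io.StringIO(SAMPLE), "2021-01"):
        print(row["Name"], row["add_date"])
```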
Answers
If I understood correctly, you can do something like this:
To do this properly, you need to determine the previous month and year, then compare them to the add_date field of each row. The year is important to handle December → January transitions, as well as data that spans multiple years.
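A minimal sketch of that comparison, assuming the column is named add_date and always uses the "YYYY-mm-dd HH:MM:SS" format. previous_month and rows_from_previous_month are hypothetical names, and this is an independent illustration rather than the original answer's code:

```python
import csv
import io
from datetime import date, datetime

SAMPLE = """Name,Data1,add_date
jasmine,stuff ,2021-01-26 17:29:46
ariel,things,2021-01-26 17:48:04
ursula,foo,2016-11-02 19:32:09
belle,bar,2016-01-21 18:47:33
"""


def previous_month(today):
    """Return (year, month) of the month before `today`; January rolls back to December."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def rows_from_previous_month(csvfile, today):
    target = previous_month(today)
    kept = []
    for row in csv.DictReader(csvfile):
        added = datetime.strptime(row["add_date"], "%Y-%m-%d %H:%M:%S")
        # Compare year and month together so that 2016-01 does not match 2021-01
        if (added.year, added.month) == target:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # A real monthly run would pass date.today(); a fixed date keeps the example repeatable
    for row in rows_from_previous_month(io.StringIO(SAMPLE), date(2021, 2, 1)):
        print(row["Name"], row["add_date"])
```

Parsing with strptime also raises a ValueError on a malformed add_date instead of silently skipping the row, which a plain string comparison would do.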
Here’s what I mean.
I got the basic idea of how to determine the date of the previous month from @bgporter’s answer to the question “How to determine date of the previous month?”
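One common form of that idea: move to the first day of the current month, then subtract one day, which always lands in the previous month, including across a year boundary. A minimal sketch; last_day_of_previous_month is a hypothetical name and not necessarily the code from the linked answer:

```python
from datetime import date, timedelta


def last_day_of_previous_month(today):
    # Step back to the 1st of the current month, then one more day
    first_of_month = today.replace(day=1)
    return first_of_month - timedelta(days=1)


if __name__ == "__main__":
    prev = last_day_of_previous_month(date(2021, 2, 1))
    print(prev)                    # 2021-01-31
    print(prev.strftime("%Y-%m"))  # 2021-01
```

The "%Y-%m" string (for a real run, computed from date.today()) can serve as a prefix matched against each add_date with str.startswith, since both use the same zero-padded year-month layout.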