I have tried various ways to open the log file and process it as a whole, but without success: the result is either zero or an empty set.
I have a log file containing data such as:
Time Log Nitrogen:
5/1/12: 3:39am - 4:43am data file study
3:57pm - 5:06pm bg ui, combo boxes
7:44pm - 8:50pm bg ui with scaler; slider
10:30pm - 12:48am state texts; slider
5/2/12: 10:00am - 12:00am discuss with Blanca about the data file
5/8/12: 11:00pm - 11:40pm mapMC,
5/9/12: 3:05pm - 3:42pm wholeMapMC, subMapMC, AS3 functions reading
10:35pm - 1:33am whole view data; scrollpane;
5/10/12: 6:10pm - 8:13pm blue slider
5/11/12: 8:45am - 12:10pm purple slider
1:30pm - 5:00pm Nitrate bar
11:18pm - 12:03am change NitrogenViewBase to static
5/12/12: 8:06am - 9:47am correct data and change NitrogenViewBase to static
5:45pm - 8:00pm costs bar, embed font
9:51pm - 12:31am costs bar
5/13/12: 7:45am - 8:45am read the Nitrogen Game doc
5/15/12: 2:07am - 5:09am corn
2:06pm - 5:11pm hypoxic zone
5/16/12: 2:53pm - 5:09pm data re-structure
7:00pm - 9:10pm sub sections watershed data
5/17/12: 12:30am - 2:32am sub sections sliders
10:30am - 11:45am meet with Dr. Lant and Blanca
3:09pm - 5:05pm crop yield and sub sections pink bar
7:00pm - 7:50pm sub sections nitrate to gulf bar
5/18/12: 3:15pm - 3:52pm sub sections slider legend
5/27/12: 5:46pm - 7:30pm feedback fixes
6/20/12: 2:57pm - 5:00pm Teachers' feedback fixes
7:30pm - 8:30pm
6/22/12: 3:40pm - 5:00pm
6/25/12: 3:24pm - 5:00pm
6/26/12: 11:24am - 12:35pm
7/4/12: 1:00pm - 10:00pm research on combobox with dropdown subitem - to no avail
7/5/12: 1:30am - 3:00am continue the research
9:31am - 12:45pm experiment on the combobox-subitem concept
3:45pm - 5:00pm
6:23pm - 8:14pm give up
8:18pm - 10:00pm zone change
11:07pm - 12:00am
7/10/12: 11:32am - 12:03pm added BASE_X and BASE_Y to the NitrogenSubView
4:15pm - 5:05pm fine-tune the whole view map
7:36pm - 8:46pm
7/11/12: 1:38am - 4:42am
7/31/12: 11:26am - 1:18pm study photoshop path shape
8/1/12: 2:00am - 3:41am collect the coordinates of wetland shapes
10:31am - 11:40am restorable wetlands implementation
4:00pm - 5:00pm
8/2/12: 12:20am - 4:42am
8/10/12: 2:30am - 4:55am sub watersheds color match; wetland color & size change
3/13/13: 6:06pm - 6:32pm Make the numbers in the triangle sliders bigger and bolder; Larger font on "Crop Yield Reduction"
How can I calculate the total time spent by parsing the time log file? I am unable to parse the file as a whole.
I tried:
import re
import datetime
text="""5/1/12: 3:39am - 4:43am data file study
3:57pm - 5:06pm bg ui, combo boxes
7:44pm - 8:50pm bg ui with scaler; slider
10:30pm - 12:48am state texts; slider
5/2/12: 10:00am - 12:00am discuss with Blanca about the data file
5/8/12: 11:00pm - 11:40pm mapMC,"""
total=re.findall(r"(\d{1,2}:\d{1,2}[ap]m)\s*-\s*(\d{1,2}:\d{1,2}[ap]m)",text)
print(sum([datetime.datetime.strptime(t[1],"%I:%M%p")-datetime.datetime.strptime(t[0],"%I:%M%p") for t in total],datetime.timedelta()))
Running this gives me a negative total duration. How can I fix this?
3 Answers
To account for time ranges that cross midnight, you have to calculate the duration on each of the two days separately and add them together.
Please refer to the code below.
Output
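A minimal sketch of this idea, using only the standard library and the regex and time format from the question. It assumes that an end time earlier than its start time means the entry ran past midnight. Rather than splitting the range at midnight, it shifts the end time forward by one day, which yields the same duration. The names TIME_RANGE, total_time and sample are illustrative.

```python
import re
from datetime import datetime, timedelta

# Matches pairs such as "10:30pm - 12:48am".
TIME_RANGE = re.compile(r"(\d{1,2}:\d{2}[ap]m)\s*-\s*(\d{1,2}:\d{2}[ap]m)")


def total_time(text):
    """Sum all logged ranges; end < start is assumed to cross midnight."""
    total = timedelta()
    for start_str, end_str in TIME_RANGE.findall(text):
        start = datetime.strptime(start_str, "%I:%M%p")
        end = datetime.strptime(end_str, "%I:%M%p")
        if end < start:
            # The range ran past midnight, so the end belongs to the next day.
            end += timedelta(days=1)
        total += end - start
    return total


sample = """5/1/12: 3:39am - 4:43am data file study
3:57pm - 5:06pm bg ui, combo boxes
7:44pm - 8:50pm bg ui with scaler; slider
10:30pm - 12:48am state texts; slider"""

print(total_time(sample))  # 5:37:00
```

To process a whole file, the text can be read in one call, for example with open(path).read(), and passed to total_time. Note that an entry such as "10:00am - 12:00am" is counted as 14 hours under this rule, since 12:00am is midnight.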
You could parse your log file into a pandas DataFrame and then easily make your calculations:
Output
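A minimal sketch of what such a DataFrame-based approach might look like, assuming pandas is installed. It extracts only the start and end times (not the dates or descriptions), and the column names and the function name total_time_pandas are illustrative.

```python
import re

import pandas as pd

# Matches pairs such as "10:30pm - 12:48am".
TIME_RANGE = re.compile(r"(\d{1,2}:\d{2}[ap]m)\s*-\s*(\d{1,2}:\d{2}[ap]m)")


def total_time_pandas(text):
    """Build a DataFrame of (start, end) strings and sum the durations."""
    df = pd.DataFrame(TIME_RANGE.findall(text), columns=["start", "end"])
    start = pd.to_datetime(df["start"], format="%I:%M%p")
    end = pd.to_datetime(df["end"], format="%I:%M%p")
    # Where the end is earlier than the start, assume the range crossed
    # midnight and move the end to the following day.
    end = end.where(end >= start, end + pd.Timedelta(days=1))
    return (end - start).sum()


sample = """5/1/12: 3:39am - 4:43am data file study
3:57pm - 5:06pm bg ui, combo boxes
7:44pm - 8:50pm bg ui with scaler; slider
10:30pm - 12:48am state texts; slider"""

print(total_time_pandas(sample))
```

The result is a pandas Timedelta. Because the comparison and subtraction are applied to whole columns at once, no explicit Python loop over the entries is needed.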
You already have two interesting and working solutions from Liju and Sebastien D. Here I propose two new variants that, while similar, have important performance advantages.
The two current solutions approach the problem in this way:
one_pass, proposed by Liju: takes the regex matches and sums a list created by a list comprehension. During that comprehension, it parses the same two strings to datetime three times (to evaluate the > comparison, to output the if branch, or to output the else branch).
dateparser, proposed by Sebastien D: takes each line of text and tries to regex a date out of the line, then tries to find the start/end times in that same line (this could be improved to a single regex, but the regex is not this solution's bottleneck). It then uses dateparser to combine date and time and also collect the text description. This is more akin to a full-fledged parser, but for the purposes of the time tests I removed the description functionality.
The two new solutions are similar:
two_pass: similar to one_pass, but in the first pass it just parses the strings to datetime, and in the second pass it evaluates start > end and sums the correct timedelta. The main advantage is that it parses each time string only once, with the downside of having to iterate twice.
pure_pandas: similar to dateparser, but only calls regex once and uses pandas' built-in to_datetime for parsing.
If we compare the performance of all these solutions with different text lengths, we can see that w_dateparser is by far the least performant solution.
If we zoom in to compare the other three solutions, we see that w_pure_pandas is a little slower than the other solutions for smaller text lengths, but it excels on longer entries by taking advantage of numpy C implementations (as opposed to the list comprehensions used by the other solutions). Secondly, two_pass is generally faster than one_pass, and increasingly faster for longer texts.
The code for two_pass and w_pure_pandas:
The full code for all solutions and time testing:
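As a minimal sketch of the one_pass versus two_pass comparison: both function bodies below are illustrative reconstructions based on the descriptions above, not the original answers' code, and the helper names parse, FMT and ONE_DAY are assumptions. The timing loop uses the standard timeit module; the numbers it prints depend on the machine, so none are shown here.

```python
import re
import timeit
from datetime import datetime, timedelta

TIME_RANGE = re.compile(r"(\d{1,2}:\d{2}[ap]m)\s*-\s*(\d{1,2}:\d{2}[ap]m)")
FMT = "%I:%M%p"
ONE_DAY = timedelta(days=1)


def parse(time_str):
    return datetime.strptime(time_str, FMT)


def one_pass(text):
    """Single comprehension; re-parses the strings for the test and the result."""
    return sum(
        [
            parse(e) - parse(s) if parse(e) >= parse(s)
            else parse(e) - parse(s) + ONE_DAY
            for s, e in TIME_RANGE.findall(text)
        ],
        timedelta(),
    )


def two_pass(text):
    """First pass parses every string once; second pass compares and sums."""
    parsed = [(parse(s), parse(e)) for s, e in TIME_RANGE.findall(text)]
    return sum(
        (e - s if e >= s else e - s + ONE_DAY for s, e in parsed),
        timedelta(),
    )


sample = """5/1/12: 3:39am - 4:43am data file study
3:57pm - 5:06pm bg ui, combo boxes
7:44pm - 8:50pm bg ui with scaler; slider
10:30pm - 12:48am state texts; slider"""

# Repeat the sample to get a longer text for timing.
long_text = "\n".join([sample] * 200)
for fn in (one_pass, two_pass):
    seconds = timeit.timeit(lambda: fn(long_text), number=3)
    print(f"{fn.__name__}: {seconds:.4f}s for 3 runs")
```

Both functions return the same total for the same input. The only difference is how many times strptime is called per entry, which is where the performance difference described above would come from.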