skip to Main Content

The is a snipped from the source CSV file.

%status,date,job,project,start,end,description
%
//,18.03.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,11.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,24.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
?,02.08.2021,sib,sib-accounting,14:35,16:35,business-plan
%
?,13.10.2021,sb,sb-accounting,11:30,12:00,e-mail-pump

I like to extract from the source CSV file the start time in column 5 and the end time in column 6.

Next, based on start and end time I like to process the duration in hours or minutes (don’t mind).

Eventually, I like to grab the source CSV file, insert a new column between the existing column 6 and 7 with the processed duration and save this addition in a result CSV file.

Done anyone has an idea how to face this challenge on a GNU Debian Linux command line?

I am aware that I can cut specific columns from a CSV file like this.

cut -d, -f5,6 < ./source.csv > ./result.csv

However, I am still missing the duration processing and the result CVS file creation.

PS: I have a preference for Bash.

PPS: There are related questions like this but I found none close to this question.

2

Answers


  1. I have a preference for Bash

    But you could do all of it in one go, e.g. using mktime from GNU Awk (assuming the % signs (or the subsequent newlines) in your sample are just a typo, and that the time in column 6 is higher than the one in column 5, otherwise the result is negative):

    awk -F , -v OFS=, '{
      $8=$7; $7=(
        mktime(sprintf("0 0 0 %d %d 0", substr($6,1,2), substr($6,4,2))) - 
        mktime(sprintf("0 0 0 %d %d 0", substr($5,1,2), substr($5,4,2)))
      ) / 60; print
    }' source.csv
    
    Login or Signup to reply.
  2. Using any awk and assuming the timestamps are all the same day since you don’t have any indication of date in your input and so no robust way to handle durations that could be longer than 24 hours:

    $ awk '
        BEGIN { FS=OFS="," }
        NF > 1 {
            if ( NR == 1 ) {
                diff = "diff"
            }
            else {
                split($5, b, ":")
                split($6, e, ":")
                beg = b[1]*60 + b[2]
                end = e[1]*60 + e[2]
                diff = end - beg
            }
            $6 = $6 OFS diff
            print
        }
    ' file
    %status,date,job,project,start,end,diff,description
    //,18.03.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
    //,11.06.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
    //,24.06.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
    ?,02.08.2021,sib,sib-accounting,14:35,16:35,120,business-plan
    ?,13.10.2021,sb,sb-accounting,11:30,12:00,30,e-mail-pump
    

    No need to run grep first – piping grep to awk is an antipatterm, see https://porkmail.org/era/unix/award#grep.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search