Debian - How to extract columns from CSV file, process and create a CSV file based on the extract and processing result?

dancesWithCycles
September 3, 2024
183 views
0 votes
2 Answers

The is a snipped from the source CSV file.

%status,date,job,project,start,end,description
%
//,18.03.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,11.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
//,24.06.2021,sib,sib-dede,07:00,15:00,dede-mongo
%
?,02.08.2021,sib,sib-accounting,14:35,16:35,business-plan
%
?,13.10.2021,sb,sb-accounting,11:30,12:00,e-mail-pump

I like to extract from the source CSV file the start time in column 5 and the end time in column 6.

Next, based on start and end time I like to process the duration in hours or minutes (don’t mind).

Eventually, I like to grab the source CSV file, insert a new column between the existing column 6 and 7 with the processed duration and save this addition in a result CSV file.

Done anyone has an idea how to face this challenge on a GNU Debian Linux command line?

I am aware that I can cut specific columns from a CSV file like this.

cut -d, -f5,6 < ./source.csv > ./result.csv

However, I am still missing the duration processing and the result CVS file creation.

PS: I have a preference for Bash.

PPS: There are related questions like this but I found none close to this question.

Answers

- pmf
- September 3, 2024 at 11:18 am
- 0 votes
0
I have a preference for Bash

But you could do all of it in one go, e.g. using mktime from GNU Awk (assuming the % signs (or the subsequent newlines) in your sample are just a typo, and that the time in column 6 is higher than the one in column 5, otherwise the result is negative):
```
awk -F , -v OFS=, '{
  $8=$7; $7=(
    mktime(sprintf("0 0 0 %d %d 0", substr($6,1,2), substr($6,4,2))) - 
    mktime(sprintf("0 0 0 %d %d 0", substr($5,1,2), substr($5,4,2)))
  ) / 60; print
}' source.csv
```
Login or Signup to reply.

Using any awk and assuming the timestamps are all the same day since you don’t have any indication of date in your input and so no robust way to handle durations that could be longer than 24 hours:

$ awk '
    BEGIN { FS=OFS="," }
    NF > 1 {
        if ( NR == 1 ) {
            diff = "diff"
        }
        else {
            split($5, b, ":")
            split($6, e, ":")
            beg = b[1]*60 + b[2]
            end = e[1]*60 + e[2]
            diff = end - beg
        }
        $6 = $6 OFS diff
        print
    }
' file
%status,date,job,project,start,end,diff,description
//,18.03.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
//,11.06.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
//,24.06.2021,sib,sib-dede,07:00,15:00,480,dede-mongo
?,02.08.2021,sib,sib-accounting,14:35,16:35,120,business-plan
?,13.10.2021,sb,sb-accounting,11:30,12:00,30,e-mail-pump

No need to run grep first – piping grep to awk is an antipatterm, see https://porkmail.org/era/unix/award#grep.

Please signup or login to give your own answer.

Click here to cancel reply.

Debian – How to extract columns from CSV file, process and create a CSV file based on the extract and processing result?

Answers