
I have a file with ~12300000 rows of the form <timestamp, reading>:

1674587549.228 29214
1674587549.226 29384
1674587549.226 27813
1674587549.226 28403
1674587549.228 28445
...
1674587948.998 121
1674587948.998 126
1674587948.999 119
1674587949.000 126
1674587948.996 156
1674587948.997 152
1674587948.998 156
1674588149.225 316
1674588149.226 310
1674588149.223 150
1674588149.224 152
1674588149.225 150
1674588149.225 144
...
1674588149.225 227
1674588149.226 233
1674588149.226 275

The last timestamp minus the first timestamp equals 600. I want to create a new file that starts at the row whose timestamp is the last timestamp minus n, and runs till the end.

For example, if n=200, the new file should start at 1674588149.226 - 200, i.e. contain the rows from 1674587949.000 126 to 1674588149.226 275.

Can this be done with a Linux command or shell script? If so, how? Thanks.
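Something along these lines is what I am after (an untested sketch of the idea; the file name `data.txt`, the output name, and the strict cutoff `$1 >= last - n` are my own placeholder choices — the rows are only roughly sorted, so the newest timestamp has to be found first):

```shell
#!/bin/bash
# Two passes: first find the maximum timestamp, then keep every row
# whose timestamp lies within the last n seconds of it.
input="data.txt"
cat >"${input}" <<"EnDoFiNpUt"
1674587549.228 29214
1674587948.998 121
1674588000.000 126
1674588149.226 275
EnDoFiNpUt

n=200

# Pass 1: find the maximum timestamp (the file is not strictly sorted).
last=$(awk 'NR == 1 || $1 > max { max = $1 } END { print max }' "${input}")

# Pass 2: keep every row whose timestamp is within n seconds of the max.
awk -v last="$last" -v n="$n" '$1 >= last - n' "${input}" >"last_${n}s.txt"
```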

2 Answers


  1. If I understood correctly, you are trying to create files that each contain the same fixed number of lines, working backwards from the last lines of the input.

    If so, this script will perform the task.

    If you only want one file, then you can remove the logic associated with the looping and index value iterations.

    Note: The name of each file corresponds to the first field of the last line in that file (i.e. the timestamp of its last entry).

    This example does splitting for groupings of 5 lines. You can replace the 5 by 100 or 200, as you see fit.

    #!/bin/bash
    
    input="testdata.txt"
    cat >"${input}" <<"EnDoFiNpUt"
    1674587948.998 121
    1674587948.998 126
    1674587948.999 119
    1674587948.996 156
    1674587948.997 152
    1674587948.998 156
    1674587949.000 126
    1674588149.225 316
    1674588149.226 310
    1674588149.223 150
    1674588149.224 152
    1674588149.225 150
    1674588149.225 144
    1674588149.225 227
    1674588149.226 233
    1674588149.226 275
    EnDoFiNpUt
    
    awk -v slice="5" 'BEGIN{
        split("", data) ;
        dataIDX=0 ;
    }
    {
        dataIDX++ ;
        data[dataIDX]=$0 ;
    }
    END{
        #print dataIDX ;
    
        slLAST=dataIDX ;
        #print slLAST ;
    
        slFIRST=slLAST-slice+1 ;
        if( slFIRST <= 0 ){
            slFIRST=1 ;
        } ;
        #print slFIRST ;
    
        k=0 ;
        while( slLAST > 0 ){
            k++;
            split(data[slLAST], datline, " " ) ;
            fname=sprintf("%s__%03d.txt", datline[1], k ) ;
        printf("\t New file: %s\n", fname ) | "cat >&2" ;
    
        for( i=slFIRST ; i<=slLAST ; i++){
            print data[i] >fname ;
        } ;
        close(fname) ;
    
            if( slFIRST == 1 ){
                exit ;
            } ;
    
            slLAST=slFIRST-1 ;
            slFIRST=slLAST-slice+1 ;
            if( slFIRST <= 0 ){
                slFIRST=1 ;
            } ;
        } ;
    }' "${input}"
        
    
  2. If you only want the last 200 lines of a log, then the absolute simplest approach is to use tail. Namely

    tail -n 200 log.txt >"${newLogName}"
    

    If you want to create multiple files of 200 lines each, you could use the sequence

    tac log.txt | tail -n +201 | tac >log.remain
    mv log.remain log.txt
    

    in a loop that includes assigning a unique name ${newLogName} to each slice.
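    Such a loop might be sketched as follows (a sketch, not a drop-in solution: the `slice_NNN.txt` naming and the `seq`-generated stand-in data are placeholder choices; `tac` is GNU coreutils, as in the snippet above):

    ```shell
    #!/bin/bash
    # Repeatedly peel the last 200 lines off the log into a uniquely
    # named slice file, then shorten the log, until nothing is left.
    seq 450 > log.txt        # stand-in data: one number per line

    k=0
    while [ -s log.txt ]; do
        k=$((k + 1))
        newLogName=$(printf 'slice_%03d.txt' "$k")
        tail -n 200 log.txt > "${newLogName}"
        tac log.txt | tail -n +201 | tac > log.remain
        mv log.remain log.txt
    done
    ```

    The final slice simply comes out shorter when the line count is not an exact multiple of 200.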

    OR, you could create a reversed copy of the log at the outset and build the sublists working down that reversed list, remembering to re-reverse each individual slice before saving it in its final form.
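    One way to sketch that reversed-list variant (assuming GNU `split` with `-d` numeric suffixes is available; the `slice.` prefix and `seq` stand-in data are placeholder choices):

    ```shell
    #!/bin/bash
    # Reverse the whole log once, split it into fixed-size chunks (so
    # the chunks run from the end of the log backwards), then re-reverse
    # each chunk so the lines inside it are in their original order.
    seq 450 > log.txt        # stand-in data: one number per line

    tac log.txt > log.rev
    split -l 200 -d log.rev slice.        # slice.00, slice.01, slice.02
    for f in slice.??; do
        tac "$f" > "${f}.txt" && rm "$f"
    done
    rm log.rev
    ```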
