
The log file is:

Oct 01 [time] a
Oct 02 [time] b
Oct 03 [time] c
.
.
.
Oct 04 [time] d
Oct 05 [time] e
Oct 06 [time] f
.
.
.
Oct 28 [time] g
Oct 29 [time] h
Oct 30 [time] i

and it is really big (millions of lines).

I want to get the logs between Oct 01 and Oct 30.

I can do it with gawk

gawk 'some conditions' filter.log

and it works correctly.

But it returns millions of log lines, which is not good,

because I want to get it part by part,

something like this:

gawk 'some conditions' -limit 100 -offset 200 filter.log

so that every time I change the limit and offset,

I get another part of the output.

How can I do that?

2 Answers


  1. awk solution
    I would harness GNU AWK for this task in the following way. Let the file.txt content be

    1
    2
    3
    4
    5
    6
    7
    8
    9
    

    and say I want to print the lines whose 1st field is odd, in the part starting at the 3rd line and ending at the 7th line (inclusive); then I can use GNU AWK the following way:

    awk 'NR<3{next}$1%2{print}NR>=7{exit}' file.txt
    

    which will give

    3
    5
    7
    

    Explanation: NR is a built-in variable which holds the number of the current row. When processing rows before the 3rd, just go to the next row without doing anything; when the remainder of dividing the 1st field by 2 is non-zero, print the line; when processing the 7th or a later row, just exit. Using exit might give a noticeable boost in performance if you are processing a relatively small part of the file. Observe the order of the 3 pattern-action pairs in the code above: next is first, then whatever you want to do, and exit is last. If you want to know more about NR, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

    (tested in GNU Awk 5.0.1)
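
    The same idea can be parameterized so the window bounds are not hard-coded. As a sketch (offset and limit here are just illustrative variable names, not awk options), skip the first offset rows and stop after limit further rows:

    awk -v offset=2 -v limit=5 '
        NR <= offset       { next }    # skip rows before the window
        $1 % 2             { print }   # the actual filter: odd 1st field
        NR >= offset+limit { exit }    # stop once the window is exhausted
    ' file.txt

    With these values it processes rows 3 through 7 and again prints 3, 5 and 7.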

    linux solution
    If you prefer working with offset and limit, then you might exploit a tail-head combination, e.g. for the above file.txt:

    tail -n +5 file.txt | head -3
    

    which gives the output

    5
    6
    7
    

    Observe that the offset goes first, with a + before its value (tail -n +5 starts output at line 5), and then the limit, with a - before its value (head -3 keeps the first 3 lines of that).
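
    Applied to the question, the same pipeline could be combined with the gawk filter; 'some conditions' below still stands in for the actual date test and the numbers are only examples:

    gawk 'some conditions' filter.log | tail -n +201 | head -n 100

    This skips the first 200 matching lines and keeps the next 100. Note that, unlike the exit trick above, gawk still reads and filters the whole file on every call.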

  2. Using OP’s pseudo code mixed with some actual awk code:

    gawk -v limit=100 -v offset=200 '
    some conditions { matches++                                # track number of matches
                      if (matches >= offset && limit > 0) {
                         print                                 # print current line
                         limit--                               # decrement limit
                      }
                      if (limit == 0) exit                     # optional: abort processing if we found "limit" number of matches
                    }
    ' filter.log
    
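    As a concrete sketch, if the condition were simply "the line starts with Oct" (a hypothetical stand-in for the real date test), the same structure would read:

    gawk -v limit=100 -v offset=200 '
    /^Oct / { matches++                                # hypothetical condition
              if (matches >= offset && limit > 0) {
                 print
                 limit--
              }
              if (limit == 0) exit
            }
    ' filter.log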