My source code frequently includes a piece of code like

foo
(
    bar
    (
        foo0(<An arbitrary number of parentheses may appear here>)
    ),
    foo1bar(<An arbitrary number of parentheses may appear here>)
)

I want to capture this piece. My current approach is

grep -A15 -E "foo[[:space:]]*$" <file_name>

to make sure that enough lines after foo are captured.

However, a more accurate approach would be a pattern that counts the opening and closing parentheses after foo, so that the search stops right after the closing parenthesis that matches foo's opening one.

Is it possible to avoid scripting this algorithm by using grep options?

Example
My file is

...

foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
...
dummy
(
    nextDummy()
)
...

where ... represents lines of code that do not contain any ( or ) character. The expected output of grep is

foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
dummy
(
    nextDummy()
)

2 Answers


  1. Using any awk in any shell on every Unix box to print all the functions to stdout:

    $ awk '/^\(/{$0=prev ORS $0; f=1} f; /^\)/{f=0} {prev=$0}' file
    foo
    (
        bar
        (
            a(b)
        ),
        c(d)
    )
    dummy
    (
        nextDummy()
    )
    

    or to print every function to its own file:

    $ awk '/^\(/{close(out); out=prev; $0=prev ORS $0; f=1} f{print > out} /^\)/{f=0} {prev=$0}' file
    
    $ head -100 foo dummy
    ==> foo <==
    foo
    (
        bar
        (
            a(b)
        ),
        c(d)
    )
    
    ==> dummy <==
    dummy
    (
        nextDummy()
    )
    

    or if you have a specific function you want to print:

    $ awk -v tgt='foo' '/^\(/ && (prev==tgt){$0=prev ORS $0; f=1} f; /^\)/{f=0} {prev=$0}' file
    foo
    (
        bar
        (
            a(b)
        ),
        c(d)
    )
    

    $ awk -v tgt='dummy' '/^\(/ && (prev==tgt){$0=prev ORS $0; f=1} f; /^\)/{f=0} {prev=$0}' file
    dummy
    (
        nextDummy()
    )
    

    In the above we're assuming that a function body starts with ( on a line of its own and ends with ) on a line of its own, and that the function name is the line immediately preceding the start of the body.

    Assuming whatever language your source code is written in supports strings and/or comments, it’s impossible to do what you want just by counting parentheses as those could appear inside strings and comments.
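
    As a small illustration of that problem, the sketch below counts every ( and ) in a made-up four-line input whose body contains a string literal holding a ). The input and the say function are hypothetical and not taken from the question.

    ```shell
    # Hypothetical input: the call say(":)") holds a ")" inside a string literal.
    counts=$(printf '%s\n' 'foo' '(' '    say(":)")' ')' |
        awk '{ opens += gsub(/\(/, "&"); closes += gsub(/\)/, "&") }   # gsub returns the match count
             END { print opens, closes }')
    echo "$counts"   # prints: 2 3
    ```

    The code itself is balanced, yet a naive count sees two opening and three closing parentheses, so a counter would stop one line too early.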

    You can't do this job 100% robustly without writing a parser for whatever language your source code is written in. The best we can do with pattern matching against your source code is help you write a script that'll work with the subset of the language you provide as sample input/output.
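
    With those caveats in mind, the parenthesis counting described in the question could be sketched in awk roughly as follows. This is only an illustration under assumptions: the target name sits on a line of its own, no parentheses occur inside strings or comments, and the variable names (inblock, depth, opened, name, rest) are made up for this example.

    ```shell
    # Recreate the sample file from the question in a temporary file.
    sample=$(mktemp)
    printf '%s\n' '...' '' 'foo' '(' '    bar' '    (' '        a(b)' '    ),' '    c(d)' ')' \
        '...' 'dummy' '(' '    nextDummy()' ')' '...' > "$sample"

    block=$(awk -v tgt='foo' '
        # Outside a block: start one when the trimmed line equals the target name.
        !inblock {
            name = $0
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == tgt) { inblock = 1; depth = 0; opened = 0; print }
            next
        }
        # Inside a block: print the line, then update the nesting depth.
        {
            print
            rest = $0
            opens  = gsub(/\(/, "", rest)   # gsub returns the number of substitutions
            closes = gsub(/\)/, "", rest)
            depth += opens - closes
            if (opens > 0) opened = 1
            # Stop once at least one "(" was seen and every "(" has been closed.
            if (opened && depth <= 0) inblock = 0
        }
    ' "$sample")
    printf '%s\n' "$block"
    rm -f "$sample"
    ```

    Unlike the scripts above, this does not rely on the closing ) being in the first column, but it is still fooled by any parenthesis that appears inside a string or a comment.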

  2. If your grep supports the -P (PCRE) option, you could try:

    grep -zoP "[A-Za-z_]\w*\s*(\((?:[^()]+|(?1))*\))" file
    

    Output with the provided file:

    foo
    (
        bar
        (
            a(b)
        ),
        c(d)
    )
    dummy
    (
        nextDummy()
    )
    
    • [A-Za-z_]\w*\s* matches a name such as foo or dummy, followed
      by any whitespace characters (including newlines).
    • (\((?:[^()]+|(?1))*\)) matches a substring enclosed in balanced
      parentheses, whose contents are any sequence of:

      • [^()]+: one or more characters other than parentheses
      • (?1): a recursive match of capture group 1, i.e. the outermost parenthesized group
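
    One practical detail: with -z, grep also terminates each printed match with a NUL byte rather than a newline. The sketch below, which assumes a GNU grep built with PCRE support and uses a temporary file as a stand-in for your source file, pipes the result through tr so that the matches are newline-separated.

    ```shell
    # Recreate the sample file from the question in a temporary file.
    sample=$(mktemp)
    printf '%s\n' '...' '' 'foo' '(' '    bar' '    (' '        a(b)' '    ),' '    c(d)' ')' \
        '...' 'dummy' '(' '    nextDummy()' ')' '...' > "$sample"

    # -z treats the whole file as one record, -o prints only the matched parts,
    # and tr converts the NUL terminator after each match into a newline.
    matches=$(grep -zoP '[A-Za-z_]\w*\s*(\((?:[^()]+|(?1))*\))' "$sample" | tr '\0' '\n')
    printf '%s\n' "$matches"
    rm -f "$sample"
    ```

    Single quotes are used here so that the shell passes the backslashes to grep unchanged.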