My source code frequently includes a piece of code like
foo
(
bar
(
foo0(<An arbitrary number of parentheses may appear here>)
),
foo1bar(<An arbitrary number of parentheses may appear here>)
)
I want to capture this piece. My current approach is
grep -A15 -E "foo[[:space:]]*$" <file_name>
which makes sure that enough lines after foo are captured.
However, a more accurate way would be to count the opening and closing parentheses after foo, and to stop searching right after the closing parenthesis that matches the opening one of foo is found.
Is it possible to do this with grep options, without scripting the algorithm myself?
Example
My file is
...
foo
(
bar
(
a(b)
),
c(d)
)
...
dummy
(
nextDummy()
)
...
where ... represents lines of code which do not contain any ( or ) character.
The expected output of grep is
foo
(
bar
(
a(b)
),
c(d)
)
dummy
(
nextDummy()
)
2 Answers
Using any awk in any shell on every Unix box to print all the functions to stdout:
or to print every function to its own file:
or if you have a specific function you want to print:
In the above we’re assuming that a function body starts with ( on a line of its own and ends with ) on a line of its own, and that the function name is the line immediately preceding the start of the body.
Assuming whatever language your source code is written in supports strings and/or comments, it’s impossible to do what you want just by counting parentheses, as those could appear inside strings and comments.
You can’t do this job 100% robustly without writing a parser for whatever language your source code is written in. The best we can do with pattern matching against your source code is help you write a script that’ll work with the subset of the language you provide as sample input/output.
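As a rough illustration of the awk approach described in this answer, here is a minimal sketch. It keeps the stated assumption that the name is the line immediately before a line containing only (, and it additionally counts parentheses per line to find the matching close, as the question asks. The function name extract_blocks and its optional second argument (a specific name to print) are hypothetical, introduced only for this example. The sketch is still fooled by parentheses inside strings or comments.

```shell
#!/bin/sh
# Hypothetical helper: print every "name + parenthesized body" block in a file.
# Usage: extract_blocks FILE [NAME]   (NAME restricts output to one block)
extract_blocks() {
    awk -v want="${2:-}" '
        # Outside any block, a line holding only "(" opens a body;
        # the previous line is taken to be the name (an assumption).
        depth == 0 && /^[ \t]*\([ \t]*$/ {
            printing = (want == "" || prev == want)
            if (printing) print prev
            inBody = 1
        }
        inBody {
            if (printing) print
            line = $0
            depth += gsub(/\(/, "", line)   # gsub returns the number of "(" removed
            depth -= gsub(/\)/, "", line)   # likewise for ")"
            if (depth <= 0) { depth = 0; inBody = 0 }   # matching ")" reached
        }
        { prev = $0 }
    ' "$1"
}
```

Calling extract_blocks file prints all blocks, while extract_blocks file dummy prints only the block whose name line is exactly dummy. Counting depth instead of testing for a lone ) means a nested body that closes with a bare ) does not end the outer block early.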
If your grep supports the -P (PCRE) option, would you please try:
Output with the provided file:
[A-Za-z_]\w*\s* matches a name such as foo or dummy, followed by possible whitespace characters.
(\((?:[^()]+|(?1))*\)) matches a substring enclosed by parentheses, containing a sequence of either of:
[^()]+ : any characters other than parentheses
(?1) : recursion of the pattern enclosed by the outermost parentheses
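The pattern explained above could be applied roughly as follows. This is a sketch that assumes GNU grep built with PCRE support. The -z option makes grep treat the whole file as a single record so that a match can span lines, -o prints only the matched parts, and tr turns the NUL terminators that -z emits back into newlines. The wrapper name grep_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around a recursive PCRE match; requires GNU grep with -P.
grep_blocks() {
    # name, optional whitespace, then a balanced (...) group matched via (?1)
    grep -Pzo '[A-Za-z_]\w*\s*(\((?:[^()]+|(?1))*\))' "$1" | tr '\0' '\n'
}
```

Because -o reports non-overlapping matches from left to right, the inner bar ( ... ) group is consumed as part of the foo match and is not printed separately.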