I am trying to redact something that log4j is overriding in general. To do this, I am trying to change a regex so that it captures what I need. An example:
"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"description1"},{"dataType":"INT","name":"column_b","description":"description2"}]}}}, "some other stuff": ["SOME_STUFF"], etc.
I am hoping to capture just:
{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}
I have this:
(?<=("definition":{))(\.|[^\])*?(?=}})
If I keep adding a } at the end, it keeps highlighting more of what I need. The problem is that there is no set number of nested elements in the list.
Is there any way to adjust the above so I can capture everything within the outer brackets?
2 Answers
If you don’t have other brackets after the last one you’re trying to match, this regex should work for you:
The main difference is moving the brackets from the lookarounds to the matching part.
This regex should work for you if you cannot use a proper JSON parser:
(?<="definition":).+?}(?=,\h*")
Breakdown:
(?<="definition":) : Lookbehind condition to make sure we have "definition": before the current position
.+?} : Match 1+ of any characters ending with }
(?=,\h*") : Lookahead to assert that we have a comma, then 0 or more horizontal spaces, followed by a " ahead of the current position
In Java use this regex declaration:
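A minimal sketch of such a declaration, assuming the pattern is simply the three pieces from the breakdown joined together. The class and method names (DefinitionCapture, capture) are hypothetical and only there to make the example runnable; the closing brace is escaped as a precaution.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DefinitionCapture {
    // Assumed pattern, assembled from the breakdown:
    //   (?<="definition":)  lookbehind: "definition": sits just before the match
    //   .+?\}               lazy match of 1+ characters, ending with }
    //   (?=,\h*")           lookahead: comma, optional horizontal spaces, then "
    // In a Java string literal each " and \ needs its own backslash.
    static final Pattern DEFINITION =
            Pattern.compile("(?<=\"definition\":).+?\\}(?=,\\h*\")");

    // Returns the text captured after "definition":, if the pattern matches.
    static Optional<String> capture(String line) {
        Matcher m = DEFINITION.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "\"definition\":{\"schema\":{\"columns\":["
                + "{\"dataType\":\"INT\",\"name\":\"column_a\",\"description\":\"description1\"},"
                + "{\"dataType\":\"INT\",\"name\":\"column_b\",\"description\":\"description2\"}]}}}, "
                + "\"some other stuff\": [\"SOME_STUFF\"]";
        System.out.println(capture(line).orElse("<no match>"));
    }
}
```

Note that the lookahead requires another quoted key after the block, so this sketch finds nothing when "definition" is the last entry on the line, and . does not cross line breaks unless Pattern.DOTALL is set.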