I am trying to redact something that log4j is overriding in general. To do this, I am trying to adjust a regex so that it captures what I need. An example of the input:

"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"description1"},{"dataType":"INT","name":"column_b","description":"description2"}]}}}, "some other stuff": ["SOME_STUFF"], etc.

Hoping to capture just…

{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}

I have this…

(?<=("definition":{))(\\.|[^\\])*?(?=}})

If I keep adding a } at the end, it highlights more of what I need. The problem is that there is no set number of nested elements in the list.

Is there any way to adjust the above so I can capture everything within the outer brackets?

2 Answers


  1. If you don’t have other brackets after the last one you’re trying to match, this regex should work for you:

    (?<="definition":){.*}(?:})
    

    The main difference is moving the brackets from the lookarounds to the matching part.
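
A minimal Java sketch of the greedy pattern above, for anyone applying it with java.util.regex. The class name, method name and sample strings are made up for illustration. The braces are escaped here because, as far as I know, java.util.regex rejects a bare { as an illegal repetition. The second sample shows the caveat mentioned at the top of this answer: when another }} appears later in the line, the greedy .* runs on to it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDefinitionDemo {
    // Greedy variant of the pattern above, escaped for a Java string literal.
    // "\\{" and "\\}" are literal braces; ".*" grabs as much as it can.
    static final Pattern GREEDY =
            Pattern.compile("(?<=\"definition\":)\\{.*\\}\\}");

    // Returns the matched text, or null when the pattern does not match.
    static String extract(String line) {
        Matcher m = GREEDY.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with no further "}}" after the definition.
        String ok = "\"definition\":{\"a\":{\"b\":[1,2]}}}, \"other\": [\"X\"]";
        // Hypothetical input with a later "}}", which the greedy match swallows.
        String tooFar = "\"definition\":{\"a\":{\"b\":[1,2]}}}, \"other\": {\"c\":{\"d\":1}}";

        System.out.println(extract(ok));
        System.out.println(extract(tooFar));
    }
}
```

The first line printed should stop at the end of the definition; the second should run through to the end of the input, which is why this pattern only suits lines where nothing brace-heavy follows the definition.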

  2. This regex should work for you if you cannot use a proper JSON parser:

    (?<="definition":).+?}(?=,\h*")
    

    Breakdown:

    • (?<="definition":): Lookbehind asserting that "definition": comes immediately before the current position
    • .+?}: Lazily match one or more characters, ending with a }
    • (?=,\h*"): Lookahead asserting that a comma, then zero or more horizontal whitespace characters, then a " follow the current position

    In Java use this regex declaration:

    String regex = "(?<=\"definition\":).+?\\}(?=,\\h*\")";
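
For completeness, a small self-contained sketch of how the lazy pattern from this answer might be exercised in Java. The class and method names are hypothetical, and the sample line is the one from the question with the trailing "etc." dropped. Note that quotes and backslashes have to be escaped inside the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefinitionRegexDemo {
    // Lazy pattern from this answer, escaped for a Java string literal.
    // \h (horizontal whitespace) is available in java.util.regex since Java 8.
    static final Pattern DEFINITION =
            Pattern.compile("(?<=\"definition\":).+?\\}(?=,\\h*\")");

    // Returns the text matched right after "definition":, or null if none.
    static String extractDefinition(String line) {
        Matcher m = DEFINITION.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample input taken from the question.
        String line = "\"definition\":{\"schema\":{\"columns\":["
                + "{\"dataType\":\"INT\",\"name\":\"column_a\",\"description\":\"description1\"},"
                + "{\"dataType\":\"INT\",\"name\":\"column_b\",\"description\":\"description2\"}"
                + "]}}}, \"some other stuff\": [\"SOME_STUFF\"]";

        System.out.println(extractDefinition(line));
    }
}
```

The match relies on the first } that is followed by a comma and a quote, so it should stop at the closing }}} in the sample. It is still a heuristic: a description value that itself contains }," would end the match early, which is one more reason to prefer a JSON parser when that is an option.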
    