
I am trying to achieve this, and I don’t believe it can’t be done with regex, although I would probably have done it with scripting already …

I have a bunch of text files that have multi-line text that fits in between lines that start with a dash (as the first non-blank character on that line). So each such multi-line text might have, on the last line before the dash-starting line below, a line that starts with ">" and contains, somewhere on that line, a hashtag.

Here is one such example:

    - ffff
      - aaa
          * bbb
          * ccc

        - tttt
          * aaa 
          * bbb
          > #tag
    - tttt 
      * aaaa 
      * bbbb
      > #log
-

I have managed to get the text in between dashes, but I can’t seem to be able, for the life of me, to specify another rule: that in between those dashes there should be a line starting with ">" and containing the tag "#log". This is what I have so far:

(^\s*)(-[\s\S\n]+?(?=^\s*-))

Is there a way, with regex, to match each multi-line text in between dash-starting lines that contains, on its last line, ">.*#log.*$"?

I tried many, many variations like this one:

(^\s*)(-[\s\S\n]+?(?=>.*#log.*$\n\s*-))

This basically gets everything (greedy?) from the first dash in the file up to the line starting with ">" that is followed by a new line starting with a dash. So it does what it says it does; I just don’t know how to tell it that I don’t want any lines starting with a dash inside the match.

I somehow want to tell it to only consider the last dash-starting line (non-greedy?)…

Thank you for saving my sanity 🙂

Edit: tried to add this in a comment but it is very quirky:

expected output is:

- whatever text on \n
  any number of lines which might have "-" dashes in there, \n
  * lists \n
      * sublist \n
      etc \n
  also some #tags and the whole "block" ends with the following line \n
  > #tag1 #tag2 .... #log ... [maybe some links at the end, etc] \n

So basically everything from a dash up to the line that starts with >, with no intermediary lines that start with a dash (a dash might still appear inside a line, though).

I've put "\n" at the end of the lines because a code block is not possible in a comment.
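For comparison, the scripting route mentioned at the top can be sketched in a few lines of Python. This is only an illustration of the rule described above, not something taken from the thread: the function name `blocks_ending_with_tag` and its `tag` parameter are made up for the example, and it assumes that a ">" line always closes the current block, whether or not it carries the tag.

```python
def blocks_ending_with_tag(text, tag="#log"):
    """Collect blocks that start at a dash line and end at a '>' line containing tag."""
    blocks = []
    current = None  # lines of the block being collected, or None if no block is open
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("-"):
            # Any dash-starting line restarts the block, so a match can
            # never contain an intermediate dash-starting line.
            current = [line]
        elif current is not None:
            current.append(line)
            if stripped.startswith(">"):
                # Simplification: a plain substring test, so "#logging"
                # would also count as containing "#log".
                if tag in stripped:
                    blocks.append("\n".join(current))
                current = None  # assumed: a '>' line always closes the block
    return blocks


# The sample from the question, with trailing spaces dropped.
sample = "\n".join([
    "    - ffff",
    "      - aaa",
    "          * bbb",
    "          * ccc",
    "",
    "        - tttt",
    "          * aaa",
    "          * bbb",
    "          > #tag",
    "    - tttt",
    "      * aaaa",
    "      * bbbb",
    "      > #log",
    "-",
])

blocks = blocks_ending_with_tag(sample)
```

On this sample only the last block (the one ending in "> #log") is collected; the block ending in "> #tag" is dropped.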

3 Answers


  1. I think you want to extract the part of your input text that runs from a hyphen (-) up to and including #log.

    For this, the regex should be:

    (-)[\s\w\n*>]*(#log)
    

    https://regex101.com/r/WSVznR/2

    If you want only the data between the hyphen (-) and #log, without either of them, then your regex should be:

    (?<![^-])[\s\w\n*>]*(?=#log)
    

    https://regex101.com/r/SJuUK2/2
    If the text contains other special characters, add them to the character class accordingly.

    Here I have used regex lookahead and lookbehind:
    https://medium.com/@artbindu/puzzling-with-regular-expression-d2f6cc1d1976

    Thank you @RicardoSouza for your comment; it describes a valid scenario that I had completely missed before.
    I have updated my answer accordingly.

  2. If I understood your problem correctly, you want to select the text from lines starting with a dash - to the closest line that starts with a greater-than sign > and contains the hashtag #log anywhere in that line.

    If that’s the case, a possible solution is the following

    ^\s*?(-\s*?.+?\n(?:^\s*?[^-> ].+?\n)*?\s*?>.*?#log.*?$)
    

    This matches a line starting with a -, then matches any following lines starting with anything that is not - or >, until it finds a line starting with > that contains #log.

    https://regex101.com/r/e95h9x/1

  3. This should take care of it. I tested it in VSCode against your sample input and got two matches: the first chunk from * aaa to > #tag, and then the second chunk from * aaaa to > #log.

    (?<=-.+\n)(?:[^-#]|\n)+#.+

    This uses a lookbehind to find a line starting with a hyphen without including it in the match, followed by a run of any characters that are not # or -, including newlines, up until it reaches a #, and then everything else out to the end of that line.
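The pattern from the second answer can also be tried outside regex101. Below is a minimal Python sketch, assuming the pattern is written with its backslashes (`\s`, `\n`) and compiled with `re.MULTILINE` so that `^` and `$` match at line boundaries. The names `BLOCK_RE`, `SAMPLE` and `find_log_blocks` are made up for this example, the sample is the one from the question with trailing spaces dropped, and the pattern was only checked against this one input.

```python
import re

# Second answer's pattern; MULTILINE makes ^ and $ match at every line boundary.
BLOCK_RE = re.compile(
    r"^\s*?(-\s*?.+?\n(?:^\s*?[^-> ].+?\n)*?\s*?>.*?#log.*?$)",
    re.MULTILINE,
)

# The sample from the question, with trailing spaces dropped.
SAMPLE = "\n".join([
    "    - ffff",
    "      - aaa",
    "          * bbb",
    "          * ccc",
    "",
    "        - tttt",
    "          * aaa",
    "          * bbb",
    "          > #tag",
    "    - tttt",
    "      * aaaa",
    "      * bbbb",
    "      > #log",
    "-",
    "",
])


def find_log_blocks(text):
    """Return capture group 1 (dash line through the '> ... #log' line) of each match."""
    return [m.group(1) for m in BLOCK_RE.finditer(text)]


blocks = find_log_blocks(SAMPLE)
for block in blocks:
    print(block)
    print("----")
```

Group 1 starts at the dash itself, so the indentation in front of the dash is not part of the captured block.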
