
I’d like to remove all single-line comments from PHP code like the following, using Visual Basic:

<?php
some code
// "(//[^;)]*)(?=\?>)"         <-- NOT REMOVED
/* a first comment */
/* a second comment with some // inside
// a single line ; comment inside         <-- NOT REMOVED
// a comment with a ; website http://www.google.com         <-- NOT REMOVED
a third comment
*/
/* a fourth comment with some // inside */
/* a fifth comment with some ftp://ftp.google;com//onefolder inside */
some code
if { // a comment         <-- NOT REMOVED
    doit();
} /* another comment */
some more code // a comment         <-- NOT REMOVED
?>
<?php  // a comment with a website http://www.google.com ?>
<?php and some more code // and a comment ?>
<?php var = 'http://www.google;com'; // and a comment ?>
<?php var = 'https://www.google;com'; // and a comment ?>
<?php var = 'ftp://www.google;com'; // and a comment ?>
<?php var = 'ftp://www.google;com'; // a comment with a website http://www.google.com ?>
<?php var = 'ftp://www.google;com'; // a comment ?         <-- NOT REMOVED
?>

For the moment I’m using the "(//[^;)]*)(?=\?>)" pattern, which works in part, but some comments remain (the lines marked NOT REMOVED). I can’t get the regex to delete everything up to a ?> sequence when one appears before the end of the line, and up to the end of the line otherwise.
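The missing piece is "stop at ?> if it comes first, otherwise at the end of the line". A minimal Python sketch of just that idea (Python is used only for illustration since the asker works in Visual Basic; the names are hypothetical) puts a lazy match in front of a lookahead for either ?> or $:

```python
import re

# Hypothetical name. A lazy ".*?" stops at the first position where the
# lookahead succeeds: just before "?>", or at the end of the line.
# re.MULTILINE makes "$" match before every "\n", and "." never crosses a
# newline, so each match stays on one line.
LINE_COMMENT = re.compile(r"//.*?(?=\?>|$)", re.MULTILINE)


def strip_naive(php: str) -> str:
    """Drop // comments up to ?> or end of line; strings are NOT protected."""
    return LINE_COMMENT.sub("", php)


print(strip_naive("<?php and some more code // and a comment ?>"))
print(strip_naive("some more code // a comment\n?>"))
# Weakness: the // inside the quoted URL also starts a match.
print(strip_naive("<?php var = 'http://www.google;com'; // a comment ?>"))
# prints: <?php var = 'http:?>
```

On its own this is too naive for the sample, because it cannot tell a // inside a quoted URL from a real comment; the answers below deal with that.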

Can you help me improve this regex?

The result should look like this (or something close to it; leftover \n characters and multiple spaces after the comment cleanup are fine, since I can remove them in a later pass):

<?php
some code  

/* a first comment */
/* a second comment with some // inside <-- END CAN BE REMOVED, NO PROBLEM
  

a third comment
*/
/* a fourth comment with some // inside */ <-- CAN BE REMOVED IF THE ENDING */ IS KEPT
/* a fifth comment with some ftp://ftp.google;com//onefolder inside */
some code
if {
    doit();
} /* another comment */
some more code
?>
<?php ?>
<?php and some more code ?>
<?php var = 'http://www.google;com'; ?>
<?php var = 'https://www.google;com'; ?>
<?php var = 'ftp://www.google;com'; ?>
<?php var = 'ftp://www.google;com'; ?>
<?php var = 'ftp://www.google;com'; 
?>

NOTE: I don’t mind doing this in multiple passes if that leads to easier-to-read code.
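Since multiple passes are acceptable, the leftover whitespace can go in a separate cleanup pass. A minimal Python sketch (hypothetical function name) that trims trailing spaces and tabs, squeezes runs of spaces between non-space characters, and collapses runs of blank lines. The squeezing step is an assumption about what "cleanup" should mean, and it would also touch double spaces inside string literals:

```python
import re


def tidy_whitespace(text: str) -> str:
    """Cleanup pass to run after the comments have been stripped."""
    # Remove spaces/tabs just before each line end (MULTILINE: $ = end of line).
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Squeeze 2+ spaces between two non-space characters into one.
    # Leading indentation is untouched, but string literals are not protected.
    text = re.sub(r"(?<=\S) {2,}(?=\S)", " ", text)
    # Collapse three or more consecutive newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text)


print(tidy_whitespace("<?php  ?>\nsome code  \n\n\n\nif {\n    doit();"))
```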

Answers


  1. How about:
    https://regex101.com/r/Pua5qG/latest

    (?x)                        # free-spacing mode
    (                           # never match // inside...
    ['"].*//.*['"]            # ...a string
    |
    /\*.*?\*/                 # ...or a block comment
    )
    (*SKIP)(*FAIL)              # if one matched, skip past it
    |
    (?P<commented_code>//.+?) # capture the comment
    (?P<end>\?>$)?              # plus a ?> at the end of the line, if any
    $
    

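    and replace with ${end} (with the multiline flag set, so that $ matches at every line end). Only the comments are thrown away; a ?> at the end of the line is put back.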
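Backtracking verbs such as (*SKIP)(*FAIL) are a PCRE feature; neither .NET's Regex (used from Visual Basic) nor Python's re supports them. A common substitute is to match the text to keep (strings, same-line /* */ comments) in one named group and the // comment in the other branch, then decide in a replacement callback. A minimal Python sketch of that substitution, not the answerer's code (names are hypothetical; heredocs, # comments and multi-line strings are not handled). Like the pattern above, it only protects /* */ comments that close on the same line, so the // lines inside the multi-line second comment are removed, as in the expected output:

```python
import re

# Hypothetical names. Branch 1 (group "keep") matches text that must survive:
# single- or double-quoted strings and /* ... */ comments that close on the
# same line. Branch 2 matches a // comment up to "?>" or the end of the line.
KEEP_OR_DROP = re.compile(
    r"(?P<keep>'(?:[^'\\\r\n]|\\.)*'"   # '...' string, backslash escapes allowed
    r'|"(?:[^"\\\r\n]|\\.)*"'           # "..." string
    r"|/\*[^\r\n]*?\*/)"                # /* ... */ on a single line
    r"|//[^\r\n]*?(?=\?>|\r?$)",        # // comment; \r? keeps CRLF line ends
    re.MULTILINE,
)


def _keep_or_drop(match: re.Match) -> str:
    # The engine resumes after each match, so a kept string or block comment
    # is passed over as a whole, which is what (*SKIP)(*FAIL) achieves.
    keep = match.group("keep")
    return keep if keep is not None else ""


def strip_line_comments(php: str) -> str:
    return KEEP_OR_DROP.sub(_keep_or_drop, php)


sample = (
    "/* a second comment with some // inside\n"
    "// a single line ; comment inside\n"
    "*/\n"
    "/* a fourth comment with some // inside */\n"
    "<?php var = 'http://www.google;com'; // and a comment ?>"
)
print(strip_line_comments(sample))
```

Because the // branch ends in a lookahead, a trailing ?> is never consumed, so no ${end} group is needed. In VB.NET the same idea would presumably map to Regex.Replace with a MatchEvaluator and RegexOptions.Multiline, writing the named group as (?<keep>...); that port is an untested assumption.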

  2. (^| +)//(?:.*?)(?=\*/| \?>|$)

    Match the start of the line or one or more spaces, then //, then lazily match characters until the next */, a space followed by ?>, or the end of the line. The multiline flag must be on so that ^ and $ match at every line.
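Answer 2's pattern can be tried the same way. A minimal Python sketch (hypothetical function name; the escapes \* and \? are assumed to have been lost when the answer was copied). Because it requires a line start or a space before //, the // in 'http://...' is left alone; the flip side is that a // preceded by a space inside a string literal would still be removed:

```python
import re

# Answer 2's pattern, assuming the lost backslashes were "\*" and "\?".
# (^| +) requires the // to start a line or follow at least one space, so the
# // in 'http://...' (preceded by ':') never matches. The lazy (?:.*?) stops
# before "*/", before " ?>", or at the end of the line.
ANSWER2 = re.compile(r"(^| +)//(?:.*?)(?=\*/| \?>|$)", re.MULTILINE)


def strip_answer2(php: str) -> str:
    """Hypothetical wrapper: delete every match of answer 2's pattern."""
    return ANSWER2.sub("", php)


for line in (
    "<?php and some more code // and a comment ?>",
    "<?php var = 'http://www.google;com'; // and a comment ?>",
    "/* a fourth comment with some // inside */",
):
    print(strip_answer2(line))
```

The last line shows the trade-off the asker accepted: the */ survives, but the text between // and */ inside the block comment is removed. The pattern uses no Python-specific syntax, so it should carry over to .NET's Regex.Replace with RegexOptions.Multiline, though that is an assumption here.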
