I’d like to remove all single-line comments from some PHP code like the following, using Visual Basic:
<?php
some code
// "(//[^;)]*)(?=\?>)" <-- NOT REMOVED
/* a first comment */
/* a second comment with some // inside
// a single line ; comment inside <-- NOT REMOVED
// a comment with a ; website http://www.google.com <-- NOT REMOVED
a third comment
*/
/* a fourth comment with some // inside */
/* a fifth comment with some ftp://ftp.google;com//onefolder inside */
some code
if { // a comment <-- NOT REMOVED
doit();
} /* another comment */
some more code // a comment <-- NOT REMOVED
?>
<?php // a comment with a website http://www.google.com ?>
<?php and some more code // and a comment ?>
<?php var = 'http://www.google;com'; // and a comment ?>
<?php var = 'https://www.google;com'; // and a comment ?>
<?php var = 'ftp://www.google;com'; // and a comment ?>
<?php var = 'ftp://www.google;com'; // a comment with a website http://www.google.com ?>
<?php var = 'ftp://www.google;com'; // a comment ? <-- NOT REMOVED
?>
For the moment I’m using the pattern "(//[^;)]*)(?=\?>)", which partly works, but some lines are still left over. I can’t manage to tell the regex to delete up to a ?> sequence when one appears before the end of the line…
Can you help me improve this regex?
So that it gives this (or something close to it; any leftover \n and multiple spaces remaining after the comments are stripped I can remove in a later pass):
<?php
some code
/* a first comment */
/* a second comment with some // inside <-- END CAN BE REMOVED, NO PROBLEM
a third comment
*/
/* a fourth comment with some // inside */ <-- CAN BE REMOVED IF THE ENDING */ IS KEPT
/* a fifth comment with some ftp://ftp.google;com//onefolder inside */
some code
if {
doit();
} /* another comment */
some more code
?>
<?php ?>
<?php and some more code ?>
<?php var = 'http://www.google;com'; ?>
<?php var = 'https://www.google;com'; ?>
<?php var = 'ftp://www.google;com'; ?>
<?php var = 'ftp://www.google;com'; ?>
<?php var = 'ftp://www.google;com';
?>
NOTE: I have no problem doing this in multiple passes if that leads to easier-to-read code.
2 Answers
How about:
https://regex101.com/r/Pua5qG/latest
and replace with ${end}, in consequence only throwing away the comments.
(^| +)//(?:.*?)(?=\*/| \?>|$)
Match the start of the line or one or more spaces, then //, then lazily match characters until */, ?>, or the end of the line.
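To see how the pattern described above behaves on a few of the sample lines, here is a minimal sketch in Python rather than Visual Basic. It assumes the backslash escapes shown (\*/ and \?>) and a multiline mode so that ^ and $ match at every line boundary. The names SINGLE_LINE_COMMENT and strip_single_line_comments are made up for this illustration.

```python
import re

# The pattern from the answer, with the escapes written out.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
SINGLE_LINE_COMMENT = re.compile(r"(^| +)//(?:.*?)(?=\*/| \?>|$)", re.MULTILINE)


def strip_single_line_comments(source: str) -> str:
    """Delete each // comment up to */, ' ?>' or the end of the line."""
    return SINGLE_LINE_COMMENT.sub("", source)


# A few lines taken from the question's sample.
sample = "\n".join([
    "<?php",
    "if { // a comment",
    "doit();",
    "} /* another comment */",
    "<?php var = 'http://www.google;com'; // and a comment ?>",
    "/* a fifth comment with some ftp://ftp.google;com//onefolder inside */",
])

print(strip_single_line_comments(sample))
```

The URLs survive because their // is preceded by ":" or a letter, not by a space or the start of the line. The pattern has no notion of string literals, though, so a // that follows a space inside a quoted string would still be treated as a comment. In .NET the equivalent setting would presumably be RegexOptions.Multiline.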