I am trying to write a regex to generate random strings of the type [:graph:] that exclude specific characters listed in the string $blacklist (for example, some symbols may be absent on certain language keyboards, and not everyone knows the ASCII or Unicode codes to type them) but may include other characters listed in the string $whitelist, for example whitespace.
This code allows you to remove characters that do not belong to [:graph:] and works correctly in PHP:
$secure_password = substr(preg_replace('`[^[:graph:]]+`s', "", random_bytes($length * 10)), 0, $length);
Thanks to regex101.com I figured out how to extend the pattern to exclude any additional characters, namely something like this:
#(?:[^[:graph:]]+|[~`]+)#
In the pattern above, although the symbols ~ and ` belong to the [:graph:] class, they are among the characters that will be deleted. I understand that the preg_quote function will be needed to escape the regex delimiter.
In this other code I tried to extend the set of characters to be removed:
$pattern = (empty($blacklist))
? "[^[:graph:]]+"
: "(?:[^[:graph:]]+|[".preg_quote($blacklist, '`')."]+)";
$secure_password = substr(preg_replace($pattern, "", random_bytes($length * 10)), 0, $length);
But it does not work as expected.
Could someone please help me out? Thank you.
2 Answers
Writing the regex is among the most important parts of solving this problem, and it takes some thought about what should be filtered.
Regex beginners might divide the problem into two parts: first keep the [:graph:] characters together with any whitelisted characters that are not in [:graph:], then remove the unwanted (blacklisted) ones. A possible snippet might be the following script:
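The answer's original snippet is not available here, so what follows is only a minimal sketch of the two-step idea. The sample values of `$length`, `$blacklist` and `$whitelist` are assumptions for illustration, and the code assumes single-byte (ASCII) characters and the default C locale.

```php
<?php
// Hypothetical sample values, not taken from the original post.
$length    = 16;
$blacklist = '~`';  // printable characters to exclude
$whitelist = ' ';   // characters outside [:graph:] to allow

$raw = random_bytes($length * 10);

// Step 1: drop every byte that is neither in [:graph:] nor whitelisted.
// preg_quote() escapes the '#' delimiter and characters such as ']' or '-'.
$keep     = '[:graph:]' . preg_quote($whitelist, '#');
$filtered = preg_replace('#[^' . $keep . ']+#', '', $raw);

// Step 2: drop the blacklisted characters (skipped when the blacklist is
// empty, because an empty character class is not a valid pattern).
if ($blacklist !== '') {
    $filtered = preg_replace('#[' . preg_quote($blacklist, '#') . ']+#', '', $filtered);
}

// The result may be shorter than $length if too few bytes survive.
$secure_password = substr($filtered, 0, $length);
echo $secure_password, PHP_EOL;
```

Note that the patterns are wrapped in `#` delimiters; the pattern strings in the question have none, which is one likely reason that code fails.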
Those with a little more knowledge about Regex might compact the code as follows:
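A possible compact form, again only a sketch under the same assumptions (hypothetical sample values, ASCII input), folds both steps into one alternation so that a single `preg_replace` call does the filtering:

```php
<?php
// Hypothetical sample values, not taken from the original post.
$length    = 16;
$blacklist = '~`';
$whitelist = ' ';

// One pattern: "not graph and not whitelisted" OR "blacklisted".
// The blacklist branch is omitted when the blacklist is empty.
$pattern = '#[^[:graph:]' . preg_quote($whitelist, '#') . ']+'
    . ($blacklist === '' ? '' : '|[' . preg_quote($blacklist, '#') . ']+')
    . '#';

$secure_password = substr(
    preg_replace($pattern, '', random_bytes($length * 10)),
    0,
    $length
);
echo $secure_password, PHP_EOL;
```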
Finally, if we wanted to check the length of the filtered string so that it is never shorter than desired, and wanted to wrap the logic in a function, the code would become:
Possibly some thought could be given to whether the preg_match_all function is better than preg_replace for this scenario, to the pattern itself, and more. I hope this contribution of mine will be useful to someone.
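One way this might look is sketched below. The function name `generate_password`, its parameters, and the cap of 100 attempts are assumptions made for illustration; they do not come from the original answer.

```php
<?php
// Hypothetical helper: keeps drawing random bytes until enough characters
// survive the filter, then trims the result to the requested length.
function generate_password(int $length, string $blacklist = '', string $whitelist = ''): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1.');
    }

    $pattern = '#[^[:graph:]' . preg_quote($whitelist, '#') . ']+'
        . ($blacklist === '' ? '' : '|[' . preg_quote($blacklist, '#') . ']+')
        . '#';

    $password = '';
    // Arbitrary cap so a blacklist that removes everything cannot loop forever.
    for ($attempt = 0; $attempt < 100 && strlen($password) < $length; $attempt++) {
        $password .= preg_replace($pattern, '', random_bytes($length * 10));
    }

    if (strlen($password) < $length) {
        throw new RuntimeException('Too few characters survive the filter.');
    }

    return substr($password, 0, $length);
}

echo generate_password(16, '~`', ' '), PHP_EOL;
```

Appending in a loop means the returned string always has exactly the requested length, at the cost of an exception in the degenerate case where the filter rejects every byte.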
Regex is surely not the best way to generate passwords, and the required length is not mathematically guaranteed either, but the OP asked how to extend the script with a whitelist. I would use a negative look-ahead and delete characters one by one.
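The answer's code is not shown here, so this is only one possible reading of the look-ahead idea, with hypothetical sample values. The pattern matches a single character at a time: the negative look-ahead protects whitelisted characters, and the group then matches anything that is either outside [:graph:] or blacklisted. In this sketch the whitelist takes priority if a character appears in both lists.

```php
<?php
// Hypothetical sample values, not taken from the original post.
$length    = 16;
$blacklist = '~`';
$whitelist = ' ';

// (?![whitelist]) : never match a whitelisted character
// [^[:graph:]]    : otherwise match one non-graph character
// |[blacklist]    : or one blacklisted character
// Empty lists are skipped because an empty character class is invalid.
$allow   = $whitelist === '' ? '' : '(?![' . preg_quote($whitelist, '~') . '])';
$deny    = $blacklist === '' ? '' : '|[' . preg_quote($blacklist, '~') . ']';
$pattern = '~' . $allow . '(?:[^[:graph:]]' . $deny . ')~';

$secure_password = substr(
    preg_replace($pattern, '', random_bytes($length * 10)),
    0,
    $length
);
echo $secure_password, PHP_EOL;
```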