
I am trying to write a regex that generates random strings from the [:graph:] character class while excluding specific characters listed in the string $blacklist (for example, some symbols may be missing from certain keyboard layouts, and not everyone knows the ASCII or Unicode codes needed to type them). It should also be able to include characters outside the class, such as whitespace, listed in the string $whitelist.

This code allows you to remove characters that do not belong to [:graph:] and works correctly in PHP:

$secure_password = substr(preg_replace('`[^[:graph:]]+`s', "", random_bytes($length * 10)), 0, $length);

Thanks to regex101.com I figured out how to extend the pattern to exclude any additional characters, namely something like this:

#(?:[^[:graph:]]+|[~`]+)#

In the pattern above, the "~`" symbols belong to the [:graph:] class but are still among the characters that get deleted. I understand that the preg_quote function is needed to escape the regex delimiter.

In this other code I tried to extend the set of characters to be removed:

$pattern = (empty($blacklist))
    ? "[^[:graph:]]+"
    : "(?:[^[:graph:]]+|[".preg_quote($blacklist, '`')."]+)";

$secure_password = substr(preg_replace($pattern, "", random_bytes($length * 10)), 0, $length);

But it doesn’t work as expected.

Could someone please help me out? Thank you.

2 Answers


  1. Chosen as BEST ANSWER

    Writing the regex is the central part of solving this problem, and it helps to think of it as a filtering task.

    Regex beginners might split the problem into two steps: first keep the [:graph:] characters together with any extra characters that are not in the class (the whitelist), then remove the unwanted ones (the blacklist). A possible script is the following:

    <?php
    
    // Settings
    $blacklist = '~`';
    $whitelist = ' ';
    $delimiter = '`';
    $length = 50;
    
    // Build the pattern that keeps [:graph:] and whitelisted characters; blacklisted characters are still present at this stage
    $pattern = "[^";
    $pattern .= ($whitelist != "")
        ? preg_quote($whitelist, $delimiter)
        : "";
    $pattern .= "[:graph:]]+";
    
    // Generate a raw string, stripping every byte outside [:graph:] and the whitelist
    $secure_password = preg_replace(
        "{$delimiter}{$pattern}{$delimiter}s",
        "",
        random_bytes($length * 10)
    );
    
    // If there are characters to remove it builds the specific pattern and then filters them out
    if ($blacklist != "") {
        $pattern = "[". preg_quote($blacklist, $delimiter) ."]+";
        $secure_password = preg_replace(
            "{$delimiter}{$pattern}{$delimiter}s",
            "",
            $secure_password
        );
    }
    
    $secure_password = substr($secure_password, 0, $length);
    

    Those with a little more knowledge about Regex might compact the code as follows:

    <?php
    
    // Settings
    $blacklist = '~`';
    $whitelist = ' ';
    $delimiter = '`';
    $length = 50;
    
    // Builds the optimized pattern and filters the string
    $pattern = "[^";
    $pattern .= ($whitelist != "")
        ? preg_quote($whitelist, $delimiter)
        : "";
    $pattern .= "[:graph:]]+";
    $pattern .= ($blacklist != "")
        ? "|[". preg_quote($blacklist, $delimiter) ."]+"
        : "";
    
    $secure_password = preg_replace(
        "{$delimiter}{$pattern}{$delimiter}s",
        "",
        random_bytes($length * 10)
    );
    
    $secure_password = substr($secure_password, 0, $length);
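
    A combined pattern of this kind is easier to check on a fixed input than on random bytes. The following is only a minimal sketch: it assumes the alternation is meant to pair the negated class with the blacklist, and the `$sample` string is a made-up test input.

    ```php
    <?php

    // Illustrative settings, same shape as in the scripts above
    $blacklist = '~`';
    $whitelist = ' ';
    $delimiter = '`';

    // The negated class keeps [:graph:] plus the whitelist; the alternation drops the blacklist
    $pattern = '[^' . preg_quote($whitelist, $delimiter) . '[:graph:]]+';
    if ($blacklist !== '') {
        $pattern .= '|[' . preg_quote($blacklist, $delimiter) . ']+';
    }

    // Hypothetical fixed input: contains a tab, a newline and both blacklisted symbols
    $sample = "a b\t~c`d\n!";
    $filtered = preg_replace("{$delimiter}{$pattern}{$delimiter}s", '', $sample);

    var_dump($filtered); // string(6) "a bcd!"
    ```

    The space survives because it is whitelisted, while the tab, the newline, "~" and "`" are removed.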
    

    Finally, to check the length of the filtered string so that it is never shorter than desired, and to wrap everything in a function, the code becomes:

    <?php
    
    /**
     * Generates a random string from the [:graph:] class, optionally including characters that are not in the class (for example whitespace) and removing specific characters.
     * 
     * @param int $length           The length of the string we want to receive.
     * @param string $blacklist     The unwanted characters in the [:graph:] class.
     * @param string $whitelist     The characters to include even though they are not in [:graph:].
     * @return string               The secure password
     */
    function graph_customized(
        int $length,
        string $blacklist = "",
        string $whitelist = ""
    ) : string {
        // Keep the random string
        $random_str = "";
        
        // How many times the desired length
        $times = 10;
    
        // Build the regex pattern once; quoting inside the loop would escape the whitelist again on every pass
        $delimiter = "`";
        $pattern = "[^" . preg_quote($whitelist, $delimiter) . "[:graph:]]+";
        if ($blacklist != "") {
            $pattern .= "|[". preg_quote($blacklist, $delimiter) ."]+";
        }
    
        // Generate the secure string at least once
        do {
            // Generate a random string of bytes
            $random_bytes = random_bytes($times * $length);
            
            // Filter eligible characters including whitelist characters but excluding blacklist characters
            $random_str .= preg_replace(
                "{$delimiter}(?:$pattern){$delimiter}s",
                "",
                $random_bytes
            );
            
            // If we do not want whitespace at the beginning and end of the string, uncomment the following statement. To trim only the left side use ltrim, or rtrim for the right side
            //$random_str = trim($random_str);
            
        } while (strlen($random_str) < $length);
        
        return substr($random_str, 0, $length);
    }
    

    It may also be worth considering whether preg_match_all suits this scenario better than preg_replace, whether the pattern can be improved, and so on.
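
    As one starting point for that comparison, here is a minimal sketch of a preg_match_all variant: instead of deleting the unwanted bytes it collects the wanted ones. The function name `graph_by_matching` is hypothetical, and using a negative look-ahead for the blacklist is just one possible choice.

    ```php
    <?php

    // Hypothetical helper: collects allowed bytes instead of deleting disallowed ones
    function graph_by_matching(int $length, string $blacklist = '', string $whitelist = ''): string
    {
        if ($length < 1) {
            return '';
        }

        $delimiter = '`';

        // Optional look-ahead: reject blacklisted characters before the class is tried
        $pattern = ($blacklist !== '')
            ? '(?![' . preg_quote($blacklist, $delimiter) . '])'
            : '';
        // One allowed character: anything in [:graph:] or in the whitelist
        $pattern .= '[' . preg_quote($whitelist, $delimiter) . '[:graph:]]';

        $result = '';
        do {
            // Oversample, then keep only the bytes that match
            preg_match_all(
                "{$delimiter}{$pattern}{$delimiter}s",
                random_bytes($length * 10),
                $matches
            );
            $result .= implode('', $matches[0]);
        } while (strlen($result) < $length);

        return substr($result, 0, $length);
    }

    $password = graph_by_matching(20, '~`', ' ');
    ```

    Like the preg_replace version, this loops forever if the blacklist removes every allowed character, so a real implementation would probably want a guard for that case.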

    I hope this contribution of mine will be useful to someone.


  2. Surely regex is not the best way to generate passwords, and the required length is not mathematically guaranteed either, but the OP asked how to extend his script with a whitelist. I would use a negative look-ahead and delete characters one by one.

    $blacklist = '~`';
    $whitelist = ' ';
    $delimiter = '`';
    $length = 50;
    // Compare against '' rather than using empty(), because empty('0') is true
    $pattern = ($whitelist === '') ? '' :
        '(?![' . preg_quote($whitelist, $delimiter) . '])';
    $pattern .= ($blacklist === '') ? '[^[:graph:]]'
        : '[[:^graph:]' . preg_quote($blacklist, $delimiter) . ']';
    
    $secure_password = substr(preg_replace($delimiter . $pattern . $delimiter . 's', '', random_bytes($length * 10)), 0, $length);
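
    If an exact length matters, one alternative (not what the OP asked for, and swapped in here purely for comparison) is to skip the regex and draw each character with random_int from an explicit alphabet. This is a minimal sketch assuming single-byte ASCII characters; the helper name `password_from_alphabet` is hypothetical.

    ```php
    <?php

    // Hypothetical helper: fixed-length password drawn from an explicit alphabet
    function password_from_alphabet(int $length, string $blacklist = '', string $whitelist = ''): string
    {
        // Printable ASCII without the space (0x21-0x7E), which is [:graph:] in the C locale
        $alphabet = implode('', array_map('chr', range(0x21, 0x7E))) . $whitelist;

        // Drop blacklisted bytes and duplicates so every remaining character is equally likely
        $chars = array_values(array_unique(
            array_diff(str_split($alphabet), str_split($blacklist))
        ));

        if ($length < 1 || $chars === []) {
            throw new InvalidArgumentException('Need a positive length and a non-empty alphabet.');
        }

        $password = '';
        $max = count($chars) - 1;
        for ($i = 0; $i < $length; $i++) {
            // random_int draws from a CSPRNG and is uniform over the given range
            $password .= $chars[random_int(0, $max)];
        }

        return $password;
    }

    $password = password_from_alphabet(50, '~`', ' ');
    ```

    The result always has exactly $length characters, and no random bytes are generated only to be thrown away.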
    