
I have an array (converted from a string) that contains words with non-standard letters (letters not used in English, like ć, ä, ü). I don't want to replace those characters; I want to remove every word that contains them.

from [Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12]
to   [Adam-Smith, Christine, Roger]

This is what I got so far:

<?php 
    $tags = "Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12";

    $tags_array = preg_split("/,/", $tags); 

    $tags_array = array_filter($tags_array, function($value){
       return strstr($value, "a") === false;
    });

    foreach($tags_array as $tag) {
        echo "<p>".$tag."</p>";
    }
?> 

I have no idea how to delete words that contain anything other than the characters [a-z, A-Z, 0-9] and [(), "", -, +, &, %, @, #]. Right now the code deletes every word that contains an "a". What should I do to achieve this?

3

Answers


  1. This should do the work for you

    https://onlinephp.io/c/dd46c

    $tags = ['Adam-Smith', 'Christine', 'Müller', 'Roger', 'Hauptstraße', 'X Æ A-12'];
    $output = [];
                
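    foreach ($tags as $word) {
        // Keep the word only if it contains no character outside the whitelist.
        // The hyphen is placed last in the class so it is a literal, not a range.
        if (!preg_match('/[^A-Za-z0-9!@#$%^&*()+"-]/', $word)) {
            $output[] = $word;
        }
    }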
                
    print_r($output);
    

    output

    Array(
    [0] => Adam-Smith
    [1] => Christine
    [2] => Roger
    )
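
    The position of the hyphen inside a character class matters: between two characters it forms a range, while at the very end of the class it is a literal. As a minimal sketch (the helper name keepAsciiTags and its exact whitelist are assumptions for illustration, not part of the answer above), the same filter can be wrapped in a function:

```php
<?php
// Hypothetical helper: keeps only tags made entirely of whitelisted ASCII characters.
function keepAsciiTags(array $tags): array
{
    // Negated class: matches any single character that is NOT allowed.
    // The hyphen is last, so it is a literal hyphen rather than a range operator.
    $disallowed = '/[^A-Za-z0-9()"+&%@#-]/';

    $kept = array_filter($tags, function (string $tag) use ($disallowed): bool {
        // preg_match() returns 0 when no disallowed character is found.
        return preg_match($disallowed, $tag) === 0;
    });

    // array_filter() preserves keys, so reindex from 0.
    return array_values($kept);
}

$result = keepAsciiTags(['Adam-Smith', 'Christine', 'Müller', 'Roger', 'Hauptstraße', 'X Æ A-12']);
print_r($result);
```

    Note that a space is not in this whitelist, so a multi-word tag is rejected; add a space to the class if such tags should pass.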
    
  2. $raw = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12, johnny@knoxville, some(person), thing+asdf, Jude "The Law" Law, discord#124123, 100% A real person, shouldntadd.com';
    
    $regex = '/[^A-Za-z0-9\s\-()"+&%@#]/';
    
    $tags = array_map('trim', explode(',', $raw));
    
    // array_filter() preserves the original keys; array_values() reindexes from 0.
    $tags = array_values(array_filter($tags, function ($tag) use ($regex) {
        return !preg_match($regex, $tag);
    }));
    
    var_dump($tags);
    

    Yields:

    array(9) {
        [0]=>
        string(10) "Adam-Smith"
        [1]=>
        string(9) "Christine"
        [2]=>
        string(5) "Roger"
        [3]=>
        string(16) "johnny@knoxville"
        [4]=>
        string(12) "some(person)"
        [5]=>
        string(10) "thing+asdf"
        [6]=>
        string(18) "Jude "The Law" Law"
        [7]=>
        string(14) "discord#124123"
        [8]=>
        string(18) "100% A real person"
      }
    

    If you want to allow a full stop as well (for example, when checking email addresses), add . inside the character class, just before the closing ]. Inside a character class the dot is a literal and needs no escaping.
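
    A minimal sketch of that variation, assuming the same whitelist as above with a literal dot added inside the class (the shortened $raw input is made up for illustration):

```php
<?php
// Assumption: same whitelist as the answer, plus "." as an allowed character.
$regex = '/[^A-Za-z0-9\s\-()"+&%@#.]/';

// Shortened, hypothetical input for illustration.
$raw = 'Roger, shouldntadd.com, Müller';

$tags = array_map('trim', explode(',', $raw));

// Keep a tag only when no disallowed character is found, then reindex.
$tags = array_values(array_filter($tags, function ($tag) use ($regex) {
    return preg_match($regex, $tag) === 0;
}));

var_dump($tags);
```

    With the dot allowed, shouldntadd.com is kept alongside Roger, while Müller is still rejected.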

  3. This task can be completed more directly/efficiently than the earlier answers demonstrate. Just split on commas which may have leading or trailing spaces AND treat any names with non-whitelisted characters as delimiters too.

    The result array will only contain the qualifying names and they will be whitespace trimmed without making any extra calls.

    / *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i
    #                                    ^- case-insensitive pattern
    #      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--- match names containing at least one non-whitelisted character
    #     ^-------------------------------- OR
    #^^^^^--------------------------------- optional leading spaces or trailing spaces around a comma
    

    Code:

    var_export(
        preg_split(
            '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
            $tags,
            0,
            PREG_SPLIT_NO_EMPTY
        )
    );
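
    For a self-contained run of this approach, here is a sketch that assumes $tags is the comma-separated string from the question (the variable name $qualifying is made up for illustration):

```php
<?php
// Assumption: $tags holds the comma-separated string from the question.
$tags = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12';

$qualifying = preg_split(
    // Delimiter is either a comma with optional surrounding spaces,
    // or a whole name containing at least one non-whitelisted character.
    '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
    $tags,
    -1,                  // no limit on the number of pieces
    PREG_SPLIT_NO_EMPTY  // drop the empty strings left between adjacent delimiters
);

var_export($qualifying);
```

    Because the disqualified names are consumed as delimiters, only Adam-Smith, Christine and Roger remain, and preg_split() returns them indexed from 0.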
    