
I have a function that gets all the hashtag words in a post and outputs the words separated by a comma (because a post can have many hashtags) to be stored in a database column.

function getHashtags ($text) {
    // explode on spaces
    $text = explode(" ", $text);
    $hashtag = "";
    $hashReg = "/^[a-zA-Z0-9]+$/";
    // for every word in post
    foreach ($text as $word) {
        // 1st character #
        $char = substr($word, 0, 1);
        // word after character #
        $ref = substr($word, 1);
        // if 1st character in word is #
        if ($char == "#") {
            // check if only letters & numbers
            if (preg_match ($hashReg, $ref)) {
                // check hashtag length
                if (strlen($ref) <= 11) {
                    // set hashtag
                    $hashtag .= substr($word, 1).",";
                }
            }
        }
    }
    return $hashtag;
}

The function works well, e.g.:

$post = "#rock #music is good";
echo getHashtags($post);

// output: rock,music,

However, if the example were $post = "#rock, #music, is good", the comma after #rock and #music makes those words fail the regex check, so they are skipped. The same happens with any other trailing character such as full stops, question marks, etc. I have tried adding preg_replace('/[^A-Za-z0-9]/', '', $post), but it does not work. How can I fix it so that #rock, #music, or #rock. #music. still outputs the desired result of rock,music?
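The attempted preg_replace fails because its character class [^A-Za-z0-9] also matches the # markers and the spaces, so nothing recognisable as a hashtag survives. A minimal sketch of the difference, assuming the cleaned string would then be passed to the unchanged getHashtags function from the question:

```php
<?php
$post = "#rock, #music, is good";

// The attempted cleanup: [^A-Za-z0-9] also matches "#" and " ",
// so the hashtag markers and the word boundaries disappear with the commas.
$stripped = preg_replace('/[^A-Za-z0-9]/', '', $post);
echo $stripped, "\n"; // rockmusicisgood

// Adding "#" and whitespace (\s) to the allowed set removes only the punctuation.
$cleaned = preg_replace('/[^A-Za-z0-9#\s]/', '', $post);
echo $cleaned, "\n"; // #rock #music is good
```

Passing $cleaned to the original getHashtags() would give rock,music, (the trailing comma comes from the function's own `.","`). This only helps when hashtags are separated by spaces: "#rock,#music" would become "#rock#music", which the original regex still rejects.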

2 Answers


  1. You can simply use preg_replace to remove the separator in front of each tag (a single non-alphanumeric character followed by a space, such as ", ") and then explode the result on #.

    Example:

    function getHashtags ($text) {
        $clean = preg_replace("/[^A-Za-z0-9] #/", "#", $text);
        
        $text = explode("#", $clean);
    
        $hashtag = [];
    
        foreach ($text as $word) {
            if ($word){
                $hashtag[]= $word;
            }
        }
        
        return implode(',', $hashtag);
    }
    

    Output should be:

    getHashtags("#rock, #music, is good, #metal, #is not so good");
    => string(40) "rock,music, is good,metal,is not so good"
    
  2. To handle the case where hashtags are followed by non-alphanumeric characters, you can modify the regular expression used to match hashtags. Currently, the regular expression /^[a-zA-Z0-9]+$/ is anchored at both ends, so it only matches when everything after the '#' is alphanumeric; a trailing comma or full stop makes the match fail.

    You can update it to allow optional non-alphanumeric characters after the hashtag word and use a capture group to keep only the word itself, like this:

    $hashReg = "/^#([a-zA-Z0-9]+)[^a-zA-Z0-9]*$/";
    

    Here is the modified getHashtags function:

    function getHashtags($text) {
        // explode on spaces
        $words = explode(" ", $text);
        $hashtags = [];
        // "#", then 1+ letters/digits (captured), then optional trailing punctuation
        $hashReg = "/^#([a-zA-Z0-9]+)[^a-zA-Z0-9]*$/";
        // for every word in post
        foreach ($words as $word) {
            // check if the word is a hashtag, possibly followed by punctuation
            if (preg_match($hashReg, $word, $matches)) {
                // $matches[1] is the hashtag without "#" or trailing punctuation
                $ref = $matches[1];
                // check hashtag length
                if (strlen($ref) <= 11) {
                    // add hashtag to array
                    $hashtags[] = $ref;
                }
            }
        }
        // join hashtags with comma and return as string
        return implode(",", $hashtags);
    }
    
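As an alternative to splitting on spaces (a different technique from both answers above), preg_match_all can pull every hashtag out of the post in one pass. A minimal sketch, assuming a hashtag is "#" followed by ASCII letters/digits and keeping the question's 11-character limit; extractHashtags is a hypothetical name chosen for illustration:

```php
<?php
/**
 * Sketch: collect hashtags with a single regex instead of exploding on spaces.
 * Assumes a hashtag is "#" + ASCII letters/digits, max 11 characters (as in the question).
 */
function extractHashtags(string $text): string
{
    // "#" not preceded by a letter/digit (skips things like "a#b"),
    // then capture the run of letters/digits that follows it.
    preg_match_all('/(?<![A-Za-z0-9])#([A-Za-z0-9]+)/', $text, $matches);

    $tags = [];
    foreach ($matches[1] as $tag) {
        // Same length rule as the original function: at most 11 characters.
        if (strlen($tag) <= 11) {
            $tags[] = $tag;
        }
    }
    // implode() adds no trailing comma, giving "rock,music" rather than "rock,music,".
    return implode(',', $tags);
}

echo extractHashtags("#rock, #music, is good"), "\n"; // rock,music
echo extractHashtags("#rock. #music."), "\n";         // rock,music
```

Because the match stops at the first non-alphanumeric character, trailing commas, full stops and question marks are ignored without a separate cleanup step. Unlike the question's function, an input such as "#rock!music" yields rock here instead of being rejected outright.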