I have a function that gets all the hashtag words in a post and outputs the words separated by a comma (because a post can have many hashtags) to be stored in a database column.
function getHashtags($text) {
    // explode on spaces
    $text = explode(" ", $text);
    $hashtag = "";
    $hashReg = "/^[a-zA-Z0-9]+$/";
    // for every word in post
    foreach ($text as $word) {
        // 1st character #
        $char = substr($word, 0, 1);
        // word after character #
        $ref = substr($word, 1);
        // if 1st character in word is #
        if ($char == "#") {
            // check if only letters & numbers
            if (preg_match($hashReg, $ref)) {
                // check hashtag length
                if (strlen($ref) <= 11) {
                    // set hashtag
                    $hashtag .= substr($word, 1).",";
                }
            }
        }
    }
    return $hashtag;
}
The function works well, e.g.:
$post = "#rock #music is good";
echo getHashtags($post);
// output: rock,music,
However, if the example were $post = "#rock, #music, is good", the commas after #rock and #music make the function fail. The same happens with any other punctuation, such as full stops or question marks. I have tried adding preg_replace('/[^A-Za-z0-9]/', '', $post), but it does not work. How can I fix it so that #rock, #music, or #rock. #music. still outputs the desired result of rock,music?
2 Answers
You can simply use preg_replace to remove all characters and spaces between the tags, and then explode the result on #.
Example:
Output should be:
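A minimal sketch of this approach, assuming the input holds only the tags and their separators (as in #rock, #music,). The helper name tagsFromTagString is hypothetical. Any ordinary words in the string would be glued onto the preceding tag, so this is probably not a drop-in fix for posts that mix tags with text.

```php
<?php
// Hypothetical helper: assumes $text contains only hashtags and separators.
function tagsFromTagString($text) {
    // Drop everything except letters, digits and '#'.
    $clean = preg_replace('/[^A-Za-z0-9#]/', '', $text);
    // Split on '#'; the text before the first '#' yields an empty element.
    $parts = explode('#', $clean);
    // Remove empty strings (strlen keeps "0" as a valid tag).
    $tags = array_filter($parts, 'strlen');
    return implode(',', $tags);
}

echo tagsFromTagString("#rock, #music,"), "\n"; // rock,music
echo tagsFromTagString("#rock. #music."), "\n"; // rock,music
```

Joining with implode also avoids the trailing comma that the original function appends.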
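To handle the case where hashtags are separated by non-alphanumeric characters, you can modify the regular expression used to match hashtags. Currently, the regular expression /^[a-zA-Z0-9]+$/ is anchored at both ends, so it matches only when the entire word after # is alphanumeric. A word such as rock, is rejected because of the trailing comma.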
You can update it to allow for non-alphanumeric characters that might appear next to the hashtag word, such as a trailing comma or full stop. One way to do this is to use a character class that matches any non-space character, like this:
Here is the modified getHashtags function:
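One possible version is sketched below. It is an illustration under stated assumptions, not a reconstruction of this answer's code: instead of the non-space character class described above, it uses preg_match_all with the pattern /(?<!\S)#([A-Za-z0-9]+)/ to capture only the run of letters and digits after each # that starts a word, so trailing punctuation is never part of the tag. It keeps the 11-character limit from the question. Unlike the original, a word such as #rock-music would yield rock rather than being rejected.

```php
<?php
// Sketch only: same name as the question's function, different technique.
function getHashtags($text) {
    // Capture the letters/digits after each '#' that starts a word.
    // (?<!\S) means "not preceded by a non-space character".
    preg_match_all('/(?<!\S)#([A-Za-z0-9]+)/', $text, $matches);

    $tags = [];
    foreach ($matches[1] as $tag) {
        // Keep the question's length limit of 11 characters.
        if (strlen($tag) <= 11) {
            $tags[] = $tag;
        }
    }
    // implode avoids the trailing comma of the original version.
    return implode(',', $tags);
}

echo getHashtags("#rock, #music, is good"), "\n"; // rock,music
echo getHashtags("#rock. #music. is good"), "\n"; // rock,music
echo getHashtags("#rock #music is good"), "\n";   // rock,music
```

If the stored column should keep the original trailing comma, appending "," to the return value restores that behaviour.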