In PHP, what is the fastest and simplest way to strip all HTML tags from a string except those in an allowed list, while also removing every attribute from the tags that are kept?
The built-in function `strip_tags` would have done the job, but it keeps the attributes of the tags in the allowed list.
I don’t know whether regular expressions are the best way to do this, and I also don’t know whether parsing the string would be too expensive.
2 Answers
A regular expression might fail if an attribute value contains a `>` character. A safer way would be to use `DOMDocument`, but note that the input should be valid HTML and that the output may be normalized by the parser.
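A minimal sketch of how the `DOMDocument` approach might look. The function name `strip_tags_and_attributes`, the wrapper `<div>`, and the charset `<meta>` hint are assumptions made for illustration, not part of the original answer. It lets `strip_tags` drop the disallowed tags first, then uses the DOM to remove every attribute from what remains.

```php
<?php
// Hypothetical helper (name and structure are assumptions for illustration).
function strip_tags_and_attributes(string $html, array $allowedTags): string
{
    // Step 1: let strip_tags() remove every tag that is not in the allowed list.
    $allowed = $allowedTags ? '<' . implode('><', $allowedTags) . '>' : '';
    $html = strip_tags($html, $allowed);

    // Step 2: parse the remainder. The wrapper <div> keeps bare text from
    // being wrapped in a <p>, and the <meta> hints that the input is UTF-8.
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 3: remove every attribute from every element in the document.
    foreach ($doc->getElementsByTagName('*') as $element) {
        while ($element->attributes->length > 0) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Step 4: serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo strip_tags_and_attributes(
    '<p class="a" onclick="x()">Hi <b style="color:red">there</b><i>!</i></p>',
    ['p', 'b']
);
```

Because the parser repairs malformed markup, unclosed or badly nested tags may come back in a different form than they went in, and a stray `</div>` in the input can close the wrapper early if `div` is in the allowed list.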
First of all, `strip_tags` does not prevent XSS attacks, so from a security perspective I would not recommend it, see here. However, here is an example of the solution I suggested in the comments. The trick is to use a special character to escape your allowed tags. This makes for a straightforward solution, as you can just use `strip_tags`.
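A minimal sketch of how this placeholder trick might look. The function name `strip_tags_keep_bare`, the choice of the control character `"\x01"` as the marker, and the regular expressions are all assumptions made for illustration. Allowed tags are rewritten into marker-delimited placeholders without their attributes, plain `strip_tags` removes everything else, and the placeholders are then turned back into bare tags.

```php
<?php
// Hypothetical helper (name, marker, and patterns are assumptions).
function strip_tags_keep_bare(string $html, array $allowedTags, string $marker = "\x01"): string
{
    if (!$allowedTags) {
        return strip_tags($html);
    }

    // Drop marker characters already present so placeholders cannot be forged.
    $html = str_replace($marker, '', $html);

    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $allowedTags));
    $m = preg_quote($marker, '~');

    // Match an allowed opening or closing tag. Quoted attribute values are
    // consumed as a unit, so a ">" inside quotes does not end the tag early.
    $tagPattern = '~<(/?)(' . $names . ')(?=[\s/>])'
        . '(?:[^>"\']|"[^"]*"|\'[^\']*\')*>~i';

    // 1. Replace allowed tags with placeholders, discarding their attributes.
    $html = preg_replace($tagPattern, $marker . '$1$2' . $marker, $html);

    // 2. Strip every remaining tag.
    $html = strip_tags($html);

    // 3. Turn the placeholders back into bare tags.
    return preg_replace('~' . $m . '(/?(?:' . $names . '))' . $m . '~i', '<$1>', $html);
}

echo strip_tags_keep_bare(
    '<p class="x">Hello <b onclick="evil()">world</b><i>!</i><br/></p>',
    ['p', 'b', 'br']
);
```

This version never builds a DOM, so it does not repair or reorder malformed markup. It only guarantees that the surviving tags carry no attributes, and that alone should not be treated as a complete XSS defense.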