I have a simple app written in PHP using the CodeIgniter 4 framework and, as a web application, it has some HTML forms for user input.
I am doing two things:
- In my Views, all variables from the database that come from user input are sanitized using CodeIgniter 4’s esc() function.
- In my Controllers, when reading HTTP POST data, I am using PHP filters:
  $data = trim($this->request->getPost('field', FILTER_SANITIZE_SPECIAL_CHARS));
I am not sure if sanitizing both when reading data from POST and when printing/displaying to HTML is a good practice or if it should only be sanitized once.
In addition, FILTER_SANITIZE_SPECIAL_CHARS is not working as I need. I want my HTML form text input to prevent users from attacking with HTML but I want to keep some ‘line breaks’ my database has from the previous application.
FILTER_SANITIZE_SPECIAL_CHARS will NOT delete HTML tags, it will just store them in the database, not as HTML, but it is also changing my ‘line breaks’. Is there a filter that doesn’t remove HTML tags (only stores them with proper encoding) but that respects \n ‘line breaks’?
4 Answers
I think for this situation using esc() is sufficient. FILTER_SANITIZE_SPECIAL_CHARS is a PHP sanitize filter that encodes '"<>& and optionally strips or encodes other special characters according to the flag. To do that you need to set the flag, which is the third parameter of the getPost() method. Here is an example:
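As a rough sketch of how a flag changes the result, the snippet below uses PHP's filter_var() as a stand-in for getPost() so it can run outside the framework. That getPost() forwards its filter and flag arguments to the same filter extension is an assumption here, and the sample input is made up.

```php
<?php
// Stand-in for $this->request->getPost('field', $filter, $flags):
// filter_var() takes the same filter constant and flag bitmask.
$input = "<b>Hi</b>\nsecond line";

// No flag: '"<>& and characters below ASCII 32 (including "\n") are encoded.
$default = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);

// With FILTER_FLAG_STRIP_LOW: characters below ASCII 32 are removed instead.
$stripped = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);

var_dump($default, $stripped);
```

Either way the line break does not survive as a real "\n", which matches the behaviour described in the question.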
This flag can be changed according to your requirements. You can use any PHP filter with a flag. Please refer to the PHP documentation for more info.
From my understanding, FILTER_SANITIZE_SPECIAL_CHARS is used to sanitize the user input before you act on it or store it.
Whereas esc is used to escape HTML etc. in the string so it doesn’t interfere with normal HTML, CSS etc. It is used for viewing the data.
So, you need both, one for input and the other for output.
Following from codeigniter.com. Note, it uses the Laminas Escaper library.
From docs.laminas.dev
Some of the functions they do are similar. For example, both may/will convert < to &lt;. However, your stored data may not have come just from user input and it may have < in it. It is perfectly safe to store it this way, but it needs to be escaped for output, otherwise the browser could get confused, thinking it’s HTML.
You don’t need to sanitize User input data as explained in the question below:
How can I sanitize user input with PHP?
In addition, you don’t need to use FILTER_SANITIZE_SPECIAL_CHARS, htmlspecialchars(...), htmlentities(...), or esc(...) either for most use cases:
– Comment from OP (user1314836)
The reason why you don’t need the esc() method for most use cases is:
Most User form input in an application doesn’t expect a User to submit/post HTML, CSS, or JavaScript that you plan on displaying/running later on.
If the expected User input is just plain text (username, age, birth date, etc), images, or files, use form validation instead to disallow unexpected data.
I.e: Available Rules and Creating Custom Rules
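The idea of rejecting unexpected input instead of sanitizing it can be sketched in plain PHP. The validateProfile() helper, the field names, and the length and range limits below are all hypothetical and only illustrate the approach; they are not CodeIgniter's validation API.

```php
<?php
// Hypothetical helper (not a CodeIgniter API): returns a list of error
// messages for a form submission; an empty list means the input is accepted.
function validateProfile(array $post): array
{
    $errors = [];

    // username: letters and digits only, at least 3 characters (assumed policy).
    if (!preg_match('/^[A-Za-z0-9]{3,}$/', $post['username'] ?? '')) {
        $errors['username'] = 'Username must be at least 3 letters or digits.';
    }

    // age: whole number between 0 and 150 (assumed range).
    $age = filter_var($post['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    if ($age === false) {
        $errors['age'] = 'Age must be a whole number.';
    }

    // birth_date: strict YYYY-MM-DD calendar date (rejects e.g. February 31).
    $raw  = $post['birth_date'] ?? '';
    $date = DateTime::createFromFormat('Y-m-d', $raw);
    if ($date === false || $date->format('Y-m-d') !== $raw) {
        $errors['birth_date'] = 'Birth date must look like 1990-12-31.';
    }

    return $errors;
}

$ok  = validateProfile(['username' => 'alice1', 'age' => '34', 'birth_date' => '1990-12-31']);
$bad = validateProfile(['username' => '<script>', 'age' => 'hello', 'birth_date' => '1990-02-31']);

var_dump($ok, array_keys($bad));
```

Nothing is altered or stripped here: the submission is either accepted as typed or sent back with messages, so the user can correct it.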
By using the Query Builder for your database queries and rejecting unexpected User input data using validation rules (alpha, alpha_numeric_punct, numeric, exact_length, min_length[8], valid_date, regex_match[/regex/], uploaded, etc), you can avoid most potential security holes, i.e. SQL injections and XSS attacks.
Answer from steven7mwesigwa gets my vote, but here is how you should be thinking about it.
Rules Summary
Inputs:
You should strip from all untrusted inputs (user forms, databases that you didn’t write to, XML feeds that you don’t control etc)
This means that many "traditional" cleaning functions are not needed (e.g. Magic Quotes, strip_tags and so on), but you need to know you can handle the code. You should only use strip_tags, escaping and so on if you know it is pointless having that data in that field.
Note: For user input I prefer to hold the data as the user entered it and reject the form, allowing them to try again. E.g. if I’m expecting a number and I get "hello" then I’ll reload the form with "hello" and tell the user to try again. steven7mwesigwa has links to the validation functions in CI that make that happen.
Outputs:
Choose the correct conversion for the output, and don’t get them muddled up.
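To make "correct conversion" concrete, here is a small sketch that converts the same raw value for three different output contexts using built-in PHP functions. The sample string is made up, and these functions are stand-ins: in CodeIgniter 4 the esc() helper is understood to take a context argument (such as 'html', 'js' or 'url') for the same purpose.

```php
<?php
$raw = "Tom & \"Jerry\" <script>alert(1)</script>";

// HTML body or quoted attribute value: encode HTML metacharacters.
$forHtml = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// URL query component: percent-encode everything that is not unreserved.
$forUrl = rawurlencode($raw);

// Value embedded in inline JavaScript: JSON-encode, hex-escaping the
// characters that could otherwise end the surrounding <script> block.
$forJs = json_encode($raw, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

var_dump($forHtml, $forUrl, $forJs);
```

Each result is only safe in its own context; the HTML-encoded string is not a valid URL component, and the URL-encoded string is not safe inside a script block.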
Why?
If you do your output conversion on input, then you can easily double-convert an input, or lose track of whether you need to make it safe before output, or lose data the user wanted to enter. Mistakes happen but following clean rules will prevent them.
This also means there is no need to reject special characters (those forms that reject quote marks are a horrible user experience, for example, and anyone putting restrictions on what characters can go in a password field is only weakening security).
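The double-conversion problem is easy to reproduce. In this sketch htmlspecialchars() stands in for whatever output conversion is in use, and the sample text is made up.

```php
<?php
$typed = 'Fish & Chips <3';

// Mistake: the output conversion is applied on input, so the stored
// value is already HTML-encoded.
$stored = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');

// The view then escapes again, as it should for any stored value.
$rendered = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Clean rule: store what was typed, convert once on output.
$correct = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');

echo $rendered, PHP_EOL; // Fish &amp;amp; Chips &amp;lt;3
echo $correct, PHP_EOL;  // Fish &amp; Chips &lt;3
```

In a browser the first line displays as "Fish &amp; Chips &lt;3" instead of what the user typed, while the second displays correctly.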
In your particular case:
You will essentially sanitize each and every output (which you appear to want to avoid), but that’s safer than accidentally missing a sanitize on output and a better user experience than losing stuff they typed.
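For the line breaks specifically, one possible sketch is to store the text exactly as entered and convert only when rendering. Plain htmlspecialchars() is used here so the snippet runs on its own; in a CodeIgniter view the equivalent would presumably be nl2br(esc($value)), which is an assumption rather than something confirmed above.

```php
<?php
// Value as stored in the database: raw text with a real "\n" line break.
$stored = "First line <b>bold?</b>\nSecond line";

// On output: escape first, then turn line breaks into <br /> tags.
// The order matters; escaping after nl2br() would encode the <br /> tags too.
$html = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

echo $html, PHP_EOL;
```

The HTML typed by the user is shown as text, and the original line breaks stay untouched in the database.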