Is the following valid HTML5?

<p>1 < 2</p>
<p>2 > 1</p>

Or must this always be written using HTML5 entities like this?

<p>1 &lt; 2</p>
<p>2 &gt; 1</p>

Can someone answer this with references to the HTML5 specification that clearly spell out whether or not it is valid to write < and > (with spaces around the symbols) in HTML text?

2 Answers


  1. This is valid

    <p>1 &lt; 2</p>
    <p>2 &gt; 1</p>
    

    The unescaped < in the first line below is TOLERATED by all browsers

    <p>1 < 2</p>
    <p>2 > 1</p>
    

    but it is, in principle, an invalid-first-character-of-tag-name parse error, which the HTML specification describes as follows:

    This error occurs if the parser encounters a code point that is not an ASCII alpha where first code point of a start tag name or an end tag name is expected. If a start tag was expected such code point and a preceding U+003C (<) is treated as text content, and all content that follows is treated as markup. Whereas, if an end tag was expected, such code point and all content that follows up to a U+003E (>) code point (if present) or to the end of the input stream is treated as a comment.

    The W3C validator (validator.w3.org/nu) will flag the unescaped < as an error.
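
    If the text is generated by a program rather than typed by hand, a small helper can do the escaping. A minimal sketch in JavaScript, where `escapeHtmlText` is a hypothetical name chosen for illustration and not a built-in:

    ```javascript
    // Hypothetical helper: escapes the characters that matter in HTML text content.
    // "&" is replaced first so the entities produced below are not escaped again.
    function escapeHtmlText(text) {
      return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;"); // ">" is valid unescaped in text; escaped here for symmetry
    }

    const paragraph = "<p>" + escapeHtmlText("1 < 2 and 2 > 1") + "</p>";
    console.log(paragraph); // <p>1 &lt; 2 and 2 &gt; 1</p>
    ```

    The helper only targets text content; attribute values would additionally need their quote characters escaped.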

    Here are other related issues

    <script>
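      // Fails: the HTML parser ends the script element at the end tag inside the
      // string below. Character references are not decoded inside a script
      // element, so escape it in JavaScript instead, for example as `<\/script>`.
      const htmlString = `</script>`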
    </script>
    

    and

    <textarea>
      Here is an end tag: </textarea>
    </textarea>
    

    and

    <p>
      Here is an end tag: </p>
    </p>
    

    Restrictions

    13.1.2.6 Restrictions on the contents of raw text and escapable raw text elements
    The text in raw text and escapable raw text elements must not contain any occurrences of the string </ (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
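
    For the script case, one way to stay within this restriction is to make sure the JavaScript source never contains a literal <. A minimal sketch, assuming the value is injected into an inline script by some templating step and that the snippet itself runs as a standalone file (`toInlineScriptLiteral` is a hypothetical helper, not a standard API):

    ```javascript
    // Hypothetical helper: serialises a value as a JavaScript literal that is
    // safe to place inside an inline script element.
    function toInlineScriptLiteral(value) {
      // JSON.stringify yields a valid JavaScript literal. Rewriting every "<" as
      // the \u003C escape means no end tag can appear in the output.
      return JSON.stringify(value).replace(/</g, "\\u003C");
    }

    const literal = toInlineScriptLiteral("</script>");
    console.log(literal); // "\u003C/script>"
    ```

    Because \u003C is an ordinary string escape, the script still sees the original text at run time.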

  2. > in intended text content is, and always has been, safe and valid in HTML, even without spacing.

    < is technically invalid when it does not begin a tag in a context where tags are expected. It switches the parser into a state that expects a valid tag name. Like all parse errors in HTML, this one is recoverable: the moment the parser sees a "bad character" after < (a space, for example), it falls back to text content, adds the < and that bad character to it, and carries on. The result is displayed exactly as if the < had been encoded as &lt;
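
    The recovery described above can be approximated in a few lines. This is a simplified sketch, not the real tokenizer: it assumes the input is plain text content (no attributes, comments, or script/style contents), and `findStrayLessThan` is a hypothetical name:

    ```javascript
    // Simplified model (an assumption, not the full tokenizer): report every "<"
    // whose next character cannot start a tag name (ASCII letter), an end tag
    // ("/"), a markup declaration ("!") or a bogus comment ("?"). Those are the
    // positions where the parser falls back to treating "<" as text.
    function findStrayLessThan(text) {
      const positions = [];
      // Stop one character early: a "<" at the very end of input is a
      // different parse error and is not reported here.
      for (let i = 0; i < text.length - 1; i++) {
        if (text[i] === "<" && !/[A-Za-z\/!?]/.test(text[i + 1])) {
          positions.push(i);
        }
      }
      return positions;
    }

    console.log(findStrayLessThan("a > b < c")); // [ 6 ]
    console.log(findStrayLessThan("<p>1 &lt; 2</p>")); // []
    ```

    Note that > never appears in the result, which matches the point that it is valid in text content.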

    Testing

    <!doctype html><html lang="en">
    
    <title>a > b < c</title>
    
    <p>a > b < c</p>
    
    <textarea>a > b < c</textarea>
    

    in https://validator.w3.org/nu/#textarea yields:

    Error: Bad character  after <.
    Probable cause: Unescaped <.
    Try escaping it as `&lt;`. At line 5, column 11
    

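    N.B. In title and textarea an unescaped < is OK, since these elements cannot contain any nested non-text nodes (not even comments). That is why the validator only reports the < in the p element on line 5.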
