In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte
length prefix plus data. The length prefix indicates the number of
bytes in the value. A column uses one length byte if values require no
more than 255 bytes, two length bytes if values may require more than
255 bytes.

MySQL official documentation
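
The arithmetic in the quoted passage can be sketched in TypeScript. This is only an illustration of the rule as quoted, not MySQL's actual storage code: `varcharStorageBytes` and its parameters are hypothetical names, and the sketch assumes a UTF-8 based character set such as utf8mb4 (up to 4 bytes per character).

```typescript
// Hypothetical helper illustrating the quoted rule; this is not a MySQL API.
function varcharStorageBytes(
  value: string,
  maximalCharactersCount: number,
  maximalBytesPerCharacter: number // assumption: 4, as for utf8mb4
): number {
  // Largest number of bytes any value in this column could require.
  const maximalValueBytes = maximalCharactersCount * maximalBytesPerCharacter;
  // One length byte if values never need more than 255 bytes, otherwise two.
  const lengthPrefixBytes = maximalValueBytes <= 255 ? 1 : 2;
  // Bytes of this particular value when encoded as UTF-8.
  const dataBytes = new TextEncoder().encode(value).length;
  return lengthPrefixBytes + dataBytes;
}

// VARCHAR(10): at most 40 bytes per value, so a 1-byte prefix.
const asciiExample = varcharStorageBytes("12345", 10, 4); // 1 + 5 = 6
// Nine umlauts (äöüäöüäöü), each 2 bytes in UTF-8.
const umlautExample = varcharStorageBytes("\u00E4\u00F6\u00FC".repeat(3), 10, 4); // 1 + 18 = 19
// VARCHAR(127): up to 508 bytes per value, so a 2-byte prefix.
const wideColumnExample = varcharStorageBytes("12345", 127, 4); // 2 + 5 = 7
```

Note that the prefix only affects how much space a row takes; it plays no part in deciding whether a value fits into the column.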

Let us consider this problem using an e-commerce store as an example.
Here is how the product (item) entity could be defined (you can most likely follow this code even if you have not learned TypeScript):

import { FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4 } from "fundamental-constants";


type Product = {
  readonly ID: Product.ID;
  label: string;
  price__dollars__withoutTaxes: number;
};


namespace Product {

  export type ID = string;
  export namespace ID {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const FIXED_CHARACTERS_COUNT: number = FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4;
  }

  export namespace Label {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const MINIMAL_CHARACTERS_COUNT: number = 2;
    export const MAXIMAL_CHARACTERS_COUNT: number = 127;
  }

  export namespace Price__Dollars__WithoutTaxes {
    export const TYPE: NumberConstructor = Number;
    export const REQUIRED: boolean = true;
    export const MINIMAL_VALUE: number = 0;
  }

}

Input validation on the frontend, request data validation on the backend, and the database definition must all obey the business rules above. In particular, the product label must contain from 2 to 127 characters.

Assume that the values above are never typed in separately on the frontend and the backend; instead, they are referenced from the shared definition:

<!-- BAD: the maximal characters count has been HARDCODED -->
<label for="PRODUCT_LABEL--INPUT">Please input 2-127 characters.</label>
<input type="text" maxlength="127" id="PRODUCT_LABEL--INPUT" />

<!-- GOOD: the maximal characters count is referenced (no matter which template engine is used) -->
<label for="PRODUCT_LABEL--INPUT">Please input {{ Product.Label.MINIMAL_CHARACTERS_COUNT }}-{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }} characters.</label>
<input type="text" maxlength="{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }}" id="PRODUCT_LABEL--INPUT" />

When defining the database (no matter how exactly: via a raw SQL request, a GUI tool, or an ORM), we will also set a VARCHAR-like type for the label column of the Products table with a maximal length of 127 characters (again, by referring to Product.Label.MAXIMAL_CHARACTERS_COUNT instead of typing in the value 127 directly).
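
As a rough sketch of what "referring to the constant" could look like for a raw SQL definition: the `ProductLabel` object below stands in for Product.Label, and `buildProductsTableDefinition`, the column list, and the utf8mb4 character set are assumptions made up for illustration.

```typescript
// Stand-in for Product.Label (values copied so the sketch runs on its own).
const ProductLabel = {
  MINIMAL_CHARACTERS_COUNT: 2,
  MAXIMAL_CHARACTERS_COUNT: 127
} as const;

// Hypothetical helper: builds the table definition from the shared constant
// instead of a hardcoded 127. Only the label column is shown.
function buildProductsTableDefinition(labelMaximalCharactersCount: number): string {
  return [
    "CREATE TABLE Products (",
    `  label VARCHAR(${labelMaximalCharactersCount}) NOT NULL`,
    ") CHARACTER SET utf8mb4;"
  ].join("\n");
}

const productsTableDefinition = buildProductsTableDefinition(
  ProductLabel.MAXIMAL_CHARACTERS_COUNT
);
// The generated statement contains "label VARCHAR(127) NOT NULL".
```

If the business rule changes, only the constant changes; the generated definition follows it.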

Then, assume that a seller has entered a product label of almost 127 characters, some of which are 2-byte characters. Neither the frontend validation nor the backend validation of the request data rejects the value. But once the server application tries to save the added (or updated) product to the table, we get an exception saying that the label's value exceeds the maximal length!

Question: which value must be set in Product.Label.MAXIMAL_CHARACTERS_COUNT?
(Let me repeat that this value is referenced from both the frontend and the backend.)
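
To make the scenario concrete, here is a small sketch that measures one label in three different ways. `measureLabel` is a hypothetical helper, and the sample label (127 copies of "ä") is made up for illustration; the sketch only measures the string and says nothing about what the database accepts.

```typescript
// Hypothetical helper: reports three different "lengths" of the same string.
function measureLabel(label: string): {
  codeUnits: number;
  codePoints: number;
  utf8Bytes: number;
} {
  return {
    // UTF-16 code units: what String.prototype.length reports.
    codeUnits: label.length,
    // Unicode code points: Array.from iterates a string by code point.
    codePoints: Array.from(label).length,
    // Bytes when the string is encoded as UTF-8.
    utf8Bytes: new TextEncoder().encode(label).length
  };
}

// 127 copies of "ä" (U+00E4), which takes 2 bytes in UTF-8.
const umlautLabel = measureLabel("\u00E4".repeat(127));
// umlautLabel is { codeUnits: 127, codePoints: 127, utf8Bytes: 254 }
```

So the same label is "127 long" or "254 long" depending on the unit, which is why the unit behind MAXIMAL_CHARACTERS_COUNT matters.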

2 Answers


  1. The length argument L of a VARCHAR(L) column specifies how many characters can be saved. It does not include the additional bytes MySQL needs to store the value in the table. The quoted documentation only describes how many extra bytes are needed to store a value in a VARCHAR column. See the following example:

    mysql> CREATE TABLE Dummy (Label VARCHAR(10));
    Query OK, 0 rows affected (0.02 sec)
    
    mysql> INSERT INTO Dummy(Label) VALUES('12345');
    Query OK, 1 row affected (0.01 sec)
    
    mysql> INSERT INTO Dummy(Label) VALUES('123456789');
    Query OK, 1 row affected (0.01 sec)
    
    mysql> INSERT INTO Dummy(Label) VALUES('1234567890');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> INSERT INTO Dummy(Label) VALUES('12345678901');
    ERROR 1406 (22001): Data too long for column 'Label' at row 1
    

    As you can see, the string 123456789 (length nine) can be saved in the VARCHAR(10) column, since 9<=10. It does, however, require 1 additional byte to store the length.

    Saving the string 1234567890 (length ten) in the VARCHAR(10) column works as well, since 10<=10. Again, 1 additional byte is needed for the length of the string.

    The value 12345678901 cannot be saved, since the string has a length of eleven and is too long for a column of type VARCHAR(10).

    So if you want to save only labels with a maximum length of 127, use VARCHAR(127). A user will be able to save strings of up to 127 characters, but nothing longer.

    Keep in mind that the limit is counted in characters, not bytes. This means that the value äöüäöüäöü (nine umlauts) can be saved in a VARCHAR(10) column, since 9<=10, even though 18+1 bytes are needed to store it in the table. See the following SELECT statement, where LENGTH() returns the length in bytes:

    mysql> SELECT Label, LENGTH(Label) FROM Dummy;
    +--------------------+---------------+
    | Label              | LENGTH(Label) |
    +--------------------+---------------+
    | 12345              |             5 |
    | 123456789          |             9 |
    | 1234567890         |            10 |
    | äöüäöü             |            12 |
    | äöüäöüöäü          |            18 |
    +--------------------+---------------+
    5 rows in set (0.00 sec)
    
    mysql> EXPLAIN Dummy;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | Label | varchar(10) | YES  |     | NULL    |       |
    +-------+-------------+------+-----+---------+-------+
    1 row in set (0.00 sec)
    
  2. Your quote from the MySQL documentation at the top seems to show that you’re conflating two concerns. The "2-byte length prefix" the documentation refers to is just a number stored at the beginning of every varchar column value which represents the length of the string contained within that column. For your purposes, it’s not something you really need to be thinking about.

    At least from my understanding of your question, the 2-byte values that you seem to be concerned about would actually be Unicode characters within the text that require multiple bytes to be represented (and it is worth noting that there are plenty of Unicode characters out there that require more than two bytes as well).

    As a general rule of thumb, you should express all of your character limits in actual Unicode characters rather than in bytes. For example, if I have 10 Unicode characters that each require 4 bytes to store, I should be at 10/127 of your character limit, not 40/127.

    This is how MySQL works, assuming you’re on a version > 5, and have your table configured to use UTF-8 (docs):

    For definitions of character string columns (CHAR, VARCHAR, and the TEXT types), MySQL interprets length specifications in character units.

    However, this is not how maxlength and minlength in HTML work: they count UTF-16 code units (two bytes each), so an emoji that needs several code units is not counted as a single character out of the box:

    <p>You can't type any additional characters into this box, as
    the flag emoji uses 4 UTF-16 code units:</p>
    <input type="text" value="🇦🇺" maxlength="4">

    Since you want to use the same constant to represent the length on both the backend and the frontend, you will probably have to use JavaScript to count the Unicode characters (code points) accurately on your frontend, so that both sides measure in the same unit.
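
A minimal sketch of such counting, under a few assumptions: `ProductLabel` stands in for Product.Label from the question, `countCodePoints` and `isValidProductLabel` are hypothetical names, and the column is assumed to use a character set such as utf8mb4 that can store 4-byte characters. The sketch counts code points, so a flag emoji made of two code points still counts as two.

```typescript
// Stand-in for Product.Label from the question (values copied so the sketch runs on its own).
const ProductLabel = {
  MINIMAL_CHARACTERS_COUNT: 2,
  MAXIMAL_CHARACTERS_COUNT: 127
} as const;

// Counts Unicode code points: Array.from iterates a string by code point,
// not by UTF-16 code unit.
function countCodePoints(value: string): number {
  return Array.from(value).length;
}

// Hypothetical validator that the frontend and the backend could share.
function isValidProductLabel(label: string): boolean {
  const charactersCount = countCodePoints(label);
  return (
    charactersCount >= ProductLabel.MINIMAL_CHARACTERS_COUNT &&
    charactersCount <= ProductLabel.MAXIMAL_CHARACTERS_COUNT
  );
}

// 127 copies of U+1F600: 127 code points, but 254 UTF-16 code units.
const emojiLabel = "\u{1F600}".repeat(127);
const emojiLabelCodeUnits = emojiLabel.length; // 254, the unit maxlength counts in
const emojiLabelIsValid = isValidProductLabel(emojiLabel); // true
const tooLongLabelIsValid = isValidProductLabel("a".repeat(128)); // false
```

With a check like this, the maxlength attribute would reject the 127-emoji label long before the validator does, so the attribute would have to be dropped or treated as a looser hint.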
