In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte
length prefix plus data. The length prefix indicates the number of
bytes in the value. A column uses one length byte if values require no
more than 255 bytes, two length bytes if values may require more than
255 bytes.
Let us consider this problem using an e-commerce store as an example.
Here is how the product (item) entity could be defined (you can most likely understand this code even if you have not learned TypeScript):
import { FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4 } from "fundamental-constants";

type Product = {
  readonly ID: Product.ID;
  label: string;
  price__dollars__withoutTaxes: number;
};

// Business rules for each property, shared by the frontend, the backend and the database definition.
namespace Product {

  export type ID = string;
  export namespace ID {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const FIXED_CHARACTERS_COUNT: number = FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4;
  }

  export namespace Label {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const MINIMAL_CHARACTERS_COUNT: number = 2;
    export const MAXIMAL_CHARACTERS_COUNT: number = 127;
  }

  export namespace Price__Dollars__WithoutTaxes {
    export const TYPE: NumberConstructor = Number;
    export const REQUIRED: boolean = true;
    export const MINIMAL_VALUE: number = 0;
  }

}
Input validation on the frontend, request data validation on the backend, and the database definition must all obey the business rules above. In particular, the product label must contain from 2 to 127 characters.
Assume that these values are never typed in separately on the frontend and the backend; instead, they are referenced from the constants above:
<!-- BAD: the maximal characters count has been HARDCODED -->
<label for="PRODUCT_LABEL--INPUT">Please input 2-127 characters.</label>
<input type="text" maxlength="127" id="PRODUCT_LABEL--INPUT" />
<!-- GOOD: the maximal characters count is referenced (regardless of the template engine) -->
<label for="PRODUCT_LABEL--INPUT">Please input {{ Product.Label.MINIMAL_CHARACTERS_COUNT }}-{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }} characters.</label>
<input type="text" maxlength="{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }}" id="PRODUCT_LABEL--INPUT" />
When defining the database (no matter how exactly: via a raw SQL query, a GUI tool, or an ORM), we will also set a VARCHAR-like type for the label column of the Products table with a maximum length of 127 characters (again, by referring to Product.Label.MAXIMAL_CHARACTERS_COUNT instead of typing the value 127 directly).
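As a minimal sketch of what "referring to the constant" could look like for a raw SQL definition, the snippet below builds the column definition from the shared value. The helper name buildProductsTableDefinition and the utf8mb4 character set are assumptions made for illustration; only the Products table, the label column and the limit of 127 come from the text above.

```typescript
// Shared business rule (mirrors Product.Label.MAXIMAL_CHARACTERS_COUNT above).
const PRODUCT_LABEL_MAXIMAL_CHARACTERS_COUNT: number = 127;

// Hypothetical helper: builds the table definition so that the number 127
// is never typed a second time in the database layer.
function buildProductsTableDefinition(labelMaximalCharactersCount: number): string {
  return [
    "CREATE TABLE Products (",
    "  label VARCHAR(" + labelMaximalCharactersCount + ") NOT NULL",
    // Assumption: a UTF-8 character set that can hold 4-byte characters.
    ") CHARACTER SET utf8mb4;"
  ].join("\n");
}

const productsTableDefinition: string =
  buildProductsTableDefinition(PRODUCT_LABEL_MAXIMAL_CHARACTERS_COUNT);

console.log(productsTableDefinition);
```

An ORM or migration tool would take the same constant as the column length argument instead of a string template.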
Then assume that the seller enters a product label of almost 127 characters, some of which are 2-byte characters. Neither the validation on the frontend nor the validation of the request data on the backend treats the value as invalid. But once the server application tries to save the added (or updated) product to the table, we get an exception saying that the value of label exceeds the maximum length!
Question: which value must be set in Product.Label.MAXIMAL_CHARACTERS_COUNT?
(Let me repeat that this value is referenced from both the frontend and the backend.)
2 Answers
The length argument of a VARCHAR(L) column specifies how many characters can be saved. This does not include the additional bytes the MySQL database needs to store the value in the table. The quoted documentation only specifies how many additional bytes are needed to store a value in a VARCHAR column. Consider a VARCHAR(10) column as an example.

The string 123456789 (length of nine) can be saved in the VARCHAR(10) column, since 9 <= 10. It will, however, require 1 additional byte to store the length of the string.

The string 1234567890 (length of ten) can be saved in the VARCHAR(10) column as well, since 10 <= 10. Again, it needs 1 additional byte for the length of the string.

The value 12345678901 cannot be saved, since the string has a length of eleven and is too long for a column of type VARCHAR(10).

So if you want to save only labels with a maximum length of 127, use VARCHAR(127). A user will be able to save strings up to a length of 127, but no longer ones.

Keep in mind that the limit counts characters, not bytes. This means that the value äöüäöüäöü (nine umlauts) can be saved in a VARCHAR(10) column, since 9 <= 10, even though 18 + 1 bytes are needed to store the data in the table.

Your quote from the MySQL documentation at the top seems to show that you're conflating two concerns. The "2-byte length prefix" the documentation refers to is just a number stored at the beginning of every VARCHAR column value, which represents the length of the string contained within that column. For your purposes, it's not something you really need to be thinking about.
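To make the separation between the character limit and the stored bytes concrete, here is a minimal sketch in TypeScript. The function names utf8ByteLength, lengthPrefixBytes and storedBytes are hypothetical, and the model assumes a UTF-8 column in which a character needs at most 4 bytes (utf8mb4); the prefix rule follows the documentation quoted at the top of the question. It only estimates sizes and does not talk to MySQL.

```typescript
// Number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(value: string): number {
  let bytes = 0;
  for (let index = 0; index < value.length; index++) {
    const unit = value.charCodeAt(index);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (
      unit >= 0xd800 && unit <= 0xdbff &&
      index + 1 < value.length &&
      value.charCodeAt(index + 1) >= 0xdc00 &&
      value.charCodeAt(index + 1) <= 0xdfff
    ) {
      // Surrogate pair: one character outside the BMP, 4 bytes in UTF-8.
      bytes += 4;
      index++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Per the quoted documentation: 1 prefix byte if values of the column never
// need more than 255 bytes, otherwise 2. Assumption: at most 4 bytes per character.
function lengthPrefixBytes(
  columnMaximalCharactersCount: number,
  maximalBytesPerCharacter: number = 4
): number {
  return columnMaximalCharactersCount * maximalBytesPerCharacter <= 255 ? 1 : 2;
}

// Estimated storage for one value: length prefix plus data bytes.
function storedBytes(value: string, columnMaximalCharactersCount: number): number {
  return lengthPrefixBytes(columnMaximalCharactersCount) + utf8ByteLength(value);
}

const digits: string = "123456789";
// Nine umlauts (äöüäöüäöü), written as escapes to avoid encoding surprises.
const umlauts: string = "\u00e4\u00f6\u00fc\u00e4\u00f6\u00fc\u00e4\u00f6\u00fc";

console.log(storedBytes(digits, 10));  // 9 data bytes + 1 prefix byte = 10
console.log(storedBytes(umlauts, 10)); // 18 data bytes + 1 prefix byte = 19
console.log(lengthPrefixBytes(127));   // 127 * 4 = 508 > 255, so 2
```

Both strings have nine characters and fit into VARCHAR(10); only the number of bytes on disk differs, which is why the byte counts never need to appear in the shared constant.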
At least from my understanding of your question, the 2-byte values that you seem to be concerned about are actually Unicode characters within the text that require multiple bytes to be represented (and it is worth noting that there are plenty of Unicode characters that require three or four bytes in UTF-8).
As a general rule of thumb, you should consider all of your character limits in terms of actual Unicode characters rather than bytes. For example, if I have 10 Unicode characters that each require 4 bytes to store, I should be at 10/127 of your character limit, not 40/127.
This is how MySQL works, assuming you're on a version > 5 and have your table configured to use UTF-8 (utf8mb4 if you need 4-byte characters such as emoji); see the MySQL docs.
However, this is not how maxlength and minlength in HTML work: they measure in UTF-16 code units (so essentially two bytes per unit), so a character outside the Basic Multilingual Plane, e.g. an emoji, is counted as two out of the box.

Since you want to use the same constant to represent length across both the backend and the frontend, you will probably have to use JavaScript to accurately count the Unicode characters on your frontend, so that both sides measure in terms of Unicode characters.
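A minimal sketch of such counting, assuming the frontend validates the label in script rather than relying on maxlength alone. The names countUnicodeCharacters and isValidProductLabel are hypothetical; the two limits mirror Product.Label.MINIMAL_CHARACTERS_COUNT and Product.Label.MAXIMAL_CHARACTERS_COUNT from the question. The function counts Unicode code points, so combined sequences (for example an emoji with a skin tone modifier) still count as several characters.

```typescript
// Mirrors the shared business rules from the question.
const LABEL_MINIMAL_CHARACTERS_COUNT: number = 2;
const LABEL_MAXIMAL_CHARACTERS_COUNT: number = 127;

// Counts Unicode code points instead of UTF-16 code units.
function countUnicodeCharacters(value: string): number {
  let count = 0;
  for (let index = 0; index < value.length; index++) {
    const unit = value.charCodeAt(index);
    const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff;
    if (isHighSurrogate && index + 1 < value.length) {
      const next = value.charCodeAt(index + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // A valid surrogate pair is one character: skip its second half.
        index++;
      }
    }
    count++;
  }
  return count;
}

// Hypothetical validator shared by the form handler and the request handler.
function isValidProductLabel(label: string): boolean {
  const charactersCount = countUnicodeCharacters(label);
  return (
    charactersCount >= LABEL_MINIMAL_CHARACTERS_COUNT &&
    charactersCount <= LABEL_MAXIMAL_CHARACTERS_COUNT
  );
}

const emoji: string = "\ud83d\ude00"; // U+1F600, a single character

console.log(emoji.length);                  // 2 (UTF-16 code units, what maxlength sees)
console.log(countUnicodeCharacters(emoji)); // 1 (Unicode characters)
```

In environments that support string iteration, Array.from(value).length gives the same count; the explicit loop is used here only to show where the two measures diverge.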