skip to Main Content

I’m taking the Meta Data Engineer Professional Certificate and I was just given this prompt in a lab:

Mr. Carl needs to have a new table to store the contact details of each customer including customer account number, customer phone number and customer email address.
You are required to choose a relevant data type for each of the columns.

Solution:
Account number: INTEGER
Phone number: INTEGER
Email: VARCHAR

Prior to reading the solution I selected VARCHAR(10) as the datatype for storing phone numbers as I thought they should be treated as string data. My reasoning is that there’s no reason to perform any sort of mathematical operation on a phone number, and they’re often typed with other characters like "(" or "-".

Is there any compelling reason for storing a phone number as an INT? Do you agree with the solution to this prompt? What is the best practice for storing phone numbers?

3

Answers


  1. Storing phone numbers as strings can be a disaster, the first things coming up to my mind are:

    • You can get dupes easily, maybe someone types the number with (
      and/or – and another user does type the same number without those
      characters, long story short you end up with a duplicate.

    • Thinking about a way to normalize the phone number using an integer
      makes too more sense in terms of normalization and non duplication.

    • Also think about a search with the scenario above, what would you use ? a like a numeric operator ? spread casts ? Messy…

    • Now comes the important thing and it is related to the indexing, the
      int will be faster. The longer is the varchar the slower it gets
      however you are limiting its length.

    The validation can be on the UI with a field mask, or using a regex on the logic whatever makes more sense for you.

    Hope i helped a little bit 🙂

    Login or Signup to reply.
  2. In my opinion, there is no absolute best choice in this. Both have pros and cons. Personally, I’m in favor of using varchar.

    Though special characters like hyphen can cause dupes when mishandled (it’s a rare case and it’s the user to blame as it’s required to verify the input before submitting),it does have the merit of formatting the phone which improves the readability. e.g area_code-tel: xxx-xxxxxxxx (without it it’s near impossible to separate the area code and the phone number as both can have a varied length).

    About indexing,though numerics does have advantages over strings, I’m not sure if a phone number would be used as an index. There are more worthy candidates such as ID or date, but what would a phone number do? Usually we look for the phone based on indexed column such as ID, but how often do we get something based on phone number? Unless we want to list all phones from a particular area, we don’t really need it to be indexed. Then it actually would be more fitting to use special characters like hyphen to help determine the area part.

    P.S Like Ken White kindly suggested, there are cases when phone numbers should be indexed, especially when they are more suitable to be an identifier.

    Login or Signup to reply.
  3. Is "Meta Data Engineer Professional Certificate" aimed at MySQL?

    General Professional: If not MySQL-specific, then you need to understand that "INTEGER" is implemented in different ways by different database engines.

    MySQL Professional: INTEGER, in MySQL, maps to INT SIGNED, which is limited to about 2 billion–That is only 9 digits. I don’t know what the max phone number is worldwide, but I know that 10 is needed.

    BIGINT gives you about 18 digits (in 8 bytes), but that seems silly. For the reasons already mentioned VARCHAR(...) is reasonable. (Perhaps a limit of 20 would be quite sufficient.) In that case, a 10-digit number would take 11 bytes (1 for length, plus 10 for the number.)

    Arguably, you could say, for example DECIMAL(15) to allow up to 15 digits in a 7-byte column.

    (I prefer VARCHAR, in spite of it taking the most space.)

    Either way: It is a bad test question if it does not understand the two cases I present here.

    Non digits: ‘typed with other characters like "(" or "-"’ — That brings up a different issue. It comes under the general heading of GIGO. Cleanse the data before storing it into the database.

    If you ever needed to compare two phone numbers for equality, you would wish you had removed all non-digits. (Or added them in some canonical way, such as US: "(800)543-1212"

    User input: If you ever create a UI for entering phone numbers, dates, SSNs, (or other numbers with some structure), DO NOT require the user to follow some punctuation rules. DO allow a variety of typical formats. (OK, Dates are tricky because there are incompatible orderings. But what if I type "1-1-2021", will you spit at me not having the leading zeroes?

    Indexing: VARCHAR, DECIMAL, INT, etc are all indexable. Any speed difference is not significant.

    Extensions: Without VARCHAR, how would you represent the "extension" in "(800)543-1212×543"? Might this point be the deciding factor in favor of VARCHAR? And you should write a bug report against that ‘Certification’ test?

    Duplicate?: Which is best data type for phone number in MySQL and what should Java type mapping for it be? covers most of what I have said, and hints that [perhaps] VARCHAR(20) is sufficient. (The quoted 15, excludes the international prefix.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search