skip to Main Content

I want replace in a column a string of characters with UTF8 characters, using REGEXP_REPLACE and/or REPLACE.

$search = mysqli_real_escape_string($id_mysql, $_POST['search']);
$new = mysqli_real_escape_string($id_mysql, $_POST['new']);
mysqli_query ($id_mysql,
    "UPDATE table_name
    SET column_name = REGEXP_REPLACE(column_name, '$search', '$new')
    WHERE column_name REGEXP '$search'"
    ) or die(mysqli_error($id_mysql));

With :

$_POST['search'] = 'blabla';
$_POST['new'] = 'x{26A1}';

This inserts the string x{26A1}, not the UTF8 character.

x{26A1} works in $POST[‘search’] but not in $POST[‘new’].

I tried u{26A1}, u26A1, x26A1.

Edit :

To better understand the problem, I have strip mysqli_real_escape_string.

Sorry, x{26A1} works in the search with PHP REGEX, not MySQL REGEX. MySQL use u{26A1}.

(Forget the hexadecimal code, it’s an example.)

With :

$search = "u{264B}";
$new = "u{264C}";

It works.

BUT: if i sent u{264B} or "u{264B}" in the search with the form, it does not work.

$search = $_POST['search'];
$new = "u{264C}";

It seems that with the form and the $_POST, u{264B} becomes a string.

I know double quotes are important. PHP needs double quotes for understand it’s a unicode character. With single quotes, PHP understand it’s a string. I think it’s the key of the problem. But i don’t know how use double quotes with the $_POST. I tried differents ways with no success.

And it’s a simple example. I want use more complex searchs and replacements.

@stackoverflow your related question has nothing to do with my question.

2

Answers


  1. Convert the hex string to unicode character in PHP:

    mb_convert_encoding('&#'.hexdec($_POST['new']), 'UTF-8', 'HTML-ENTITIES');
    

    Or you can use PHP’s Unicode character escaping (assuming you are trying to insert the high voltage character:

    $_POST['new'] = "u{26A1}";
    
    Login or Signup to reply.
  2. We need to know more about the dataflow.

    Assuming the HTML is expecting UTF-8:

    • The end-user should paste the actual character into the <form> <input>.
    • No conversion should be done in PHP
    • The target column in MySQL should be declared to be CHARACTER SET utf8mb4. The COLLATION does not matter (at least for this discussion).
    • Properly escape or bind the character as it is fed to an INSERT statement.

    For example, entering should show E29AA1 when doing SELECT HEX(col)... to see the hex for the character in the table.

    If you want to be "nice" to the user, you could allow them to enter 0xE29AA1 or U+26A1. The former is the UTF-8 (utf8mb4) hex with 0x as a common indicator of such. The latter is the common way to spell the Unicode "codepoint" for that same character. The Ox and U+ would require your PHP code to discover such, convert it to the character, etc.

    In some contexts, u{26A1} will be automatically converted to that single character.

    HTML "entities" — &#x26A1; will show as the character as the page is being rendered. Yes, you could store those 8 characters in the database, with the assumption that the only use for the text is to render it via HTML.

    &#9889; is the decimal equivalent of &#x26A1; — either works equally well on HTML pages.

    See also https://unicode.scarfboy.com/?s=%E2%9A%A1

    Note: %E2%9A%A1 is the "URL encoding" for the character.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search