skip to Main Content

first time posting here.

I am facing a problem with unpredicted behavior on my PROD server and my local environment.

Here is some background on the situation:
In my application (backend Laravel 7, frontend regular html/javascript) I need to search for entries in a particular table based on JSON data stored in one of the columns:

Table: flights
columns: id, date, passengers, ... pilot_id, second_pilot_id, flight_data, updated_at, created_at

There are flights, that are directly linked to either a pilot or a second pilot via pilot_id or second_pilot_id. That is fine so far, because I can easily query them. However there are also flight entries, where no registered user is doing the entry and they are only represented by a name that is entered. This works only if the name doesn’t contain special characters, in particular the german Umlaute (ö, ä, ü), also doesn’t work for other specials like â or ß or é, è etc. But ONLY ON PROD, on Local everything works even with special characters.

flight_data has the data type "JSON" in my migration files.

$table->json('flight_data') ... 

Now the problem:

On my local environment I can run the following and will get results returned:

... ->where(function($q) use ($r) {
$q->whereRaw("IF(payee = 2, JSON_CONTAINS(flight_data, '{"second_pilotname":"$r"}'), JSON_CONTAINS(flight_data, '{"pilotname":"$r"}'))");
})->...

This will get me my example results without issues, as expected

($r is filled a particular name of a pilot, in my example he is called "Jöhn Düe")

If I run this on my PROD system I will get no retuns. I tracked it down to the JSON_CONTAINS() function, that prevents the results. I also tried playing around with "Joehn Duee", which would be found correctly, so it basically comes down to the german Umlaute (ö, ä, ü) not being handled correctly somehow.

I also tried some SQL statements in phpmyadmin and these are the results:

LOCAL

select id, flight_data, comments, updated_at from logbook where JSON_CONTAINS(flight_data, '{"pilotname": "Juehn Duee"}')

1 result found

select id, flight_data, comments, updated_at from logbook where JSON_CONTAINS(flight_data, '{"pilotname": "Jühn Düe"}')

1 result found

PROD

select id, flight_data, comments, updated_at from logbook where JSON_CONTAINS(flight_data, '{"pilotname": "Juehn Duee"}')

1 result found

select id, flight_data, comments, updated_at from logbook where JSON_CONTAINS(flight_data, '{"pilotname": "Jühn Düe"}')

0 result found

I also checked the raw data that is stored:

PROD:

column data
flight_data {"pilotname":"Ju00fchn Du00fce"}

LOCAL:

column data
flight_data {"pilotname":"Ju00fchn Du00fce"}

So logically the data is transformed. Which is ok, because the data is then shown according to UTF-8 and then correctly displayed ("Jühn Düe")

The problem is, that in the backend I need to compare this data.

The differences are that on my local environment I am using MYSQL 8.0 (it’s a homestead server, so select @@version; => 8.0.23-0ubuntu0.20.04.1) and on PROD (the hosted server) I am seeing "10.3.28-MariaDB-log-cll-lve"

Therefore the difference is clear, MariaDB vs. MYSQL and the handling of german Umlaute.

I tried various things around changing the conversion / charset of the entries, of the database, that all didn’t solve the problem. I searched for quite a while for various similar problems, but most of them resulted in having the data stored not in UTF-8 – which I checked and is the case for me here.

Even querying for the raw data doesn’t work somehow:

The following doesn’t work neither on PROD nor on LOCAL:

select id, flight_data, comments, updated_at from logbook where JSON_CONTAINS(flight_data, '{"pilotname": "Ju00fchn Du00fce"}')

0 results found

Can you help me figuring out what I am missing here?
Obviously it has to do something with the database, what else can I check or do I need to change?

Thanks a lot everybody for your help!

2

Answers


  1. Chosen as BEST ANSWER

    Thank you all for your inputs and response!

    I figured out a different solution for the problem. Maybe it helps someone..

    I went a step back and checked how I am storing the data. I was using json_encode() for that, which created the table contents as shown above. By just using a raw array to save it, it was working then

    $insert->pilotname = ['pilotname' => $request->pilotname];
    

    Somehow the storing of data before was already the issue.


  2. You should use the same software in development that you use in production. The same brand and the same version. Otherwise you risk encountering these incompatible features.

    MariaDB started as a fork of the MySQL project in 2010, and both have been diverging gradually since then. MySQL implements new features, and MariaDB may or may not implement similar features, either by cherry-picking code from the MySQL project or by implementing their own original code. So over time these two projects grow more and more incompatible. At this point, over 10 years after the initial fork, you should consider MariaDB to be a different software product. Don’t count on any part of it remaining compatible with MySQL.

    In particular, the implementation of JSON in MariaDB versus MySQL is not entirely compatible. MariaDB creating their own original code for the JSON data type as an alias for LONGTEXT. So the internal implementation is quite different.

    You asked if there’s something you need to change.

    Since you use MariaDB in production, not MySQL, you should use MariaDB 10.3.28 in your development environment, to ensure compatibility with the database brand and version you use in production.


    I think the problem is a collation issue. Some unicode collations implement character expansions, so ue = ü would be true in the German collation.

    Here’s a test using MySQL 5.7 which is what I have handy (I don’t use MariaDB):

    mysql> select 'Juehn Duee' collate utf8mb4_unicode_520_ci = 'Jühn Düe' as same;
    +------+
    | same |
    +------+
    |    0 |
    +------+
    
    mysql> select 'Juehn Duee' collate utf8mb4_german2_ci = 'Jühn Düe' as same;
    +------+
    | same |
    +------+
    |    1 |
    +------+
    

    As you can see, this has nothing to do with JSON, but it’s just related to string comparisons and which collation is used.

    See the explanation in https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html in the section "_general_ci Versus _unicode_ci Collations"

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search