skip to Main Content

In PhPMyAdmin:

Consider a table containing two strings:

D:PROJECTSARJ Trans Infinite Galaxy210514Infinite Galaxy-Batch28.xlsx.sdlxliff
Infinite Galaxy-Batch28.xlsx.sdlxliff

Run the following queries:

SELECT * FROM test WHERE name='D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff';
SELECT * FROM test WHERE name LIKE '%D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff%'; 
SELECT * FROM test WHERE name LIKE '%Infinite Galaxy-Batch28.xlsx.sdlxliff%';  
SELECT * FROM test WHERE name LIKE '%Galaxy-Batch28.xlsx.sdlxliff%'; 
SELECT * FROM test WHERE name LIKE '%Infinite Galaxy%'; 

Only these queries return records:

SELECT * FROM test WHERE name LIKE '%Galaxy-Batch28.xlsx.sdlxliff%'
SELECT * FROM test WHERE name LIKE '%Infinite Galaxy%'

Why doesn’t

SELECT * FROM test WHERE name LIKE '%Infinite Galaxy-Batch28.xlsx.sdlxliff%'; 

for example find anything?

I’m baffled, either completely stupid or overlooking something really basic. I escaped everything that needs escaping, I think. There are no non-breaking spaces, no enters, no hidden characters. I tried again making a new table from scratch and even verified all strings in both PhPMyAdmin using hex. There is no difference.

Table dump:

-- phpMyAdmin SQL Dump
-- version 5.1.0
-- https://www.phpmyadmin.net/
--
-- Host: 127.0.0.1
-- Gegenereerd op: 16 mei 2021 om 12:51
-- Serverversie: 10.4.18-MariaDB-log
-- PHP-versie: 8.0.3

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
START TRANSACTION;
SET time_zone = "+00:00";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;

--
-- Database: `modernmt`
--

-- --------------------------------------------------------

--
-- Tabelstructuur voor tabel `test`
--

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `name` varchar(500) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

--
-- Gegevens worden geëxporteerd voor tabel `test`
--

INSERT INTO `test` (`id`, `name`) VALUES
(1, 'D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff'),
(2, 'Infinite Galaxy-Batch28.xlsx.sdlxliff');

--
-- Indexen voor geëxporteerde tabellen
--

--
-- Indexen voor tabel `test`
--
ALTER TABLE `test`
  ADD PRIMARY KEY (`id`);

--
-- AUTO_INCREMENT voor geëxporteerde tabellen
--

--
-- AUTO_INCREMENT voor een tabel `test`
--
ALTER TABLE `test`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=3;
COMMIT;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

In the small table this now works, but in the original table, where this is an extract from, it does not, leading me to think that there’s a difference between the other table and this small table I created for testing. The original table has more than 1.000.000 reccords though with confidential information. not something I can post here. I’ve tried hexing already. What else may be causing the very same = statement to find absolutely nothing in the other table?

See these pictures:
The record in question

The record does not show

We dive deeper:

SELECT source FROM engdut WHERE file LIKE 'D:\\PROJECTS\\ARJ Trans Infinite Galaxy\\210514\\Infinite Galaxy-Batch28.xlsx.sdlxliff%';

…shows all records I need.

SELECT source FROM engdut WHERE file LIKE 'D:\\PROJECTS\\ARJ Trans Infinite Galaxy\\210514\\Infinite Galaxy-Batch28.xlsx.sdlxliff';

…shows 0 records.

SELECT source FROM engdut WHERE file = 'D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff';

…shows 0 records.

I took one record and edited the file name manually, by copying

D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff

straight from the above query. Same result. The extra % at the end of the first LIKE query seems to do the trick, but why? It shouldn’t be needed. There is nothing at the end of said file name. Not an enter, nothing. As said, I double-checked by emptying the field and copying the file name itself straight from the query. I then tried once more by emptying the field and typing the file name, character by character, manually and triple-checking it. Same result.

I never bumped into this issue before. Like I said, this database has more than a million records. It only happens with double extensions. If I delete the sdlxliff at the end and simply look for

= D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx

I find exactly what I need.

I strongly suspect there’s a combination of characters here that MySQL cannot handle.

SELECT file, hex(file), source FROM engdut WHERE file LIKE 'D:\\PROJECTS\\ARJ Trans Infinite Galaxy\\210514\\Infinite Galaxy-Batch28.xlsx.sdlxliff%'

Gives the following for the found file name:

443A5C50524F4A454354535C41524A205472616E7320496E66696E6974652047616C6178795C3231303531345C496E66696E6974652047616C6178792D426174636832382E786C73782E73646C786C696666

That’s two f’s at the end. There are no special characters there. MySQL itself says so.

2

Answers


  1. Chosen as BEST ANSWER

    Solved! The database contents were correct: the file name didn't contain any strange characters.

    The query however wasn't correct. That query pulled the file name from the file being read, and that part of the code apparently did not strip all characters. This explains the inconsistent results.

    When I wrote the actual queries to a separate text file and analyzed them in 010 Editor, this is what I got:

    Exactly, that's a non-breaking space instead of an actual space in the file name

    Apparently the code neatly stripped these before putting the file names in the database, but for this particular query, the file name wasn't stripped. I.e. the database was correct, but this particular query wasn't.

    I solved this by using the following before querying any file names, ever, in the database:

    function escapeFileStringToSQL($string)
    {
        $string = str_replace('\', '\\', $string);
        $string = str_replace("'", "'", $string);
        $string = str_replace('"', '"', $string);
        // Never ever search the database for file names containing non-breaking spaces, as these have been stripped in the database
        $string = preg_replace("/(?:xc2xa0)/", "x20", $string);
        return $string;
    }
    

    I wish I could vote for everyone who commented, but unless I'm mistaken, you can only vote for actual Answers. If I'm wrong, please let me know and I will correct. Thank you all very much for taking you time to look into this.

    Everybody thought about the database, but no one thought about the query itself. PhPMyAdmin not rendering non-breaking spaces differently didn't really help either. Maybe some of the info in the original question was wrong, which may have been misleading... then again, once you start making mistakes during the debugging itself, thinks become very complicated.


  2. In MySQL, backslashes have special meaning in string literals. Backslash also has special meaning to the LIKE patterns. So if you want to search for a literal backslash character with LIKE, you must give it two backslashes. But MySQL string literals also need to escape the backslash character, so you actually need four.

    Here’s a test in the MySQL command-line client:

    mysql> SELECT * FROM test WHERE name LIKE 'D:\\PROJECTS\\ARJ Trans Infinite Galaxy\\210514\\Infinite Galaxy-Batch28.xlsx.sdlxliff';
    +----+------------------------------------------------------------------------------------+
    | id | name                                                                               |
    +----+------------------------------------------------------------------------------------+
    |  1 | D:PROJECTSARJ Trans Infinite Galaxy210514Infinite Galaxy-Batch28.xlsx.sdlxliff |
    +----+------------------------------------------------------------------------------------+
    

    LIKE has optional syntax to change its escape character, in case you want to use something other than backslash. This is documented here: https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like

    So we can get this down to two backslashes for each case:

    mysql> SELECT * FROM test WHERE name LIKE 'D:\PROJECTS\ARJ Trans Infinite Galaxy\210514\Infinite Galaxy-Batch28.xlsx.sdlxliff' ESCAPE '|';
    +----+------------------------------------------------------------------------------------+
    | id | name                                                                               |
    +----+------------------------------------------------------------------------------------+
    |  1 | D:PROJECTSARJ Trans Infinite Galaxy210514Infinite Galaxy-Batch28.xlsx.sdlxliff |
    +----+------------------------------------------------------------------------------------+
    

    You can also turn off the special meaning for backslashes in string literals. This is documented here: https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sqlmode_no_backslash_escapes and here: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html

    mysql> set sql_mode='NO_BACKSLASH_ESCAPES';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SELECT * FROM test WHERE name LIKE 'D:PROJECTSARJ Trans Infinite Galaxy210514Infinite Galaxy-Batch28.xlsx.sdlxliff' ESCAPE '|';
    +----+------------------------------------------------------------------------------------+
    | id | name                                                                               |
    +----+------------------------------------------------------------------------------------+
    |  1 | D:PROJECTSARJ Trans Infinite Galaxy210514Infinite Galaxy-Batch28.xlsx.sdlxliff |
    +----+------------------------------------------------------------------------------------+
    

    It’s possible that phpMyAdmin does further filtering of backslashes before it sends the string to the MySQL Server, but I don’t know about that. I use the MySQL command-line interface, not phpMyAdmin.

    Frankly, I will never forgive Microsoft for using the backslash metacharacter as their directory separator.

    Modern versions of Windows do allow forward-slash (/) to be used interchangeably with backlash as a directory separator. I suggest you use forward-slash in your paths to reduce the confusion about metacharacters.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search