I wrote a function to simulate memcmp, and during testing I realized there is something unusual about memcmp. The source code for the library function is supposed to be:
#include <ansidecl.h>
#include <stddef.h>

int
memcmp(const void *str1, const void *str2, size_t count)
{
    register const unsigned char *s1 = (const unsigned char *)str1;
    register const unsigned char *s2 = (const unsigned char *)str2;

    while (count-- > 0)
    {
        if (*s1++ != *s2++)
            return s1[-1] < s2[-1] ? -1 : 1;
    }
    return 0;
}
Mine is:
#include <stddef.h>

int ft_memcmp(const void *s1, const void *s2, size_t n)
{
    size_t i;

    i = 0;
    if (n == 0)
        return (0);
    while (((char *)s1)[i] == ((char *)s2)[i])
    {
        if ((((char *)s1)[i] == '\0') || (i == (n - 1)))
        {
            return (0);
        }
        i++;
    }
    if (((char *)s1)[i] < ((char *)s2)[i])
        return (-1);
    else
        return (1);
}
The unexpected issue is that, depending on the count provided, memcmp starts returning not 1 or -1 but the actual difference between the chars, namely s1[-1] - s2[-1]. Does anyone know why or how this happens?
This is the test I ran that showed me the issue:
#include <stdio.h>
#include <string.h>

int ft_memcmp(const void *s1, const void *s2, size_t n);

int main() {
    char *test_strings1[] = { "fdjkDKDJFLDkjdfkjdf", "-456", "ALO marciano!!!", "xc42:", " 7894545989828547", " +99", "abc123", "12abc", "" };
    char *test_strings2[] = { "fdjkDKDJFLDSkjdfkjdf", "-456", "ALO_ALO marciano!!!", "xc42", " 789454598982854752", " +99", "abc123", "12abc", "" };
    /* note: for pairs that are equal up to the terminator, a count larger than
       the literal makes memcmp read past its end (undefined behavior) */
    for (int count = 0; count < 50; count++)
        for (int string = 0; string < 9; string++) {
            int ft = ft_memcmp(test_strings1[string], test_strings2[string], count);
            int lib = memcmp(test_strings1[string], test_strings2[string], count);
            if (ft != lib) {
                printf("******Wrong!!! lib %i ft %i count = %i string = %i*********\n", lib, ft, count, string);
            }
        }
    return 0;
}
My compiler version (note that gcc on this machine reports clang):
gcc --version
Ubuntu clang version 12.0.0-3ubuntu1~20.04.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
My system is Ubuntu 20.04.5 LTS, Intel® Core™ i5-7360U CPU @ 2.30GHz × 4, Mesa Intel® Iris(R) Plus Graphics 640 (Kaby Lake GT3e) (KBL GT3)
This is the output I get, which surprised me. For all the strings that differ (indices 2, 3 and 4), the behaviour of memcmp changes when count is greater than 7. (In this version of the test, my custom function returned the difference rather than 1, -1 and 0.)
Wrong!!! lib -1 ft -63 count = 4 string = 2
Wrong!!! lib -1 ft -23 count = 4 string = 4
Wrong!!! lib -1 ft -63 count = 5 string = 2
Wrong!!! lib +1 ft +58 count = 5 string = 3
Wrong!!! lib -1 ft -23 count = 5 string = 4
Wrong!!! lib -1 ft -63 count = 6 string = 2
Wrong!!! lib +1 ft +58 count = 6 string = 3
Wrong!!! lib -1 ft -23 count = 6 string = 4
Wrong!!! lib -1 ft -63 count = 7 string = 2
Wrong!!! lib +1 ft +58 count = 7 string = 3
Wrong!!! lib -1 ft -23 count = 7 string = 4
Answers
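From the C Standard (7.24.4.1 The memcmp function):

"The memcmp function returns an integer greater than, equal to, or less than zero, accordingly as the object pointed to by s1 is greater than, equal to, or less than the object pointed to by s2."

The C Standard (C2x) also specifies, for all the comparison functions:

"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared."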
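The posted code for memcmp is just one possible implementation, and it is most likely not the code your program calls: on Ubuntu, memcmp comes from glibc, which uses optimized implementations where different sizes can take different code paths. That is plausibly why the magnitude of the result changes with count. The C Standard does not specify the exact return value for blocks with different contents, only its sign. Hence returning s1[-1] - s2[-1] is just as compliant as returning s1[-1] < s2[-1] ? -1 : 1 or 1 - 2 * (s1[-1] < s2[-1]). A test should therefore compare the signs of the two results, not their values.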
Note also that your implementation has problems:

- you should not stop on ((char *)s1)[i] == '\0', as memcmp does not stop on any null terminator;
- comparing the char values is incorrect: the C Standard specifies that the comparison must be performed using the unsigned char values.

Here is a modified version: