skip to Main Content

I wrote a function to simulate memcmp, and during testing I realized there is something unusual about memcmp. The source code for the library function is supposed to be:

#include <ansidecl.h>
#include <stddef.h>

int
memcmp(const void *str1, const void *str2, size_t count)
{
  register const unsigned char *s1 = (const unsigned char *)str1;
  register const unsigned char *s2 = (const unsigned char *)str2;

  while (count-- > 0)
    {
      if (*s1++ != *s2++)
        return s1[-1] < s2[-1] ? -1 : 1;
    }
  return 0;
}

mine is

#include <stddef.h>

int ft_memcmp(const void *s1, const void *s2, size_t n)
{
    size_t  i;

    i = 0;
    if (n == 0)
        return (0);
    while(((char *)s1)[i] == ((char *)s2)[i])
    {
        if ((((char *)s1)[i] == '') || (i == (n - 1)))
        {
            return (0);
        }
        i++;
    }
    if (((char *)s1)[i] < ((char *)s2)[i])
        return (-1);
    else
        return (1);
}

the unexpected issue is that, depending on the count provided, the memcmp function starts returning not 1 or -1, but the actual difference from the chars, namely s1[-1] - s2[-2]

Anyone knows why or how this happens?

this is the test I ran that showed me the issue

int main() {
    char *test_strings1[] = { "fdjkDKDJFLDkjdfkjdf", "-456", "ALO marciano!!!", "xc42:", "     7894545989828547", "  +99", "abc123", "12abc", "" };

    char *test_strings2[] = { "fdjkDKDJFLDSkjdfkjdf", "-456", "ALO_ALO marciano!!!", "xc42", "   789454598982854752", "  +99", "abc123", "12abc", "" };

    for (int count = 0; count < 50; count++)
        for (int string = 0; string < 9; string++) {
            int ft = ft_memcmp(test_strings1[string], test_strings2[string], count);
            int lib = memcmp(test_strings1[string], test_strings2[string], count);
       
            if (ft != lib) {
                printf("******Wrong!!!  lib  %i  ft  %i  count = %i  string = %i*********n", lib,  ft, count, string);
            }
        }
    return 0;
}

My GCC version is

gcc --version
Ubuntu clang version 12.0.0-3ubuntu1~20.04.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

My system is Ubuntu 20.04.5 LTS, Intel® Core™ i5-7360U CPU @ 2.30GHz × 4, Mesa Intel® Iris(R) Plus Graphics 640 (Kaby Lake GT3e) (KBL GT3)

this is the output I get, which surprised me. On all the strings that have diference (index 2,3 and 4) the behaviour of memcmp changes if count is greater than 7 (in this version of the output my custom function was returning the difference, not 1,-1 and 0)

Wrong!!! lib -1 ft -63 count = 4 string = 2

Wrong!!! lib -1 ft -23 count = 4 string = 4

Wrong!!! lib -1 ft -63 count = 5 string = 2

Wrong!!! lib +1 ft +58 count = 5 string = 3

Wrong!!! lib -1 ft -23 count = 5 string = 4

Wrong!!! lib -1 ft -63 count = 6 string = 2

Wrong!!! lib +1 ft +58 count = 6 string = 3

Wrong!!! lib -1 ft -23 count = 6 string = 4

Wrong!!! lib -1 ft -63 count = 7 string = 2

Wrong!!! lib +1 ft +58 count = 7 string = 3

Wrong!!! lib -1 ft -23 count = 7 string = 4

2

Answers


  1. From the C Standard (7.24.4.1 The memcmp function)

    Returns

    3 The memcmp function returns an integer greater than, equal to, or
    less than zero,
    accordingly as the object pointed to by s1 is
    greater than, equal to, or less than the object pointed to by s2.

    Login or Signup to reply.
  2. The C Standard (C2x) specifies this:

    7.26.4 Comparison functions

    The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

    […]

    The memcmp function returns an integer greater than, equal to, or less than zero, accordingly as the object pointed to by s1 is greater than, equal to, or less than the object pointed to by s2.

    The posted code for memcmp is just a possible implementation. The C Standard does not specify the exact return value for blocks with different contents, just the sign of the return value. Hence returning s1[-1] - s2[-1] is just as compliant as returning s1[-1] < s2[-1] ? -1 : 1 or 1 - 2 * (s1[-1] < s2[-1])

    Note also that your implementation has problems:

    • it makes no sense to test ((char *)s1)[i] == '') as memcmp does not stop on any null terminator.
    • comparing the byte values as char values is incorrect: the C Standard specifies that the comparison must be performed using the unsigned char values.

    Here is a modified version:

    #include <stddef.h>
    
    int ft_memcmp(const void *s1, const void *s2, size_t n)
    {
        size_t  i;
    
        i = 0;
        if (n == 0)
            return (0);
    
        while (((unsigned char *)s1)[i] == ((unsigned char *)s2)[i])
        {
            i++;
            if (i == n)
            {
                return (0);
            }
        }
        if (((unsigned char *)s1)[i] < ((unsigned char *)s2)[i])
            return (-1);
        else
            return (1);
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search