
I understand this might be an undefined behavior question, but I’m curious and also trying to understand the reason for the below results

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define SIZE 100

char* memory(){
    char ch[SIZE] = {0};
    return ch;
}

void copy(char *string){
   char *new_string = memory(); 
   strncpy(new_string, string, strlen(string));
}

int main(){
    char *string = "This is going to be copied";
    copy(string);
}

I am getting a segmentation fault from the kernel. I know that the array returned by memory() ceases to exist once the function returns, but here is my reasoning for why the access might still be expected to work:
My Understanding:

  1. Virtual Memory and Page Management:
    Since the OS handles memory in pages, if the mapping between virtual memory and physical memory for the pages used by the process remains the same, it should be possible to access all the memory within that page, right? (This memory could be junk or potentially overwritten by other threads within the same process.)
  2. Page-level Reassignment:
    If the OS decides to reassign memory, it would do so at the page level, not at the byte level.
  3. Page Remapping and Segmentation Faults:
    Unless the memory allocated for the char array resides on a different page, and the OS is attempting to remap that page to another process, accessing this memory should not result in a segmentation fault.

My Doubts:

  1. Kernel’s Illegal Memory Access Detection:
    What triggers the Kernel to recognize this as an illegal memory access? Why would it flag this situation if the memory page hasn’t been reassigned yet?
  2. Page Validity:
    Does the Kernel mark entire pages as valid or invalid in its metadata? Is this what indicates whether a page is still in use by the current process?

Environment:
Operating System: Ubuntu 24.04.1 LTS
Kernel: Linux 6.8.0-49-generic
Architecture: x86-64

Optimization Level 0: gcc ptr.c -g -o ptr -O0

gcc --version
gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0

Please let me know if any of my understanding is incorrect.


Answers


  1. With gcc 14.2, unoptimized, on an x86-64 Linux system (try on godbolt), what happens is that the compiler notices that memory() is returning the address of a local object whose lifetime is ending (and warns about it). Since any use of that returned value would cause undefined behavior, the compiler can return any value whatsoever and still conform to the rules of the C language. So it decides to return a null pointer. See line 17 of the asm on godbolt (mov eax, 0).

    When copy() then calls strncpy with that null pointer as the destination, the write lands in page 0. Since page 0 is unmapped, the result is a segmentation fault.

    The reasoning in your question is otherwise correct: if memory() really did return the address of the local array ch, then we would indeed expect it to point to a mapped page, and writing to it afterward would not be expected to cause a segfault. (At least, not immediately; if that memory was now being used for something else, say by strncpy's own stack frame, then overwriting it might well cause the program to segfault or otherwise misbehave later on.) So the only flaw in your logic is that, because of the undefined behavior at the C language level, memory() was not returning the actual address of ch.

    We can modify the program to force the compiler to return the actual address of ch, by passing it through a volatile pointer:

    char* memory(){
        char ch[SIZE] = {0};
        char * volatile p = ch;
        return p;
    }
    

    (Try on godbolt.) Because p is volatile, the compiler can't assume that the value read back from p is still the address of ch (even though in fact it is), so it can't be sure that using that value would cause UB, and it has to return the value as is without altering it. You will find that this modified version of the program indeed does not segfault, for exactly the reasons you've described.
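
    The volatile trick only demonstrates the point; the program still has undefined behavior. As a minimal sketch of one way to avoid the problem altogether, the buffer can be given a lifetime that outlasts the call, for example by allocating it on the heap. The name memory_heap and the changed return type of copy() are assumptions made for this illustration, not part of the original program.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define SIZE 100

    /* Hypothetical replacement for memory(): the buffer lives on the heap,
       so it stays valid after the function returns. The caller must free it. */
    char *memory_heap(void) {
        return calloc(SIZE, 1);   /* zero-initialized, like {0} in the original */
    }

    /* Returns a heap-allocated copy of string, or NULL if allocation fails
       or the string (plus terminator) does not fit in SIZE bytes. */
    char *copy(const char *string) {
        char *new_string = memory_heap();
        if (new_string == NULL) {
            return NULL;
        }
        if (strlen(string) >= SIZE) {
            free(new_string);
            return NULL;
        }
        strcpy(new_string, string);   /* safe: the length was checked above */
        return new_string;
    }

    int main(void) {
        char *s = copy("This is going to be copied");
        if (s == NULL) {
            return EXIT_FAILURE;
        }
        puts(s);
        free(s);
        return EXIT_SUCCESS;
    }
    ```

    A static array or a buffer supplied by the caller would work as well; the heap is used here only because it keeps the shape of the original memory()/copy() pair.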

  2. In addition to the answer by @Nate Eldredge, which addresses the main question, you may also get segmentation faults here from using the dangerous strncpy function. Infamously, very few programmers understand how it works, let alone what it was actually designed for (namely ancient Unix fixed-length strings from the 1970s). Details here: Is strcpy dangerous and what should be used instead?
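
    To make the behavior concrete, here is a small sketch of what strncpy does with its length argument: it writes exactly n bytes, padding with zeros if the source is shorter, and writes no terminator at all if the source has n or more characters. The helper name terminated_after_strncpy and the 16-byte buffer are assumptions made for this illustration.

    ```c
    #include <stdio.h>
    #include <string.h>

    #define BUF 16

    /* Hypothetical helper: fill a buffer with 'x', run strncpy(buf, src, n),
       and report whether any zero byte ended up in the buffer.
       Returns 1 if terminated, 0 if not, -1 if n exceeds the buffer. */
    int terminated_after_strncpy(const char *src, size_t n) {
        char buf[BUF];
        if (n > BUF) {
            return -1;
        }
        memset(buf, 'x', sizeof buf);
        strncpy(buf, src, n);
        return memchr(buf, '\0', sizeof buf) != NULL;
    }

    int main(void) {
        const char *src = "hello";
        /* n == strlen(src): five characters copied, no terminator written */
        printf("n = strlen:     %d\n", terminated_after_strncpy(src, strlen(src)));
        /* n == strlen(src) + 1: the terminator is copied too */
        printf("n = strlen + 1: %d\n", terminated_after_strncpy(src, strlen(src) + 1));
        /* n larger than the string: the remainder is padded with zeros */
        printf("n = 10:         %d\n", terminated_after_strncpy(src, 10));
        return 0;
    }
    ```

    The first call reports 0 and the other two report 1, which is why passing strlen(string) as the length, as the question does, leaves the destination unterminated.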

    Here is a modified example that gets rid of the local variable going out of scope and exposes the strncpy bug:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define SIZE 100
    
    char* memory(){
        static char ch[SIZE] = "                           and the secret password is BAADBEEF";
        return ch;
    }
    
    void copy(char *string){
       char *new_string = memory(); 
       strncpy(new_string, string, strlen(string));
       puts(new_string);
    }
    
    int main(){
        char *string = "This is going to be copied";
        copy(string);
    }
    

    Output:

    This is going to be copied and the secret password is BAADBEEF
    

    With enough strlen to hang yourself, this could just as well have been an out-of-bounds memory access, had there been no trailing zero byte in the data as there is in my hand-crafted example. We could just as well have smashed into some stack canary.

    Solution:

    Never use the dangerous strncpy function for any purpose (except Unix software archaeology). In this case, one of the safe strcpy or memccpy functions should have been used instead.
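
    As a rough sketch of the strcpy-style alternative, assuming the caller knows the size of the destination: check the length first, then copy the string together with its terminator. The helper name copy_checked and its return convention are assumptions made for this illustration.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: copy src, including its terminator, into dst,
       which has room for dst_size bytes. Returns 0 on success, or -1 if
       the string does not fit (dst is then left untouched). */
    int copy_checked(char *dst, size_t dst_size, const char *src) {
        size_t len = strlen(src);
        if (len >= dst_size) {
            return -1;
        }
        memcpy(dst, src, len + 1);   /* len + 1 includes the '\0' */
        return 0;
    }

    int main(void) {
        char buf[100] = "                           and the secret password is BAADBEEF";
        if (copy_checked(buf, sizeof buf, "This is going to be copied") == 0) {
            puts(buf);   /* prints only the copied string */
        }
        return 0;
    }
    ```

    Because the terminator is always copied, the old contents of the buffer no longer leak into the output, and a source that is too long is rejected instead of being silently truncated.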
