I’m trying to learn C by writing a postgres clone and struggling to understand how the compiler handles bitfields.
Specifically, my question concerns the bitfields in postgres’ line pointer struct:
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;
In my clone, I’m able to set these fields to correct values and write them to disk, but I don’t understand how the compiler is able to know which of the 15 bits are relevant when converting to an integer type. Here’s a full example showing what I’m doing:
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;

int main() {
    ItemIdData* lp = malloc(sizeof(ItemIdData));
    lp->lp_off = 205;
    lp->lp_len = 51;

    int fd = open("line_pointer.data",
                  O_RDWR |  // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                  S_IWUSR | // User write permission
                  S_IRUSR   // User read permission
    );
    write(fd, lp, 4);
    close(fd);
    free(lp);

    ItemIdData* lp2 = malloc(sizeof(ItemIdData));
    int fd2 = open("line_pointer.data",
                   O_RDWR |  // Read/Write mode
                   O_CREAT,  // Create file if it doesn't exist
                   S_IWUSR | // User write permission
                   S_IRUSR   // User read permission
    );
    read(fd2, lp2, 4);
    printf("lp_off: %d\n", lp2->lp_off);
    printf("lp_len: %d\n", lp2->lp_len);
    close(fd2);
    free(lp2);

    return EXIT_SUCCESS;
}
Compiling and running the program, I get:
$ gcc -o lp main.c && ./lp
lp_off: 205
lp_len: 51
Using xxd -b, I inspected the binary contents of the data file and saw these four bytes:

11001101 00000000 01100110 00000000

lp_off is the first 15 bits: 11001101 0000000, which somehow correctly converts to 205 in decimal.

lp_len is the last 15 bits: 1100110 00000000, which somehow correctly converts to 51 in decimal.
This is what I’m failing to understand: how does the compiler know that the trailing 0-bits are not part of the value when converting to an int in the print statements above?
I’m coding this on Ubuntu-20.04 running inside WSL on a Windows 10 machine, if that matters.
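My working guess is that the bit-field reads compile down to plain masks and shifts over the 32-bit word, something like the sketch below, but the bit positions in it are only my assumption about how GCC lays out the fields, so I may be reconstructing the wrong thing:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* the four bytes I see in the file, in file order */
    unsigned char bytes[4] = {0xCD, 0x00, 0x66, 0x00};

    uint32_t word;
    memcpy(&word, bytes, sizeof word);   /* little-endian load: word == 0x006600CD */

    /* assumed layout: lp_off in the low 15 bits, lp_flags in the next 2, lp_len in the top 15 */
    unsigned lp_off   =  word        & 0x7FFF;   /* -> 205 */
    unsigned lp_flags = (word >> 15) & 0x3;      /* -> 0   */
    unsigned lp_len   = (word >> 17) & 0x7FFF;   /* -> 51  */

    printf("lp_off: %u, lp_flags: %u, lp_len: %u\n", lp_off, lp_flags, lp_len);
    return 0;
}

If that guess is right, the compiler simply never looks at the bits outside each field's mask, but I'd like to confirm that this is what actually happens.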
2 Answers
The bit field allocates 15 bits for lp_off, in which 205 is stored, and 15 bits for lp_len, in which 51 is stored. 205 in binary is 11001101. The extra 0 bits you are seeing are just the higher-order bits, which are not set because the stored value is not big enough. Given that you are on a little-endian machine, the 8 least significant bits of lp_off are in the first byte, and the remaining 7 are in the next byte. If you try storing 32767 in lp_off, you will see all 1 bits for the first 15 bits.

The 32 bits in the struct are stored in little-endian order as user16217248 says, but it is not true that the first 15 bits in the file will be all ones if offset stores 32767.
Since the struct is 15 bits + 2 bits + 15 bits, it may be defined logically like this (most significant bit first):

llllllll lllllllf fooooooo oooooooo

where o is offset, f is flag, and l is len. But since it is little endian, it will be stored in memory and written to disk like this:

oooooooo fooooooo lllllllf llllllll

So, for instance, writing 32767 to offset and length with 0 to flag will yield:
11111111 01111111 11111110 11111111
because the least significant bit of the flag lands in the most significant bit of the 2nd byte of the struct (so it is printed first within that byte), and the most significant bit of the flag lands in the least significant bit of the 3rd byte of the struct (so it is printed last within that byte).
So, if flag = 1 it would have been:
11111111 11111111 11111110 11111111
and if flag = 2:
11111111 01111111 11111111 11111111
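If you want to verify this without going through a file, here is a small test program (a sketch assuming GCC on a little-endian x86-64 machine, as in the question; bit-field layout is implementation-defined) that dumps the raw bytes of the struct the same way xxd -b displays them:

#include <stdio.h>
#include <string.h>

typedef struct ItemIdData {
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdData;

static void dump_bits(ItemIdData lp) {
    unsigned char bytes[sizeof lp];
    memcpy(bytes, &lp, sizeof lp);           /* copy out the raw object representation */
    for (size_t i = 0; i < sizeof bytes; i++) {
        for (int bit = 7; bit >= 0; bit--)   /* print each byte MSB first, like xxd -b */
            putchar(((bytes[i] >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
}

int main(void) {
    ItemIdData lp = {0};
    lp.lp_off = 32767;
    lp.lp_len = 32767;

    lp.lp_flags = 0; dump_bits(lp);   /* expect 11111111 01111111 11111110 11111111 */
    lp.lp_flags = 1; dump_bits(lp);   /* expect 11111111 11111111 11111110 11111111 */
    lp.lp_flags = 2; dump_bits(lp);   /* expect 11111111 01111111 11111111 11111111 */
    return 0;
}

On such a machine this should print the three patterns above, for flag = 0, 1, and 2 in that order.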