I’m trying to learn C by writing a postgres clone and struggling to understand how the compiler handles bitfields.
Specifically, my question concerns the bitfields in postgres’ line pointer struct:
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;
In my clone, I’m able to set these fields to correct values and write them to disk, but I don’t understand how the compiler is able to know which of the 15 bits are relevant when converting to an integer type. Here’s a full example showing what I’m doing:
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;

int main() {
    ItemIdData* lp = malloc(sizeof(ItemIdData));
    lp->lp_off = 205;
    lp->lp_len = 51;

    int fd = open("line_pointer.data",
                  O_RDWR |  // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                  S_IWUSR | // User write permission
                  S_IRUSR   // User read permission
    );
    write(fd, lp, 4);
    close(fd);
    free(lp);

    ItemIdData* lp2 = malloc(sizeof(ItemIdData));
    int fd2 = open("line_pointer.data",
                   O_RDWR |  // Read/Write mode
                   O_CREAT,  // Create file if it doesn't exist
                   S_IWUSR | // User write permission
                   S_IRUSR   // User read permission
    );
    read(fd2, lp2, 4);
    printf("lp_off: %d\n", lp2->lp_off);
    printf("lp_len: %d\n", lp2->lp_len);
    close(fd2);
    free(lp2);

    return EXIT_SUCCESS;
}
Compiling and running the program, I get:
$ gcc -o lp main.c && ./lp
lp_off: 205
lp_len: 51
Using xxd -b, I inspected the binary contents of the data file and saw these four bytes:

11001101 00000000 01100110 00000000

lp_off is the first 15 bits: 11001101 0000000, which somehow correctly converts to 205 in decimal.

lp_len is the last 15 bits: 1100110 00000000, which somehow correctly converts to 51 in decimal.
This is what I’m failing to understand: how does the compiler know that the trailing 0-bits are not part of the value when converting to an int in the print statements above?
I’m coding this on Ubuntu-20.04 running inside WSL on a Windows 10 machine, if that matters.
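My working guess is that the bit-field reads compile down to plain masks and shifts over the 32-bit word, something like the sketch below, but the bit positions in it are only my assumption about how GCC lays out the fields, so I may be reconstructing the wrong thing:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* the four bytes I see in the file, in file order */
    unsigned char bytes[4] = {0xCD, 0x00, 0x66, 0x00};

    uint32_t word;
    memcpy(&word, bytes, sizeof word);   /* little-endian load: word == 0x006600CD */

    /* assumed layout: lp_off in the low 15 bits, lp_flags in the next 2, lp_len in the top 15 */
    unsigned lp_off   =  word        & 0x7FFF;   /* -> 205 */
    unsigned lp_flags = (word >> 15) & 0x3;      /* -> 0   */
    unsigned lp_len   = (word >> 17) & 0x7FFF;   /* -> 51  */

    printf("lp_off: %u, lp_flags: %u, lp_len: %u\n", lp_off, lp_flags, lp_len);
    return 0;
}

If that guess is right, the compiler simply never looks at the bits outside each field's mask, but I'd like to confirm that this is what actually happens.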
2 Answers
The bit field allocates 15 bits for lp_off, in which 205 is stored, and 15 bits for lp_len, in which 51 is stored. 205 in binary is 11001101. The extra 0 bits you are seeing are just the higher-order bits, which are not set because the stored value is not big enough. Given that you are on a little-endian machine, the 8 least significant bits of lp_off are in the first byte, and the remaining 7 are in the next byte. If you try storing 32767 in lp_off, you will see all 1 bits for the first 15 bits.

The 32 bits in the struct are stored in little-endian order as user16217248 says, but it is not true that the first 15 bits in the file will be all ones if offset stores 32767.
Since the struct is 15 bits + 2 bits + 15 bits, it may be defined logically like this (most significant bit first):

llllllll lllllllf fooooooo oooooooo

where o is offset, f is flag, and l is len. But since it is little endian, it will be stored in memory and written to disk like this:

oooooooo fooooooo lllllllf llllllll

So, for instance, writing 32767 to offset and length with 0 to flag will yield:
11111111 01111111 11111110 11111111
because the least significant bit of the flag lands in the most significant bit of the 2nd byte of the struct (so it is printed first within that byte), and the most significant bit of the flag lands in the least significant bit of the 3rd byte of the struct (so it is printed last within that byte).
So, if flag = 1 it would have been:
11111111 11111111 11111110 11111111
and if flag = 2:
11111111 01111111 11111111 11111111
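If you want to verify this without going through a file, here is a small test program (a sketch assuming GCC on a little-endian x86-64 machine, as in the question; bit-field layout is implementation-defined) that dumps the raw bytes of the struct the same way xxd -b displays them:

#include <stdio.h>
#include <string.h>

typedef struct ItemIdData {
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdData;

static void dump_bits(ItemIdData lp) {
    unsigned char bytes[sizeof lp];
    memcpy(bytes, &lp, sizeof lp);           /* copy out the raw object representation */
    for (size_t i = 0; i < sizeof bytes; i++) {
        for (int bit = 7; bit >= 0; bit--)   /* print each byte MSB first, like xxd -b */
            putchar(((bytes[i] >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
}

int main(void) {
    ItemIdData lp = {0};
    lp.lp_off = 32767;
    lp.lp_len = 32767;

    lp.lp_flags = 0; dump_bits(lp);   /* expect 11111111 01111111 11111110 11111111 */
    lp.lp_flags = 1; dump_bits(lp);   /* expect 11111111 11111111 11111110 11111111 */
    lp.lp_flags = 2; dump_bits(lp);   /* expect 11111111 01111111 11111111 11111111 */
    return 0;
}

On such a machine this should print the three patterns above, for flag = 0, 1, and 2 in that order.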