I want to know how the bits of an int32_t are laid out in memory, so I wrote some code to print them.
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

void print_i8_as_bits(int8_t num) {
#pragma pack(1)
    struct Byte {
        unsigned bit_0: 1;
        unsigned bit_1: 1;
        unsigned bit_2: 1;
        unsigned bit_3: 1;
        unsigned bit_4: 1;
        unsigned bit_5: 1;
        unsigned bit_6: 1;
        unsigned bit_7: 1;
    };
#pragma pack()
    assert(sizeof(struct Byte) == 1); // if this fails, your compiler doesn't support #pragma pack(); try gcc
    struct Byte *byte = (struct Byte*) &num;
    printf(
        "%u%u%u%u%u%u%u%u",
        byte->bit_0,
        byte->bit_1,
        byte->bit_2,
        byte->bit_3,
        byte->bit_4,
        byte->bit_5,
        byte->bit_6,
        byte->bit_7
    );
}

void print_i32_as_bits(int32_t num) {
    int8_t *bytes = (int8_t*) &num;
    for (size_t i = 0; i < 4; ++i) {
        print_i8_as_bits(bytes[i]);
        printf(" ");
    }
}

int main() {
    int32_t num = 3;
    print_i32_as_bits(num);
    printf("\n");
    return 0;
}
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
$ gcc print_bitfields.c && ./a.out
11000000 00000000 00000000 00000000
$ file ./a.out
./a.out: ELF 64-bit LSB pie executable, x86-64,
version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=ec61c8110d755b9b9216325536bcd1bcdc36cf1b,
for GNU/Linux 3.2.0, not stripped
Originally, I wrongly expected the output to be: 00000011 00000000 00000000 00000000
But it’s: 11000000 00000000 00000000 00000000
So I searched and found this:
With GCC, big endian machines lay out the bits big end first and little endian machines lay out the bits little end first.
But it didn't have an external link.
Can I assume that a gcc-compiled LSB executable implies both "Least Significant Byte first" and "Least Significant bit first"?
Or is this just a specific behavior of gcc on x86_64 machines?
[Note]: LSB = Least Significant Byte first = little endian
2 Answers
It would make no sense for anything to print the least-significant bits first. This convention is strong and deeply culturally rooted. Just as in the decimal number 12345 the 5 is the least significant digit, there is no debate or confusion about whether this number could also be written as 54321. Nobody does that.
Byte order was significant for some hardware architectures; because of how the processor accessed memory, it made sense for a number which took up multiple addresses in memory to have the addresses arranged so that the first one contained the least significant portion.
Just to spell this out, to store the decimal number 12345 on a system with 8-bit bytes and 16-bit integers, you would have to split it up into two bytes. By convention we represent these hexadecimally; the number is 0x3039 and so you break it up into 0x30 and 0x39 (or in binary, 00110000 and 00111001). To store this number at memory address x, you would store 0x30 at address x and 0x39 at x + 1 on a big-endian system, and vice versa on little-endian.
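To see that byte order directly, here is a minimal sketch (not from the question, but it relies only on standard C: reading an object's bytes through an unsigned char pointer is well-defined):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t n = 12345;  /* 0x3039 */
    const unsigned char *p = (const unsigned char *) &n;
    /* Prints "39 30" on a little-endian machine and "30 39" on a big-endian one. */
    printf("%02x %02x\n", p[0], p[1]);
    return 0;
}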
Bit order inside a byte is not a thing; it's impossible to determine, because individual bits are not addressable.
The best you can do is observe the order of bytes inside a larger scalar, because bytes are addressable.
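To make that concrete, here is a sketch of the question's program rewritten without bitfields (not the asker's code, just an illustration): the bit values come from shifts and masks, which are pure arithmetic and involve no addresses, while the byte order comes from walking the object through an unsigned char pointer.

#include <stdio.h>
#include <stdint.h>

/* Print one byte's value bits, most significant bit first,
   using only shifts and masks -- no reliance on bitfield layout. */
static void print_byte_bits(unsigned char byte) {
    for (int i = 7; i >= 0; --i)
        printf("%u", (byte >> i) & 1u);
}

int main(void) {
    int32_t num = 3;
    const unsigned char *bytes = (const unsigned char *) &num;
    for (size_t i = 0; i < sizeof num; ++i) {
        print_byte_bits(bytes[i]);  /* byte order here does reflect the machine's endianness */
        printf(" ");
    }
    printf("\n");  /* on x86_64 (little-endian): 00000011 00000000 00000000 00000000 */
    return 0;
}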
This behavior of GCC (endianness controlling the bit order for bitfields) is arbitrary and doesn't reveal anything about the hardware.
It's just convenient, because it means that if you reinterpret the adjacent bitfields as a single scalar (of their underlying type), each bitfield ends up occupying contiguous bits, and the order of bitfield declarations matches their order in the underlying scalar (or the reverse of that on big endian?).
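Here is a sketch of that reinterpretation, assuming GCC on a little-endian target (bitfield layout is implementation-defined, and GCC documents that reading a different union member reinterprets the stored bytes, so this is not portable):

#include <stdio.h>

union Flags {
    struct {
        unsigned a : 1;  /* declared first: lowest bits on little-endian GCC */
        unsigned b : 3;
        unsigned c : 4;
    } bits;
    unsigned raw;
};

int main(void) {
    union Flags f = { .raw = 0 };  /* zero the whole storage unit first */
    f.bits.a = 1;
    f.bits.b = 5;
    f.bits.c = 9;
    /* On GCC/x86_64: a occupies bit 0, b bits 1..3, c bits 4..7,
       so raw == 1 | (5 << 1) | (9 << 4) == 0x9B. */
    printf("0x%02X\n", f.raw);
    return 0;
}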