I am a little bit confused about binary files. I know the data is stored in chunks in binary files, and from experimenting I found that if we have a struct with member variables like this:
struct student{
int Roll_No;
char Name[10];
};
Then after filling in the variables and saving the struct to a binary file, the file is 14 bytes: 10 bytes of char and 4 of int. If we look at the file in a hex editor, it has 4 bytes reserved for Roll_No and 10 bytes reserved for Name; the characters we filled in are visible and the unused ones show up as dots. In other words, the file's size matches the structure we created (4 bytes for the int and 10 for the chars). So, as far as I understand, if I created a new image format, e.g. .MyIMG, with a program whose struct/class looks like this:
struct MyIMG{
char Header[5];
int width, height;
int Pixels[124000];
};
Then my program will create a file of 496,013 bytes, about 496 KB (5 bytes of header, plus 8 bytes for the int width and height, plus 4 × 124000 bytes for the int pixels). Whether the image has 4, 8, 100 or any other number of pixels, it writes the whole Pixels array, even if most of it is empty. So why doesn't the same thing happen with large programs like MS Paint or Adobe Photoshop? What do they do so that the size of the files they write depends on the pixels actually stored rather than on blank arrays?
Answers
The code writing the file has to choose its own format. For example, when writing your student structure to a file, you could write the Roll_No bytes followed by the name only up to and including the first 0/NUL character, rather than all 10 bytes of Name. When reading the file back, the program could read a block of data from the ifstream; then, knowing that a student is stored at some offset, it could use strlen() on the .Name part of the incoming data to recover the length. That lets it copy only the necessary data into a student object, and also tells it where to start parsing the next data item from the input stream. Scanning for the single NUL while keeping track of the amount of data read from the file, so you don't crash if you get a corrupt input file, is a bit of a pain.
For a beginner, it's probably easiest and far more robust to learn the Boost serialization library, which abstracts away much of the low-level (some would say C-style) I/O, casting and offset calculations to give you a cleaner logical interface.
File formats like .png and .bmp have a specific, documented layout. A format can either specify a fixed layout of bytes (such as 4 bytes for the width, 4 bytes for the height, then 2 MB of RGBA pixel data, or whatever), or it can record the sizes of the various objects inside the file itself, so a reader knows how many bytes follow.
For example, a TIFF file specifies that there are a number of tags at specific byte offsets within the file. Those tags then contain information about the size, location, and format of the image data. So you might have a fixed-size header that says "there is a list of tags starting at byte 100, and it contains 40 tags." The tags would each be a fixed size (say 16 bytes), so you'd know to read 40 16-byte chunks starting at byte 100. The tags would then contain information such as the byte offset of the start of the image data, how many bytes are in a pixel, and how many pixels there are. From this, you can read the image data without knowing the layout of the whole file ahead of time.