
I need to learn how to update a file concurrently without blocking other threads. Let me explain how the worker operates, what I need, and how I think it should be implemented; then I will ask my questions:

Here is how the worker works:

  • Worker is multithreaded.
  • There is one very large file (6 TB).
  • Each thread updates part of this file.
  • Each write is one or more disk blocks (4096 bytes each).
  • No two workers write to the same block (or the same group of blocks) at the same time.

Needs:

  • Threads should not block each other (no lock on the whole file; the minimum possible number of locks should be used).
  • In case of (any kind of) failure, it is acceptable if the block being updated becomes corrupted.
  • In case of (any kind of) failure, blocks that are not being updated must not become corrupted.
  • If a write was reported successful, we must be sure the data is not just buffered but actually written to disk (fsync).
  • I can convert this large file into as many smaller files as needed (down to 4 KB files), but I prefer not to. Handling that many files is difficult and needs a lot of file-handle open/close operations, which hurts performance.

How I think it should be implemented:

I’m not very familiar with file manipulation or how it works at the operating-system level, but I think writing to a single block should not corrupt other blocks when an error happens. So I think this code should work as needed, without any change:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char write_value[4096] = "...4096 bytes of data...";
    int  write_block = 12345;
    int  block_size  = 4096;

    FILE *fp = fopen("file.txt", "r+");  /* "r+" updates in place; "w+" would truncate the file */

    fseek(fp, (long)write_block * block_size, SEEK_SET);  /* cast: 6 TB offsets overflow int (fseeko/off_t is safer still) */
    fwrite(write_value, 1, block_size, fp);
    fsync(fileno(fp));  /* fsync takes a file descriptor, not a FILE * */
    fclose(fp);
    return 0;
}

Questions:

Obviously, I’m trying to understand how this should be implemented, so any suggestions are welcome. Especially:

  1. If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?
  2. In short, what should be considered to make the code above correct (with respect to the previous question)?
  3. Is it possible to replace one block of data with another file/block atomically? (Like how the rename() system call replaces one file with another atomically, but at the block level: something like replacing the next-block address of the previous block in the file system, or whatever else.)
  4. Any device/file-system/operating-system specific notes? (This code will run on CentOS or FreeBSD (not decided yet), but I can change the OS if there is a better alternative for this problem. The file is on one 8 TB SSD.)

2 Answers


  1. If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?

    None.

    Is it possible to replace one block of data with another file/block atomically? (Like how the rename() system call replaces one file with another atomically, but at the block level: something like replacing the next-block address of the previous block in the file system, or whatever else.)

    No.

  2. Threads should not block each other (no lock on the whole file; the minimum possible number of locks should be used)

    Your code sample uses fseek followed by fwrite. Without locking between those two calls, you have a race condition, because another thread could seek in between. There are three reasonable solutions:

    1. Use flockfile, followed by a regular fseek and fwrite_unlocked, then funlockfile. flockfile and funlockfile are standardized in POSIX.1-2001; fwrite_unlocked is a GNU extension.
    2. Use a separate file handle per thread.
    3. Use pread and pwrite to do I/O without having to worry about the shared seek position.

    Option 3 is the best for you.
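    For illustration, here is a minimal sketch of option 3; the file name and fixed block size are assumptions, not part of the question:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    /* Any thread may call this concurrently on the same descriptor:
       pwrite takes an explicit offset, so there is no shared seek
       position and no lock is needed between "seek" and "write". */
    int write_block_at(int fd, long block_no, const char buf[BLOCK_SIZE])
    {
        off_t offset = (off_t)block_no * BLOCK_SIZE;
        if (pwrite(fd, buf, BLOCK_SIZE, offset) != BLOCK_SIZE)
            return -1;        /* short write or error */
        return fsync(fd);     /* force the block out of the page cache */
    }

    int main(void)
    {
        int fd = open("file.bin", O_RDWR);  /* hypothetical file name */
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        char buf[BLOCK_SIZE] = {0};
        if (write_block_at(fd, 12345, buf) != 0)
            perror("write_block_at");

        close(fd);
        return EXIT_SUCCESS;
    }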

    You could also use the asynchronous I/O interface from <aio.h> to handle the multithreading; on most Unix implementations it basically works with a thread pool calling pwrite.
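    A rough sketch of the <aio.h> route, under the same assumptions as above (link with -lrt on glibc):

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    int main(void)
    {
        int fd = open("file.bin", O_RDWR);  /* hypothetical file name */
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        static char buf[BLOCK_SIZE];        /* must stay valid until the write completes */

        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_offset = (off_t)12345 * BLOCK_SIZE;
        cb.aio_buf    = buf;
        cb.aio_nbytes = BLOCK_SIZE;

        if (aio_write(&cb) != 0) { perror("aio_write"); return EXIT_FAILURE; }

        /* ... other work happens here while the write is in flight ... */

        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);         /* block until the request finishes */
        if (aio_error(&cb) != 0 || aio_return(&cb) != BLOCK_SIZE)
            fprintf(stderr, "asynchronous write failed\n");

        close(fd);
        return EXIT_SUCCESS;
    }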

    In case of (any kind of) failure, it is acceptable if the block being updated becomes corrupted

    I understand this to mean that there should be no file corruption in any failure state. To the best of my knowledge, that is not possible when you overwrite data in place. When the system fails in the middle of a write, there is no way to guarantee how many bytes actually reached the disk, at least not in a file-system-agnostic way.

    What you can do instead is similar to a database transaction: write the new content to a new location in the file, then fsync to ensure it is on disk, then overwrite a header to point to the new location. If you crash before the header is written, crash recovery sees the old content; if the header was written, you see the new content. However, I’m not an expert in this field, and that final header update is a bit of a hand-wave.
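    To make the idea concrete, here is a hedged sketch of that scheme using two alternating slots per logical block plus a one-byte header that selects the live slot. The layout and names are illustrative assumptions, and the atomicity of the header write must be verified for the actual device:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    /* Assumed layout: each logical block owns two physical slots (A/B)
       plus a tiny header recording which slot is current.  An update
       never overwrites the live copy. */
    int commit_block(int fd, off_t hdr_off, off_t slot_a, off_t slot_b,
                     const char data[BLOCK_SIZE])
    {
        uint8_t current;
        if (pread(fd, &current, 1, hdr_off) != 1)
            return -1;

        /* 1. Write the new content to the inactive slot. */
        off_t target = (current == 0) ? slot_b : slot_a;
        if (pwrite(fd, data, BLOCK_SIZE, target) != BLOCK_SIZE)
            return -1;
        if (fsync(fd) != 0)   /* 2. Make sure the new content is on disk. */
            return -1;

        /* 3. Flip the header to point at the new slot.  This assumes a
           one-byte (sub-sector) write is atomic on the device, which is
           exactly the hand-wavy part mentioned above. */
        uint8_t next = current ? 0 : 1;
        if (pwrite(fd, &next, 1, hdr_off) != 1)
            return -1;
        return fsync(fd);
    }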

    In case of (any kind of) failure, blocks that are not being updated must not become corrupted.

    This should be fine.

    If a write was reported successful, we must be sure the data is not just buffered but actually written to disk (fsync)

    Your sample code calls fsync but forgets to call fflush first. Alternatively, set the stream to unbuffered using setvbuf.
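    For example, a small helper along these lines (fileno yields the descriptor behind a FILE * stream):

    #include <stdio.h>
    #include <unistd.h>

    /* Flush stdio's userspace buffer to the kernel, then force the
       kernel's cache out to the device. */
    int flush_to_disk(FILE *fp)
    {
        if (fflush(fp) != 0)
            return -1;
        return fsync(fileno(fp));
    }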

    I can convert this large file into as many smaller files as needed (down to 4 KB files), but I prefer not to. Handling that many files is difficult and needs a lot of file-handle open/close operations, which hurts performance.

    Many calls to fsync will kill your performance anyway. Short of reimplementing database transactions, this seems to be your best bet for maximum crash safety. The pattern is well documented and understood:

    1. Create a new temporary file on the same file system as the data you want to overwrite
    2. Read-Copy-Update the old content to the new temporary file
    3. Call fsync
    4. Rename the new file over the old one

    A rename within a single file system is atomic, so this procedure ensures that after a crash you see either the old data or the new data.
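    A minimal sketch of that pattern for one small file; the names are illustrative, and the directory fsync assumes the file lives in the current directory:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int replace_file(const char *path, const char *data, size_t len)
    {
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);  /* same file system as path */

        /* Steps 1-3: create the temporary file, copy in the new
           content, and fsync it before closing. */
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        /* Step 4: atomically replace the old file with the new one. */
        if (rename(tmp, path) != 0) {
            unlink(tmp);
            return -1;
        }

        /* Also persist the directory entry, so the rename itself
           survives a crash. */
        int dfd = open(".", O_RDONLY);
        if (dfd < 0)
            return -1;
        int rc = fsync(dfd);
        close(dfd);
        return rc;
    }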
