Platform characteristics: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz, 8GB RAM, Windows 10, Visual Studio, MSVC compiler
I wrote the following code in C++. I then run the executable through the Microsoft .NET API (class System.Diagnostics.Process) and print statistics about the process at regular time intervals.
    #include <iostream>
    #include <chrono>
    #include <thread>
    #include <cstdlib>  // malloc, free, rand, abs

    int main()
    {
        long long n = 1'000'000'000;
        std::cout << "part 1 started" << std::endl;
        // Allocate 1 GB and touch only the very last byte.
        char* arr = static_cast<char*>(malloc(n));
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        arr[n - 1] = rand();
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        std::cout << "part2 started" << std::endl;
        // Write the last fifth of the array.
        for (long long i = n / 5.0 * 4.0; i < n; i++)
            arr[i] = rand();
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        std::cout << "part3 started" << std::endl;
        // Write everything except the first fifth of the array.
        for (long long i = n/5; i < n; i++)
            arr[i] = rand();
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        // Write to "random" indices.
        long long tmpll = 0;
        for (int i = 0; i < 100000000; i++)
        {
            tmpll = 0;
            if (rand() % 2)
                tmpll = 2'400'000'000;
            if (rand() % 2)
                tmpll += 2'400'000'000;
            if (rand() % 2)
                tmpll = 2'400'000'000;
            tmpll += rand();
            tmpll += rand();
            if (tmpll < 0) tmpll = abs(tmpll);
            tmpll %= n;
            arr[tmpll] = rand();
        }
        std::cout << arr[rand() % n] << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(2000));
        std::cout << "ended" << std::endl;
        free(arr);
        std::cout << "freed" << std::endl;
    }
I printed the PrivateMemorySize64, WorkingSet64 and PeakWorkingSet64 fields. PrivateMemorySize64 matches the amount I asked to allocate, but PeakWorkingSet64 is always smaller (usually no more than 80% of the allocated space). Okay, I thought, I had read that Windows automatically moves rarely used pages to disk, so that is fine. But I have just disabled page file usage (now malloc fails for 2GB, whereas with the page file enabled it succeeded for 8GB on a system with 3GB of free RAM, and the code ran correctly with a peak working set of 3GB) and I still got this result:
PrivateMemorySize64 = 1002602496
WorkingSet64 = 803287040
PeakWorkingSet64 = 803291136
How is that possible? I cannot imagine an optimization that avoids actually storing 1GB of memory when I write to random places in the array for a long time. I thought that even if I did not access the whole array and just printed a random element, it had to store all 1GB. Please tell me what Windows 10 really does here.
I rebooted the computer after disabling page file usage. Then I turned page file usage back on and got the same result:
PrivateMemorySize64 = 1002516480
WorkingSet64 = 803246080
PeakWorkingSet64 = 803299328
2 Answers
The discrepancy between the allocated memory (PrivateMemorySize64) and the working set (WorkingSet64) comes from how Windows manages memory. When you allocate a large block with malloc(), Windows commits virtual memory for it, but physical memory (RAM) is only assigned to a page when that page is first accessed. The working set is the portion of the process's memory that is currently resident in RAM; rarely used pages may be trimmed from it, and pages that were never touched are not in it at all. This is expected behavior, and it applies even when the page file is disabled.
If you want to understand the numbers, read Mark Russinovich’s book Windows Internals. Part 1 describes processes and memory management.
You need to distinguish between virtual memory, which is what your process has access to, and physical memory, which is the RAM of your PC.
Some of the virtual memory must also be in RAM, because the CPU needs to work with it. This is called the working set. The rest of the virtual memory is either swapped to disk, or may even be non-existent yet (reserved only).
Memory is allocated (committed or reserved) at a certain granularity (typically 64 KB) and swapped to disk in units of a page (typically 4 KB).
For your C++ code, it matters
So, from the code only, it’s not possible to tell how much memory it will use.
As for the 2 GB allocation of a single block, this may fail due to memory fragmentation. E.g. a 32-bit application can access at most 4 GB of virtual memory. Now consider a DLL being loaded at exactly the 2 GB boundary. This means that there is less than 2 GB below the DLL and less than 2 GB above the DLL. Thus, you cannot allocate a contiguous block of 2 GB.