skip to Main Content

I am trying to beat pow function of C++ by using Taylor Series, Newton’s Method, continued fraction and such, the goal is to get a good approximation faster than pow.

I wrote some functions to benchmark against pow, I tried to use std::chrono::steady_clock to measure the execution time. My idea is to run the function 1048576 times and divide the duration by 1048576 to get single pass time:

#include <chrono>
#include <cmath>
#include <iostream>
#include <string>

using std::chrono::steady_clock;
using std::chrono::duration;
using std::cout;

inline float power(float base, int exp) {
    float m = 1.0;
    for (int i = 0; i < exp; i++) {
        m *= base;
    }
    return m;
}

float newton(float base, int exp, int lim) {
    float i = 1.0 / exp;
    float r = 1 + (base - 1) / exp;
    exp--;
    for (int c = 0; c < lim; c++) {
        r = i * (exp * r + base / power(r, exp));
    }
    return r;
}

int main()
{
    auto start = steady_clock::now();
    float r;
    for (int64_t i = 0; i < 1048576; i++) {
        r = newton(256.0, 3, 16);
    }
    auto end = steady_clock::now();
    duration<double, std::nano> time = end - start;
    cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanosecondsn";
}

But the resultant measured values don’t make sense at all:

PS C:UsersXeni> C:UsersXenisourcereposexponentiationx64Releaseexponentiation.exe
newton(256, 3, 16): 0 nanoseconds
PS C:UsersXeni> C:UsersXenisourcereposexponentiationx64Releaseexponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds
PS C:UsersXeni> C:UsersXenisourcereposexponentiationx64Releaseexponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds
PS C:UsersXeni> C:UsersXenisourcereposexponentiationx64Releaseexponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds

My CPU runs at 3GHz, that is 3 billion clock cycles per second, so a clock cycle is about 3.33333333333333E-10 seconds, if the result is to be believed, then the computation happened in about 1e-4 nanoseconds, which is 1e-13 seconds, less than a clock cycle.

I compiled with Visual Studio 2022, compiler arguments:

/permissive- /ifcOutput "x64Release" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /Ob1 /sdl /Fd"x64Releasevc143.pdb" /Zc:inline /fp:fast /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /std:c17 /Gd /Oi /MD /std:c++20 /FC /Fa"x64Release" /EHsc /nologo /Fo"x64Release" /Ot /Fp"x64Releaseexponentiation.pch" /diagnostics:column

I measured the execution of the whole program in PowerShell, and calculated the single pass to take about 8 nanoseconds, if PowerShell is correct:

PS C:UsersXeni> measure-command {C:UsersXenisourcereposexponentiationx64Releaseexponentiation.exe}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 8
Ticks             : 89682
TotalDays         : 1.03798611111111E-07
TotalHours        : 2.49116666666667E-06
TotalMinutes      : 0.00014947
TotalSeconds      : 0.0089682
TotalMilliseconds : 8.9682



PS C:UsersXeni> 8.9682/1048576*1e6
8.55274200439453

What is wrong with my code? How can I fix it?

3

Answers


  1. The compiler sees the newton function not actually changing program state and the return value is getting ignored. And the same parameters are used each time.

    An easy way to force the compiler to keep newton optimized, but not optimized out completely is to do something meaningful with the result (like print it):

    int main()
    {
        int randos[10000];
        for (int j = 0; j < 10000; j++) {
            randos[j] = rand() % 40;
        }
    
        auto start = steady_clock::now();
        float r = 0;
        for (int64_t i = 0; i < 1048576; i++) {
            r += newton(256.0+randos[i%10000], 3, 16);
        }
        auto end = steady_clock::now();
        duration<double, std::nano> time = end - start;
        cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanosecondsn";
        cout << "r for the sake of r: " << r << std::endl;
    }
    
    Login or Signup to reply.
  2. You can use volatile variables to stop the compiler from optimizing away important things. You have to make sure that the the computation actually needs to be performed and that the result is actually used:

    // external linkage
    volatile float BASE = 256.0;
    volatile float r = 0.0f;
    
    int main()
    {
        auto start = steady_clock::now();
        for (int64_t i = 0; i < 1048576; i++) {
            // BASE could be anything, because it's extern + volatile
            // and r must be stored, because it's externally visible
            r = newton(BASE, 3, 16);
        }
        auto end = steady_clock::now();
        duration<double, std::nano> time = end - start;
        cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanosecondsn";
    }
    
    Login or Signup to reply.
  3. Do NOT use volatile to disable compiler optimization. It will make it disable every normal optimization that would be done with that variable and throw your numbers all the way off – in the other direction. You won’t get a realistic number.

    There is a video from Chandler Karruth (2015) that explains that what you need to do is exactly create a false, empty usage.

    CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"

    Taken from google benchmark:

    template <class Tp>
    inline inline void DoNotOptimize(Tp const& value) {
        asm volatile("" : : "g"(value) : "memory");
    }
    

    The above is good for gnu-like compilers like GCC and clang.

    Use as

    int main()
    {
        int randos[10000];
        for (int j = 0; j < 10000; j++) {
            randos[j] = rand() % 40;
        }
    
        auto start = steady_clock::now();
        float r = 0;
        for (int64_t i = 0; i < 1048576; i++) {
            r += newton(256.0+randos[i%10000], 3, 16);
        }
        DoNotOptimize(r);
    
        auto end = steady_clock::now();
        duration<double, std::nano> time = end - start;
        cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanosecondsn";
    }
    

    Also your function newton is being optimized away. You need to make it a call. The proper way to do that is to place the function body of newton inside a different translation unit (cpp file) and then link against it in your main.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search