I recently upgraded my OS from Debian 9 to Debian 11. I have a bunch of servers running a simulation and one subset produces a certain result and another subset produces a different result. This did not used to happen with Debian 9. I have produced a minimal failing example:
#include <stdio.h>
#include <math.h>
int main()
{
double lp = 11.525775909423828;
double ap = exp(lp);
printf("%.14f %.14fn", lp, ap);
return 0;
}
The lp value prints the same answer on every machine but I have two different answers for ap: 101293.33662281210127 and 101293.33662281208672
The code was compiled with "gcc fptest.c -lm -O0". The ‘-O0’ was just added to ensure optimizations weren’t an issue. It behaves the same without this option.
The libraries linked in the Debian 11 version are libm-2.31.so and libc-2.31.so.
The libraries linked in the (working) Debian 9 version are libm-2.24.so and libc-2.24.so.
The servers are all running with different CPUs so its hard to say much about that. But I get different results between a xeon E5-2695 v2 and a xeon E5-2695 v3 for example.
Amongst all the processors I have, I only see one of these two results on Debian 11, and when running on Debian 9 I consistently only get one result.
This feels like a bug in libm-2.31 and/or libc-2.31 to me. I have zero experience with this sort of thing. Could someone please explain if what I am seeing is expected? Does it look like a bug? Anything I can do about it? etc.
Also tried compiling with clang, and get the exact same problem.
Also note that the binary compiled on Debian 9 runs on Debian 11 and produces the same results/problem as the Debian 11 binary adding further weight to my suspicion that this is library related (I cannot run the Debian 11 binary on Debian 9).
Update
Just read this post which was helpful. So I’m happy that different architectures may give different results for the exp() function. But all my processors are x86_64 and some kind of intel xeon-xxxx. I can’t understand why the exact same binary with the exact same libraries is giving different results on different processors.
As suggested in that post I printed the values using %a. The two answers differ only by the LSB.
If I use expl() I get the same answer on all machines.
An explanation of why I’m seeing differences, and if this is expected, would be nice. Any compiler flags that ensure consistency would also be nice.
4
Answers
Finally figured out what's going on.
Before my code even begins, libc_start_main does a call to ieee754_exp_ifunc. This function appears to select which function does the exp(). On one set of machines it selects ieee754_exp_avx and on the other set it selects ieee754_exp_fma.
The ieee754_exp_ifunc has a comment about dl_x86_cpu_features so it seems that the exponential function implementation is being chosen based on the processor capabilities. I didn't dig any further than that.
In Debian 11, on machines which have both avx and fma capability, fma is chosen. For the subset of my machines that don't have fma it obviously uses avx. However in Debian 9 it used avx even on processors which have fma.
Finally, I tried compiling with gcc using '-mavx'. It gave that a good ignoring and still used fma on the machines that have it. I wonder if/how I can force avx? I guess that's a question for another post.
It’s not a bug. Floating point arithmetic has rounding errors. For single arithmetic operations + – * / sqrt the results should be the same, but for floating-point functions you can’t really expect it.
In this case it seems the compiler itself produced the results at compile time. The processor you use is unlikely to make a difference. And we don’t know whether the new version is more or less precise than the old one.
Background
Some notes to clarify the results of e11.5257759094238_28.
11.5257759094238_28 is not exactly representable as a
double
, instead a nearby value of 0x1.70d328p+3 is used or exactly 11.5257759094238_28125, so we really are investigating e11.5257759094238_28125 and how wellexp()
performs.We have 3 results, 2 from various compilers
exp()
and a 3rd mathematical one. The 2 C code answers are 1 ULP apart and the math one is between them,
The math answer is very nearly half way between the two representable
double
s with the larger of OP’s answers being 1/1000 ULP closer.Candidate contributors to a difference
User code and math libraries are not as exactly the same as OP supposes. (IMO – most likely)
Code uses the processors’ built in
e(x)
hardware primitives instruction and those render different answers. (possible)Inherited rounding modes are not the same (doubtful)
Getting consistent answers – to the last bit – is quite challenging in this case. A common issue is that newer processor and support libraries tend to render better answers than before. This is part of the Table-maker’s_dilemma and test code should be tolerant of these minute (<= 1.0 ULP) differences in transcendental functions like
exp()
.Note that using
long double
may not help as some implementations havelong double
encoding exactly asdouble
.With a few exceptions, glibc does not aim for correctly rounded math functions.
This means that results can vary slightly between different glibc versions and different processors. Processor differences can result from different implementations based on processor capabilities, for example depending on whether the processor implements FMA instructions.