I am looking to implement the RandomX proof-of-work hash function in JavaScript + WebAssembly. The README explains why this is considered infeasible:
Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
Due to the pervasive switching of rounding modes via the `CFROUND` instruction inside the bytecode, and the fact that JavaScript has no ability to change the rounding mode or use any mode other than ties-to-even, this is impossible to do directly. The `CFROUND` instruction manipulates the `fprc` abstract register from the RandomX spec. `fprc` starts at zero (round ties to even) and is updated seemingly at random as the program executes.
| fprc | rounding mode |
|---|---|
| 0 | roundTiesToEven |
| 1 | roundTowardNegative |
| 2 | roundTowardPositive |
| 3 | roundTowardZero |
RandomX uses 5 operations (add, sub, mul, div, and sqrt) that the IEEE 754 spec guarantees to be correctly rounded in double precision. My question is this: soft float is not an option, and I have access to floating-point arithmetic only in round-ties-to-even for those 5 operations. I need to find a way to emulate double-precision arithmetic in all of the other rounding modes, implemented in terms of that one rounding mode, for those 5 operations. This would be much faster than soft float.
Bruce at the Random ASCII blog had something to say:
If I understand correctly then the other three rounding modes will produce results that are either identical to round-to-nearest-even or one ULP away. If the round-to-nearest-even operation is exact then all rounding modes will give the same result. If it is not exact then you’d have to figure out which results would be different. I think that either round-towards-negative or round-towards-positive would be one ULP away, but not both.
FMA (fused multiply-add) operations aren't available in JavaScript. They are available in WebAssembly, but gated behind the non-deterministic relaxed SIMD operations (which aren't widely supported): `f32x4.relaxed_*madd` and `f64x2.relaxed_*madd`. If the use of FMA is required, that's fine; I can implement fallbacks.
The only paper I found that came even slightly close to floating-point rounding-mode emulation was "Emulating round-to-nearest ties-to-zero 'augmented' floating-point operations using round-to-nearest ties-to-even arithmetic". It makes some interesting points, but it isn't useful here.
Additional information 0: RandomX performs operations on two classes of abstract floating-point registers: group F (additive operations) and group E (multiplicative operations). According to the specification, the absolute value of group F registers never exceeds 3.0e+14, and the approximate range of group E is 1.7e-77 to infinity (always positive). Results of floating-point operations can NEVER be NaN. Answers may take this into account.
Additional information 1 (addressing @sech1's comment): The `FSCAL_R` instruction complicates things. Executing the instruction efficiently requires manipulating the actual floating-point binary format:
The mathematical operation described above is equivalent to a bitwise XOR of the binary representation with the value of 0x80F0000000000000.
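For reference, a minimal sketch of how that XOR could be applied to a JavaScript number through its raw bits (the function name is mine; the constant is the one quoted above):

```javascript
// Hypothetical sketch of FSCAL_R in JavaScript: reinterpret the double's
// bits, XOR with the constant from the RandomX spec, reinterpret back.
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u64 = new BigUint64Array(buf);

function fscal_r(x) {
  f64[0] = x;
  u64[0] ^= 0x80F0000000000000n; // flips the sign bit and 4 exponent bits
  return f64[0];
}
```

Because XOR is an involution, applying `fscal_r` twice returns the original value.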
Which means alternative forms of representation (double-double, fixed point) are out of the picture:
Because the limited range of group F registers would allow the use of a more efficient fixed-point representation (with 80-bit numbers), the FSCAL instruction manipulates the binary representation of the floating point format to make this optimization more difficult.
There is a possibility that double-double arithmetic might work, if the `FSCAL_R` instruction could be splayed across both doubles. That is a question I am not qualified to answer, but I believe it's possible. How could I do this?
This is as far as I got trying to emulate round-toward-zero in terms of round-to-nearest, ties-to-even:
```c
double add_fe_towardzero(double a, double b) {
    if (a + b >= 0) {
        return (a + b + 0.5 * ulp(a + b)) - 0.5 * ulp(a + b);
    } else {
        return (a + b - 0.5 * ulp(a + b)) + 0.5 * ulp(a + b);
    }
}
```
I have a test harness that iterates over a random selection of floating-point values, computes the outputs of the reference and of my implementation, and compares them, similar to the post "There are Only Four Billion Floats, So Test Them All!". The function above has a 1/4 success rate, not 100%.
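The `ulp()` helper used above is left undefined; one possible JavaScript sketch, assuming finite, nonzero inputs away from the overflow boundary, is:

```javascript
// Hypothetical ulp(x): distance from |x| to the next representable double
// upward, obtained by incrementing the bit pattern. Not valid for zero,
// NaN, infinity, or the largest finite double.
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u64 = new BigUint64Array(buf);

function ulp(x) {
  f64[0] = Math.abs(x);
  u64[0] += 1n;                  // next double up in magnitude
  return f64[0] - Math.abs(x);
}
```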
3 Answers
You can’t get away without soft float entirely, but the only soft-float operation you need is FMA:
https://github.com/SChernykh/RandomX_OpenCL/blob/master/RandomX_OpenCL/CL/randomx_vm.cl#L1406
Check the "fma_soft", "div_rnd" and "sqrt_rnd" functions.
Edit: but I think the better approach would be to use 2 floating-point variables to double the precision: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic
https://www.youtube.com/watch?v=6OuqnaHHUG8
This way you’ll have enough bits to determine which way to round in each rounding mode. But prepare to write a lot of "if"s for all the edge cases; the "fma_soft" code above is monstrous for a reason.
I think double-word arithmetic may be a viable option.
Let `a` and `b` be positive, non-zero floating-point numbers. Consider the error-free transform `a+b -> x, y`, where `x = RN(a+b)` and `y = (a+b) - x`. (`x` and `y` can be computed, e.g., by the well-known 2Sum algorithm.) Assume that no under-/overflows occur. Then if `y = 0`, the sum is exact and all rounding modes give the same result for `a+b`. If `y > 0`, then rounding up, away from zero, or toward +infinity gives `x + ulp(x)`. Conversely, if `y < 0`, rounding down, toward zero, or toward -infinity gives `x - ulp(x)`.
I think this specific example can be generalized to other operations (e.g., multiplication) and to the full range of floating-point values. As has been said, there are many cases to consider. The edge cases will undoubtedly be particularly tedious.
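As a hedged illustration of that scheme in JavaScript (helper names are mine; zero, infinity, and overflow cases are deliberately ignored):

```javascript
// Bit-twiddled nextUp/nextDown for finite, nonzero doubles.
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u64 = new BigUint64Array(buf);

function nextUp(x) {           // smallest double > x
  f64[0] = x;
  u64[0] += x > 0 ? 1n : -1n;
  return f64[0];
}
function nextDown(x) {         // largest double < x
  f64[0] = x;
  u64[0] += x > 0 ? -1n : 1n;
  return f64[0];
}

// Knuth's 2Sum: x = RN(a+b), y = exact residual (a+b) - x.
function twoSum(a, b) {
  const x = a + b;
  const bv = x - a;
  const y = (a - (x - bv)) + (b - bv);
  return [x, y];
}

// Directed-rounding addition built on round-to-nearest only.
function addRoundDown(a, b) {  // round a+b toward -infinity
  const [x, y] = twoSum(a, b);
  return y < 0 ? nextDown(x) : x;
}
function addRoundUp(a, b) {    // round a+b toward +infinity
  const [x, y] = twoSum(a, b);
  return y > 0 ? nextUp(x) : x;
}
```

For example, `addRoundUp(1, 2 ** -53)` must land one ulp above the round-to-nearest result, while `addRoundDown(1, 2 ** -53)` keeps it.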
A good reference: Jean-Michel Muller, Laurence Rideau. Formalization of double-word arithmetic, and comments on ”Tight and rigorous error bounds for basic building blocks of double-word arithmetic”. ACM Transactions on Mathematical Software, Association for Computing Machinery, 2022, 48 (1), pp.1-24. 10.1145/3484514. hal-02972245v2. Link
Without FMA, you could use the two-product algorithm.
Here is a sketch of a solution using 2Sum and two-product to find the residual rounding error.
This solution requires some implementation of the standard nextafter() function, which can be found in various places, such as https://github.com/scijs/nextafter/blob/master/README.md
Excuse my JavaScript skills, which might be very approximative, but you’ll get the idea.
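Without FMA, the exact product residual can be obtained with Veltkamp splitting (Dekker's TwoProduct); a minimal JavaScript sketch, with illustrative names:

```javascript
// Veltkamp split: a = hi + lo exactly, each half fitting in 26 bits.
const SPLITTER = 134217729; // 2^27 + 1

function split(a) {
  const t = SPLITTER * a;
  const hi = t - (t - a);
  const lo = a - hi;
  return [hi, lo];
}

// Dekker's TwoProduct: returns [p, e] with p = RN(a*b) and
// e = a*b - p exactly, barring overflow/underflow in the partial products.
function twoProduct(a, b) {
  const p = a * b;
  const [ah, al] = split(a);
  const [bh, bl] = split(b);
  const e = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
  return [p, e];
}
```

The sign of `e` then plays the same role for multiplication that the 2Sum residual `y` plays for addition in the previous answer.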