I’m using clang++ 10 on Ubuntu 20.04 LTS, with -fsanitize-undefined-trap-on-error -fsanitize=address,undefined,nullability,implicit-integer-truncation,implicit-integer-arithmetic-value-change,implicit-conversion,integer
My code is generating random bytes with
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<uint8_t> dd(0, 255);
...
ch = uint8_t(dd(gen));
This last line causes the sanitizer to report undefined behavior is in bits/random.tcc
template<...> void mersenne_twister_engine<...>::
_M_gen_rand(void) {
const _UIntType __upper_mask = (~_UIntType()) << __r;
const _UIntType __lower_mask = ~__upper_mask;
for (size_t __k = 0; __k < (__n - __m); ++__k)
{
_UIntType __y = ((_M_x[__k] & __upper_mask)
| (_M_x[__k + 1] & __lower_mask));
_M_x[__k] = (_M_x[__k + __m] ^ (__y >> 1)
^ ((__y & 0x01) ? __a : 0));
}
for (size_t __k = (__n - __m); __k < (__n - 1); ++__k)
{
_UIntType __y = ((_M_x[__k] & __upper_mask)
| (_M_x[__k + 1] & __lower_mask));
_M_x[__k] = (_M_x[__k + (__m - __n)] ^ (__y >> 1) <<<<===== this line
^ ((__y & 0x01) ? __a : 0));
}
_UIntType __y = ((_M_x[__n - 1] & __upper_mask)
| (_M_x[0] & __lower_mask));
_M_x[__n - 1] = (_M_x[__m - 1] ^ (__y >> 1)
^ ((__y & 0x01) ? __a : 0));
_M_p = 0;
}
The error reads:
/usr/include/c++/10/bits/random.tcc:413:33: runtime error: unsigned integer overflow: 397 - 624 cannot be represented in type 'unsigned long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/c++/10/bits/random.tcc:413:33 in
/usr/include/c++/10/bits/random.tcc:413:26: runtime error: unsigned integer overflow: 227 + 18446744073709551389 cannot be represented in type 'unsigned long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/c++/10/bits/random.tcc:413:26 in
It appears that there is a difference __m-__n == 397 - 624
that is obviously negative but the operands are all unsigned.
The variables being subtracted are template parameters defined as size_t __n, size_t __m
so this is not a random edge case, it’s the actual template being implemented.
Is this a bug in this implementation of the STL or my usage is wrong?
A minimal reproducible example: https://godbolt.org/z/vvjWscPnj
UPDATE: Issue (not a bug) filed to GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106469 – closed as "WONT FIX"
GCC’s team called clang’s ubsan unsigned integer overflow checks bad practice, because the behaviour is well-defined (as modulo wrapping) in ISO C++. Although modulus arithmetic is used in PRNG, this is not the case in this particular case.
However in most userspace code a unsigned overflow is almost all the time a bug to be caught and this non-bug on the GCC’s STL prevents users from benefitting from this useful check.
3
Answers
The result of using
uint8_t
in astd::uniform_int_distribution
is undefined, so:You can use any of
short
,int
,long
,long long
,unsigned short
,unsigned int
,unsigned long
, orunsigned long long
instead.Quote from
rand.req.gen
/1.5If that doesn’t help, skip the
-fsanitize=integer
option since… and unsigned integer overflow does not have undefined behavior. The check for signed integer overflow will be automatically enabled by using
-fsanitize=undefined
so you don’t have to enable that separately.If that still doesn’t help, it could be a bug in the gcc library implementation used by
clang++
that causes this. You can try usingclang++
‘s library implementation to see if that helps:Although as the other answer indicates it is undefined behavior per standard to instantiate
std::uniform_int_distribution
withuint8_t
template argument, the UBsan warning here is unrelated to that.UBSan is flagging the implementation of the Mersenne twister itself, but the implementation doesn’t have any undefined behavior or bug.
If you look closely you see that the offending expression is
where
__k
is a value in the range from(__n - __m)
to(__n - 1)
via thefor
loop.All of the types involved in these operations are
std::size_t
which is unsigned. As a consequence these operations all use modular arithmetic and therefore even if__m - __n
is negative and not representable in the unsigned type, the result ofwill lie between
0
and__m - 1
, so that indexing the array with it is not a problem. No undefined behavior, unspecified behavior or implementation-defined behavior is involved.The UBSan check which is flagging this is not flagging actual undefined behavior. It is perfectly ok to rely on the wrap-around behavior of unsigned arithmetic like this if one is aware of it. The unsigned overflow check is only meant to be used to flag instances of such wrap-around where it was not intentional. You shouldn’t use it on other’s code that might rely on it or on your own code if you might be relying on it.
In
-fsanitize=address,undefined,nullability,implicit-integer-truncation,implicit-integer-arithmetic-value-change,implicit-conversion,integer
all exceptaddress
andundefined
enable UBsan checks which are not flagging actual undefined behavior, but conditions that may be unintentional in many cases. The default-fsanitize=undefined
sanitizer flag doesn’t enable the unsigned integer overflow check by default for the reasons given above. See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html for details.unsigned
types have well-defined wrapping behaviour in C++. This is one reason why they get used in PRNGs and other bit-manipulation use-cases where that’s desired and expected (and necessary for the algorithm), not an error.GCC devs are right: it’s unreasonable to treat all unsigned wrapping as a problem. It’s even more unreasonable to print out that it’s "undefined behaviour", rather than a possible problem. If clang’s ubsan had told you in the first place that it was well-defined in C++ and perhaps intended, you wouldn’t have had to bother the GCC devs with a bug report that wasn’t useful to them. Or you could have phrased it as a feature request after understanding the issue.
But you’re also right: with library functions in headers where they become part of your own code, that makes it very hard to disentangle library code (such as this PRNG) from your own code, when it inlines into the same compilation unit. And ubsan options are per-file.
libc++’s implementation of mt19937 disables that ubsan check where necessary. It’s a more recent clean-slate implementation of the C++ standard library developed as part of LLVM and mostly used with clang. If any header-library was going to cater to this sanitizer that flags some valid C++ operations as problems, it would be libc++. https://godbolt.org/z/aeY5Yn9c6 shows that adding
-stdlib=libc++
to the compile options on Godbolt lets your test case run cleanly. You’d have to get it installed locally to actually use it.libc++ defines a preprocessor macro
_LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
as__attribute__((__no_sanitize__("unsigned-integer-overflow")))
(if supported), so it can disable that on a per-function basis. See for example libcxx’s<utility>
header where various functions use that tag, andmersenne_twister_engine<...>::seed()
in<random>
. But interestingly, it doesn’t use it everywhere, so you can still get the benefit of overflow checking.Or you could write a wrapper function around random number generation and put that in a separate
.cpp
you compile withoutsanitize=integer
. In a release build with-flto
, it can still get inlined. Or if you don’t need as high quality randomness, use libcrandom(3)
; it is separately compiled, not an inline header. Linux’srandom()
is non-terrible, although not great either. Other PRNGs like xorshift / xoroshiro are fast and good, but would also useunsigned
types and rely on their wrapping for multiply and/or add/sub, unless they only use shifts and xor like an LFSR.There’s no way to mark only some unsigned operations as wrapping-expected in ISO C++.
At least one language, Rust, does solve this problem: overflow of the value-range is always an error for plain
+
,-
,*
,/
, etc. for any integral type including signed and unsigned. You can use x.wrapping_sub(y) to do signed or unsigned subtraction with well-defined wrap-around. Similarly for add/mul/div/rem/shift/pow. And there’s saturating_sub/add/etc, and overflowing_… that returns the wrapped result and a boolean, or checked_add/sub/etc that returns a type that can be None instead of holding an integer. So if you want to fuss over integer overflows, Rust might be the language for you.(I wouldn’t be surprised if LLVM’s back-end checking for unsigned overflow was partially motivated by Rust, and someone thought it might sometimes be useful to expose that for use by C++. But expect false positives in code not written with that checker in mind.)
GNU C integer wrapping overflow extensions
GCC/Clang and other compilers that understand the GNU dialect of C and C++ have integer-overflow builtins. That includes both
signed
andunsigned
wrapping add/sub/mul. But only for (unsigned)int
/long
/long long
; you’d have to figure out which one to use forsize_t
in libstdc++. (e.g. on Windows x64,size_t
has to belong long
, but on x86-64 System V it’slong
)A test case on Godbolt shows that
__builtin_usubl_overflow
does safely do a wrapping subtract of1UL, 2UL
. (Making asm that doesn’t even try to detect wrapping, because we’ve told the compiler that’s not an error on this one operation.) Uncommentingreturn x-y;
does trap the overflow.It would be very cumbersome to use this for every unsigned operation in standard library code where wrapping isn’t an error, which is why libc++ disables the unsigned-wrapping sanitizer on a per-function basis instead.
Since unsigned math is well-defined as wrapping, the normal reason for using the unsigned versions of these GNU C builtins is to capture the carry/borrow output, so you know if they wrapped. Instead of using clang’s
sanitize=integer
, you could use these functions on your ownunsigned
operations, andassert()
that the bool result is false (no wrapping overflow).