It seems that GCC and Clang interpret addition between signed and unsigned integers differently, depending on their size. Why is this, and is the conversion consistent across all compilers and platforms?
Take this example:
#include <cstdint>
#include <iostream>

int main()
{
    std::cout << "16 bit uint 2 - int 3 = " << uint16_t(2) + int16_t(-3) << std::endl;
    std::cout << "32 bit uint 2 - int 3 = " << uint32_t(2) + int32_t(-3) << std::endl;
    return 0;
}
Result:
$ ./out.exe
16 bit uint 2 - int 3 = -1
32 bit uint 2 - int 3 = 4294967295
In both cases the mathematical result is -1, but in the second case it was interpreted as an unsigned integer and wrapped around. I would have expected both to be converted in the same way.
So again: why do the compilers convert these so differently, and is this guaranteed to be consistent? I tested this with g++ 11.1.0, clang 12.0, and g++ 11.2.0 on Arch Linux and Debian, getting the same result.
Answers
A 16-bit unsigned int can be promoted to a 32-bit int without any lost values due to range differences, so that’s what happens. Not so for the 32-bit integers.
When you do uint16_t(2) + int16_t(-3), both operands have types smaller than int. Because of this, each operand is promoted to an int, and signed + signed results in a signed integer: you get the result of -1 stored in that signed integer.

When you do uint32_t(2) + int32_t(-3), since both operands are the size of an int or larger, no promotion happens. Now you are in a case where you have unsigned + signed, which results in a conversion of the signed integer into an unsigned integer, and the unsigned value of -1 wraps around to the largest representable value.

Standard quotes for [language-lawyer]:

The std::uint16_t type may have a lower conversion rank than int, in which case it will be promoted when used as an operand. int may be able to represent all values of std::uint16_t, in which case the promotion will be to int. The common type of two int operands is int.

The std::uint32_t type may have the same or a higher conversion rank than int, in which case it won't be promoted. The common type of an unsigned type and a signed type of the same rank is the unsigned type.

For an explanation of why this conversion behaviour was chosen, see chapter "6.3.1.1 Booleans, characters, and integers" of the "Rationale for International Standard—Programming Languages—C". I won't quote the entire chapter here.
Whether the behaviour is consistent depends on the relative sizes of the integer types, which are implementation-defined.
C (and hence C++) has a rule that effectively says when a type smaller than int is used in an expression it is first promoted to int (the actual rule is a little more complex than that to allow for multiple distinct types of the same size).
Section 6.3.1.1 of the Rationale for International Standard Programming Languages C says that in early C compilers there were two versions of the promotion rule, "unsigned preserving" and "value preserving", and discusses why the committee chose the "value preserving" option. To summarise, they believed it would produce correct results in a greater proportion of situations*.
It does not however explain why the concept of promotion exists in the first place. I would speculate that it existed because on many processors, including the PDP-11 for which C was originally designed, arithmetic operations only operated on words, not on units smaller than words. So it was simpler and more efficient to convert everything smaller than a word to a word at the start of an expression.
On most platforms today int is 32 bits, so both uint16_t and int16_t are promoted to int. The arithmetic then produces a result of type int with a value of -1.
OTOH, uint32_t and int32_t are not smaller than int, so they retain their original size and signedness through the promotion step. The rules for arithmetic operands of different types (the "usual arithmetic conversions") then come into play, and since the operands are the same size, the signed operand is converted to unsigned.
The rationale does not seem to talk about this rule, which suggests it goes back to pre-standard C.
On an ANSI C or ISO C++ platform it depends on the size of int. With a 16-bit int both examples would give large positive values; with a 64-bit int both examples would give -1.
On pre-standard implementations it’s possible that both expressions might return large positive numbers.
* This belief is somewhat shattered by modern C compilers that treat signed integer overflow as an optimisation opportunity.