Type independent numeric comparison

Avoiding problems with matching signed and unsigned types in C++03

Craig Henderson

Published online 2009

Introduction

How difficult can it be to compare two numeric values and guarantee that the result is accurate? Surprisingly difficult, in fact, in a complex software system where variables exists in a variety of data types, their comparisons are not necessarily what you'd expect. The issue is with the data representation of a particular value. Comparing the value of two variables of different types can result in unexpected behaviour if one data type represents a signed numeric, and the other represents an unsigned one.

Motivating example

I have an image processing library that contains a data type representation of a bitmapped image. This data type has methods to return the dimensions of the image - width and height - as unsigned long types. In using this library within a Microsoft Windows application, I needed to compare the width and height of the image to the width and height of the visible window area. The Windows API GetClientRect() fills in a RECT structure with the dimensions of a window, but these dimensions are stored in long (implicitly signed) types. I now have to compare the unsigned width of the image with the signed width of a window, and this rightly prompts a warning from the compiler about mismatch signs. In this simple example, a static_cast on the window dimension to an unsigned long will succeed given that a window can't be a negative width or height. Now, I would argue that this is a flaw in the design of the Windows APIs, but that design is here to stay, so I need to work with it. But there is a more significant problem here. If for any reason, a window dimension does become negative, perhaps because of some manipulation of the dimensions after the API call, I am in trouble. The comparison of a signed and an unsigned value will result in promoting the signed to unsigned before the test. A negative number, when treated as unsigned, becomes very large, and it is therefore unlikely that the test will yield the desired results.

For example, consider the following:

if (-1 < 20)
    std::cout << "-1 is less than 20";

This will work as expected and output the string because the default integer representation is signed. However,

if (-1 < 20U)
    std::cout << "-1 is less than 20";

will not work as expected. 20U is an unsigned data representation of the number 20. When the comparison is performed, -1 is promoted to an unsigned integer before the test, and the test is then 0xffffffff < 20 on a 32-bit system, and the string will not be output.

This is often surprising to programmers and can result in some very subtle runtime errors that are hard to diagnose. There are therefore times in a software system where the value of data types is to be compared, regardless of the data type representation. Why shouldn't a programmer be able to compare the numeric value of two data type without concern for subtle sign or type promotion issues? In this article, I'll design and develop a mechanism for doing just that, using compile time meta-programming.

Requirements

The goal is to provide a very simple to use function that will return a signed integer value to indicate the numerical relationship between two values, regardless of their type or sign capabilities.

The function return value will be:

Algorithm

if exactly one type is signed AND the it's value is < 0
    return -1
else
    promote both values to largest signed/unsigned type
    compare and return -1, 0 or 1 as appropriate
endif

Function signature

The numcmp function has the same semantics as the runtime library strcmp function, and will therefore be familiar to developers and easy to use.

template<typename T1, typename T2>
int numcmp(T1 t1, T2 t2);

Prerequisites

We need a bit of meta scaffolding before we can delve into the real issues. First of all, we need a compile-time version of std::max to select the larger of two given constants. This is easily achieved with a struct templated on the two constant values, so we can write expressions such as std::static_max<5,6>::value.

template<int n1, int n2>
struct static_max
{
    static const int value = (n1 > n2)? n1 : n2;
};

We are also going to need to select a data type based upon a constant expression. Another template struct will serve this purpose, this time with three parameters; the result of the constant expression yielding a Boolean condition, and the two types to be selected if the condition is true or false, respectively. The structure is declared, and then partially specialized for each condition.

template<bool cond, typename T1, typename T2>
struct select_type;

template<typename T1, typename T2>
struct select_type<true, T1, T2> {
     typedef T1 type;
};

template<typename T1, typename T2>
struct select_type<false, T1, T2> {
    typedef T2 type;
};

Now, the expression typename select_type<true, signed int, unsigned int>::type will yield a data type of signed int.

Next, we need a traits mechanism to be able to generically access the signed or unsigned form of a data type. For simplicity, we can define a template base class to provide the type definitions from template parameters:

template<typename Signed, typename Unsigned>
struct sign_traits_base {
    typedef Signed   signed_type;
    typedef Unsigned unsigned_type;
};

and then declare a sign_traits template structure and specialize it for each signed and unsigned variant of each fundamental data type.

template<typename T>
struct sign_traits;

template<> struct sign_traits<char>           : sign_traits_base<char, unsigned char> { };
template<> struct sign_traits<signed char>    : sign_traits_base<char, unsigned char> { };
template<> struct sign_traits<unsigned char>  : sign_traits_base<char, unsigned char> { };
template<> struct sign_traits<signed short>   : sign_traits_base<short, unsigned short> { };
template<> struct sign_traits<unsigned short> : sign_traits_base<short, unsigned short> { };
template<> struct sign_traits<signed int>     : sign_traits_base<int, unsigned int> { };
template<> struct sign_traits<unsigned int>   : sign_traits_base<int, unsigned int> { };
template<> struct sign_traits<signed long>    : sign_traits_base<long, unsigned long> { };
template<> struct sign_traits<unsigned long>  : sign_traits_base<long, unsigned long> { };

Now the expression typename sign_traits<char>::unsigned_type will yield a data type of unsigned char. One final prerequisite is to provide a means to access a data type capable of storing a value of a specific size. We can do this by defining an integer parameter template struct and specialize it for each size of a fundamental type, defining the signed and unsigned type appropriately.

template<int n>
struct types_from_size;

template<> struct types_from_size<sizeof(char)>      : sign_traits_base<char, unsigned char> { };
template<> struct types_from_size<sizeof(short)>     : sign_traits_base<short, unsigned short> { };
template<> struct types_from_size<sizeof(int)>       : sign_traits_base<int, unsigned int> { };
template<> struct types_from_size<sizeof(long long)> : sign_traits_base<long long, unsigned long long>
{ };

Now the expression typename types_from_size<sizeof(int)>::unsigned_type will yield a data type of unsigned int.

Implementation

numcmp is a simple inline method that is provided for ease of use, and to leverage the power of the template argument deduction from the compiler. The complete implementation is:

template<typename T1, typename T2>
inline
int numcmp(T1 t1, T2 t2)
{
    return detail::comparer<T1, T2>()(t1, t2);
}

Inside the function, we simply instantiate a comparer structure with the same template parameters as numcmp, and immediately return the result of the function call operator.

The comparer structure is where all the work is done. This structure takes three template parameters. The first two, T1 and T2, are the argument types for the function call, and the third is a Boolean value. The value of the third parameter indicates where or not the two types are both signed or both unsigned, regardless of their actual data type. For example, if T1 is unsigned char and T2 is unsigned int, then this parameter will be true, and if T1 is unsigned char and T2 is signed char, then the parameter will be false. This parameter is calculated automatically from the other two parameters and is never specified explicitly. To perform the comparison of types, we can use the Boost Type Traits library [1] or C++11 type traits, specifically the is_same template.

template<typename T1,
         typename T2,
         bool sign_match = (
             std::is_same<typename sign_traits<T1>::signed_type, T1>::value
             &&
             std::is_same<typename sign_traits<T2>::signed_type, T2>::value
         ) || (
             std::is_same<typename sign_traits<T1>::unsigned_type, T1>::value
             &&
             std::is_same<typename sign_traits<T2>::unsigned_type, T2>::value
         )>
struct comparer;

The definition is verbose, but it is actually just a simple constant Boolean expression. The expression std::is_same<T1,T2>::value will evaluate to true if T1 and T2 are the same type, otherwise to false, so the sign_match argument is true if T1 and T2 are both signed, or if T1 and T2 are both unsigned, otherwise, sign_match is false.

The comparer structure is implemented in two partial specializations; one each for a true and false value of sign_match. Firstly, let us consider the case where the signs of each data type match; that is, both are either signed or unsigned data types. The values t1 and t2 are both promoted to the largest type of T1 and T2 and the comparison is made. The only complication is the selection of the data type, the algorithm for which is

if T1 and T2 are the same size
    if T1 and T2 are both signed types
        select the signed form of data type T1
    else
        select the unsigned form of data type T1
    endif
else if T1 and T2 are both signed types
    select the signed form of the data type that can store the largest of T1 and T2
else
    select the unsigned form of the data type that can store the largest of T1 and T2
endif

This type selection is evaluated compile time using meta-programming techniques, and is implemented as

detail::select_type<
    sizeof(T1)==sizeof(T2),
    detail::select_type<
        std::is_same<T1, typename sign_traits<T1>::signed_type>::value,
        typename detail::sign_traits<T1>::signed_type,
        typename detail::sign_traits<T1>::unsigned_type
    >::type,
    detail::select_type<
        std::is_same<T1, typename sign_traits<T1>::signed_type>::value,
        typename detail::types_from_size<
            detail::static_max< sizeof(T1), sizeof(T2)>::value>::signed_type,
        typename detail::types_from_size<
            detail::static_max< sizeof(T1), sizeof(T2)>::value >::unsigned_type
    >::type
>::type

Some small optimisations can be made using constant integers to store results to reduce the compilation time, and is shown in the implementation of comparer<T1, T2, true> in Listing 1.

The algorithm for the second partial specialization of comparer where one data type is signed and the other is unsigned is

if T1 is a signed data type and t1 <0
    return -1
else if T2 is a signed data type and t2<0
    return 1
else
    promote t1 and t2 to the unsigned type of the larger of T1 and T2
    re-evaluate the comparison with the new types using comparer<unsigned_type, unsigned_type>
endif

The first two conditions implement the key to the technique. Here, one data type is known to be unsigned and the other is known to be signed. Using this knowledge, the value of the signed data type variable is evaluated, and if it is a negative value, then we know that it must be less than an unsigned value, so return the result appropriately. If T1 is the signed type and t1 is negative, then -1 is returned as t1 < t2 holds. If T2 is the signed type and t2 is negative, then 1 is returned as t2 > t1 holds. The third case is where the signs of the data type mismatch but they both hold positive values. In this case, it is safe to promote the signed value to an unsigned value and the comparison is re-evaluated by invoking the first specialization with two unsigned data types. The full implementation of this partial specialization is shown in Listing 2.

Results

The simple test program in Listing 3 was used to verify the algorithms. The output from the test program is shown below. Each test condition results in four lines of output:

Tests 1 to 7 ensure that sign matching type comparison is unchanged from native evaluation results. Tests 8 to 9 test mixed sign integer comparison, and show the new functionality where a signed integer is evaluated not to be less than an unsigned integer by the language rules, but the numcmp result evaluates it to be true. Finally, tests 10 and 11 verify mixed sign and data type comparison by evaluating a signed short in relation to an unsigned integer. Again, the language rules report the negative value as being greater than the unsigned value, but numcmp correctly reports the relationship of the values themselves without regard to their data representation.

Test 1: -1, 20
-1 < 20           = true
Comparer: less    = true
Comparer: equal   = false
Comparer: greater = false

Test 2: 20, -1
20 < -1           = false
Comparer: less    = false
Comparer: equal   = false
Comparer: greater = true

Test 3: 1, 20U
1 < 20U           = true
Comparer: less    = true
Comparer: equal   = false
Comparer: greater = false

Test 4: 20U, 1
20U < 1           = false
Comparer: less    = false
Comparer: equal   = false
Comparer: greater = true

Test 5: 20, 20U
20 < 20U          = false
Comparer: less    = false
Comparer: equal   = true
Comparer: greater = false

Test 6: 20U, 20U
20U < 20U         = false
Comparer: less    = false
Comparer: equal   = true
Comparer: greater = false

Test 7: -20, -1
-20 < -1          = true
Comparer: less    = true
Comparer: equal   = false
Comparer: greater = false

Test 8: -1, 20U
-1 < 20U          = false
Comparer: less    = true
Comparer: equal   = false
Comparer: greater = false

Test 9: 20U, -1
20U < -1          = true
Comparer: less    = false
Comparer: equal   = false
Comparer: greater = true

Test 10: short(-1), 20U
short(-1) < 20U   = false
Comparer: less    = true
Comparer: equal   = false
Comparer: greater = false

Test 11: 20U, short(-1)
20U < short(-1)   = true
Comparer: less    = false
Comparer: equal   = false
Comparer: greater = true

Listings

Listing 1 - partial specialization for signed/signed and unsigned/ unsigned data types

template<typename T1, typename T2>
struct comparer<T1, T2, true>
{
    int operator()(T1 t1, T2 t2) const
    {
        static const int
            is_signed = std::is_same<T1, typename sign_traits<T1>::signed_type>::value,
            max_size  = detail::static_max<sizeof(T1), sizeof(T2)>::value;
        
 
        typedef
        detail::select_type<sizeof(T1)==sizeof(T2),
                            detail::select_type<
                                is_signed,
                                typename detail::sign_traits<T1>::signed_type,
                                typename detail::sign_traits<T1>::unsigned_type
                            >::type,
                            detail::select_type<
                                is_signed,
                                typename detail::types_from_size<max_size>::signed_type,
                                typename detail::types_from_size<max_size>::unsigned_type
                            >::type
        >::type
        type;
 
        type first  = t1;
        type second = t2;

        return (first < second) ? -1 : (first == second)? 0 : 1;
    }
};

Listing 2 - partial specialization for sign/unsigned data types

// partial specialisation for a mismatch of signed and unsigned types
template<typename T1, typename T2>
struct comparer<T1, T2, false>
{
    int operator()(T1 t1, T2 t2) const
    {
        if (std::is_same<T1, typename sign_traits<T1>::signed_type>::value  &&  t1 < 0)
            return -1;
        else if (std::is_same<T2, typename sign_traits<T2>::signed_type>::value &&  t2 < 0)
            return 1;
        else
        {
            typedef
            typename detail::types_from_size<
                detail::static_max<sizeof(T1),
				sizeof(T2)>::value
			>::unsigned_type
            unsigned_type;
        
            // we are now safe to cast as both values are positive, and small enough
            // to be stored in the unsigned_type type
            unsigned_type first  = static_cast<unsigned_type>(t1);
            unsigned_type second = static_cast<unsigned_type>(t2);
            return comparer<unsigned_type, unsigned_type>()(first, second);
        }
    }
};

Listing 3 - Test Program

#define test(a,b)                                                            \
    std::cout << "Test " << ++count << ": " #a ", " #b                       \
    << "\n" #a " < " #b " = " << ((a)<(b)?"true":"false")                    \
    << "\n  Comparer: less    = " << ((numcmp<>(a,b) < 0) ? "true":"false")  \
    << "\n  Comparer: equal   = " << ((numcmp<>(a,b) == 0)? "true":"false")  \
    << "\n  Comparer: greater = " << ((numcmp<>(a,b) > 0) ? "true":"false")  \
    << "\n\n";

int main(int, char**)
{
    int count = 0;
 
    test(-1, 20);
    test(20, -1);
    test(1, 20U);
    test(20U, 1);
    test(20, 20U);
    test(20U, 20U);
    test(-20, -1);
 
    test(-1, 20U);
    test(20U, -1);
 
    test(short(-1), 20U);
    test(20U, short(-1));
 
    return 0;
}

References

[1] http://www.boost.org/libs/type_traits/