John R. Hauser
2017 August 18
1. Introduction
2. Limitations
3. Acknowledgments and License
4. What TestFloat Does
5. Executing TestFloat
6. Operations Tested by TestFloat
    6.1. Conversion Operations
    6.2. Basic Arithmetic Operations
    6.3. Fused Multiply-Add Operations
    6.4. Remainder Operations
    6.5. Round-to-Integer Operations
    6.6. Comparison Operations
7. Interpreting TestFloat Output
8. Variations Allowed by the IEEE Floating-Point Standard
    8.1. Underflow
    8.2. NaNs
    8.3. Conversions to Integer
9. Contact Information
Berkeley TestFloat is a small collection of programs for testing that an
implementation of binary floating-point conforms to the IEEE Standard for
Floating-Point Arithmetic.
All operations required by the original 1985 version of the IEEE Floating-Point
Standard can be tested, except for conversions to and from decimal.
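With the current release, the following binary formats can be tested:
16-bit half-precision, 32-bit single-precision, 64-bit double-precision,
80-bit double-extended-precision, and 128-bit quadruple-precision.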
Included in the TestFloat package are the testsoftfloat and timesoftfloat
programs for testing the Berkeley SoftFloat software implementation of
floating-point and for measuring its speed.
Information about SoftFloat can be found at the SoftFloat Web page,
http://www.jhauser.us/arithmetic/SoftFloat.html
The testsoftfloat and timesoftfloat programs are expected to be of interest
only to people compiling the SoftFloat sources.
This document explains how to use the TestFloat programs. It does not attempt to define or explain much of the IEEE Floating-Point Standard. Details about the standard are available elsewhere.
The current version of TestFloat is [...] sqrtf, sqrtl, fmaf, fma, and fmal.
Compared to Release 2c and earlier, the set of TestFloat programs, as well as
the programs’ arguments and behavior, changed some with Release 3.
For more about the evolution of TestFloat releases, see TestFloat-history.html
TestFloat output is not always easily interpreted. Detailed knowledge of the IEEE Floating-Point Standard and its vagaries is needed to use TestFloat responsibly.
TestFloat performs relatively simple tests designed to check the fundamental
soundness of the floating-point under test.
TestFloat may also at times manage to find rarer and more subtle bugs, but it
will probably only find such bugs by chance.
Software that purposefully seeks out various kinds of subtle floating-point
bugs can be found through links posted on the TestFloat Web page,
http://www.jhauser.us/arithmetic/TestFloat.html
The TestFloat package was written by me, John R. Hauser.
Par Lab: Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.
ASPIRE Lab: DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, Oracle, and Samsung.
The following applies to the whole of TestFloat
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017 The Regents of the University of California. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS “AS IS”, AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
TestFloat is designed to test a floating-point implementation by comparing its behavior with that of TestFloat’s own internal floating-point implemented in software. For each operation to be tested, the TestFloat programs can generate a large number of test cases, made up of simple pattern tests intermixed with weighted random inputs. The cases generated should be adequate for testing carry chain propagations, and the rounding of addition, subtraction, multiplication, and simple operations like conversions. TestFloat makes a point of checking all boundary cases of the arithmetic, including underflows, overflows, invalid operations, subnormal inputs, zeros (positive and negative), infinities, and NaNs. For the interesting operations like addition and multiplication, millions of test cases may be checked.
TestFloat is not remarkably good at testing difficult rounding cases for
division and square root.
It also makes no attempt to find bugs specific to SRT division and the like
(such as the infamous Pentium division bug).
Software that tests for such failures can be found through links on the
TestFloat Web page,
http://www.jhauser.us/arithmetic/TestFloat.html
NOTE!
It is the responsibility of the user to verify that the discrepancies TestFloat
finds actually represent faults in the implementation being tested.
Advice to help with this task is provided later in this document.
Furthermore, even if TestFloat finds no fault with a floating-point
implementation, that in no way guarantees that the implementation is bug-free.
For each operation, TestFloat can test all five rounding modes defined by the IEEE Floating-Point Standard, plus possibly a sixth mode, round to odd (depending on the options selected when TestFloat was built). TestFloat verifies not only that the numeric results of an operation are correct, but also that the proper floating-point exception flags are raised. All five exception flags are tested, including the inexact flag. TestFloat does not attempt to verify that the floating-point exception flags are actually implemented as sticky flags.
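For the double-extended-precision format (extF80), TestFloat can also test
operations rounded to reduced precision under rounding precision control.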
As a rule, TestFloat is not particular about the bit patterns of NaNs that
appear as operation results.
Any NaN is considered as good a result as another.
This laxness can be overridden so that TestFloat checks for particular bit
patterns within NaN results.
See the -checkNaNs option documented for programs testfloat_ver and testfloat.
TestFloat normally compares an implementation of floating-point against the
Berkeley SoftFloat software implementation of floating-point, also created by
me.
The SoftFloat functions are linked into each TestFloat program’s
executable.
Information about SoftFloat can be found at the Web page
http://www.jhauser.us/arithmetic/SoftFloat.html
For testing SoftFloat itself, the TestFloat package includes a testsoftfloat
program that compares SoftFloat’s floating-point against another software
floating-point implementation.
The second software floating-point is simpler and slower than SoftFloat, and is
completely independent of SoftFloat.
Although the second software floating-point cannot be guaranteed to be
bug-free, the chance that it would mimic any of SoftFloat’s bugs is low.
Consequently, an error in one or the other floating-point version should appear
as an unexpected difference between the two implementations.
Note that testing SoftFloat should be necessary only when compiling a new
TestFloat executable or when compiling SoftFloat for some other reason.
The TestFloat package consists of five programs, all intended to be executed from a command-line interpreter:
    testfloat_gen: Generates test cases for a specific floating-point operation.
    testfloat_ver: Verifies whether the results from executing a floating-point operation are as expected.
    testfloat: An all-in-one program that generates test cases, executes floating-point operations, and verifies whether the results match expectations.
    testsoftfloat: Like testfloat, but for testing SoftFloat.
    timesoftfloat: A program for measuring the speed of SoftFloat (included in the TestFloat package for convenience).
Each program has its own page of documentation that can be opened through the links in the table above.
To test a floating-point implementation other than SoftFloat, one of three
different methods can be used.
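The first method pipes output from testfloat_gen to a program that reads the
incoming test cases, invokes the floating-point operation being tested, and
writes the operation results to output.
These results can then be piped to testfloat_ver to be checked for
correctness.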
Assuming a vertical bar (|) indicates a pipe between programs, the
complete process could be written as a single command like so:
    testfloat_gen ... <type> | <program-that-invokes-op> | testfloat_ver ... <function>
The program in the middle is not supplied by TestFloat but must be created independently. If for some reason this program cannot take command-line arguments, the -prefix option of testfloat_gen can communicate parameters through the pipe.
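The middle program of such a pipeline could be sketched as follows. This is only an illustration of the read, invoke, write loop, not a program supplied by TestFloat. It ASSUMES that each test case arrives as one line of hexadecimal operands separated by spaces, and that the checker wants the same operands followed by the result and the exception flags in hexadecimal; the authoritative formats are those given in the testfloat_gen and testfloat_ver documentation pages. The names process_line, run_filter, and op are hypothetical.

```python
import sys

def process_line(line, op):
    """Turn one test-case line into one result line.

    ASSUMPTION: operands are hexadecimal numbers separated by spaces, and
    the output line repeats them, then adds the result and the flags.
    `op` is a hypothetical callable supplied by the user; it receives the
    operands as integers and returns (result_hex_string, flags_integer).
    """
    fields = line.split()
    operands = [int(field, 16) for field in fields]
    result_hex, flags = op(operands)
    return " ".join(fields + [result_hex, format(flags, "02X")])

def run_filter(op, infile=sys.stdin, outfile=sys.stdout):
    """Read test cases from infile and write one result line per case."""
    for line in infile:
        if line.strip():  # skip blank lines
            outfile.write(process_line(line, op) + "\n")
```

In a real filter, op would call the floating-point operation under test and collect the exception flags it raised; how to do that depends entirely on the implementation being tested.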
A second method for running TestFloat is similar but has testfloat_gen supply
not only the test inputs but also the expected results for each case.
With this additional information, the job done by testfloat_ver
can be folded into the invoking program to give the following command:
    testfloat_gen ... <function> | <program-that-invokes-op-and-compares-results>
Again, the program that actually invokes the floating-point operation is not supplied by TestFloat but must be created independently. Depending on circumstance, it may be preferable either to let testfloat_ver check and report suspected errors (first method) or to include this step in the invoking program (second method).
The third way to use TestFloat is the all-in-one testfloat program.
This program can perform all the steps of creating test cases, invoking the
floating-point operation, checking the results, and reporting suspected errors.
However, for this to be possible, testfloat must be compiled to
contain the method for invoking the floating-point operations to test.
Each build of testfloat is therefore capable of testing
only the floating-point implementation it was built to invoke.
To test a new implementation of floating-point, a new testfloat
must be created, linked to that specific implementation.
By comparison, the testfloat_gen and testfloat_ver programs are entirely
generic; one instance is usable for testing any floating-point implementation,
because implementation-specific details are segregated in the custom program
that follows testfloat_gen.
Program testsoftfloat is another all-in-one program specifically
for testing SoftFloat.
Programs testfloat_ver, testfloat, and testsoftfloat all report status and
error information in a common way.
As it executes, each of these programs writes status information to the
standard error output, which should be the screen by default.
In order for this status to be displayed properly, the standard error stream
should not be redirected to a file.
Any discrepancies that are found are written to the standard output stream,
which is easily redirected to a file if desired.
Unless redirected, reported errors will appear intermixed with the ongoing
status information in the output.
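TestFloat can test all operations required by the original 1985 IEEE Floating-Point Standard except for conversions to and from decimal. These operations are: conversions among the floating-point formats and between floating-point formats and integer types; addition, subtraction, multiplication, division, and square root; remainder; round-to-integer; and comparisons. In addition, the fused multiply-add operation can be tested for all formats except extF80.
More information about all these operations is given below.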
In the operation names used by TestFloat, the floating-point formats are
called f16, f32, f64, extF80, and f128.
TestFloat generally uses the same names for operations as Berkeley SoftFloat,
except that TestFloat’s names never include the M that
SoftFloat uses to indicate that values are passed through pointers.
All conversions among the floating-point formats and all conversions between a
floating-point format and 32-bit and 64-bit integers can be tested.
The conversion operations are:
    ui32_to_f16     ui64_to_f16     i32_to_f16      i64_to_f16
    ui32_to_f32     ui64_to_f32     i32_to_f32      i64_to_f32
    ui32_to_f64     ui64_to_f64     i32_to_f64      i64_to_f64
    ui32_to_extF80  ui64_to_extF80  i32_to_extF80   i64_to_extF80
    ui32_to_f128    ui64_to_f128    i32_to_f128     i64_to_f128

    f16_to_ui32     f32_to_ui32     f64_to_ui32     extF80_to_ui32  f128_to_ui32
    f16_to_ui64     f32_to_ui64     f64_to_ui64     extF80_to_ui64  f128_to_ui64
    f16_to_i32      f32_to_i32      f64_to_i32      extF80_to_i32   f128_to_i32
    f16_to_i64      f32_to_i64      f64_to_i64      extF80_to_i64   f128_to_i64

    f16_to_f32      f32_to_f16      f64_to_f16      extF80_to_f16   f128_to_f16
    f16_to_f64      f32_to_f64      f64_to_f32      extF80_to_f32   f128_to_f32
    f16_to_extF80   f32_to_extF80   f64_to_extF80   extF80_to_f64   f128_to_f64
    f16_to_f128     f32_to_f128     f64_to_f128     extF80_to_f128  f128_to_extF80
Abbreviations ui32 and ui64 indicate 32-bit and 64-bit unsigned integer types,
while i32 and i64 indicate their signed counterparts.
These conversions all round according to the current rounding mode as relevant.
Conversions from a smaller to a larger floating-point format are always exact
and so require no rounding.
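Likewise, conversions from 32-bit integers to f64 and larger formats are
exact, as are conversions from 64-bit integers to extF80 and f128.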
For the all-in-one testfloat program, this list of conversion
operations requires amendment.
For testfloat only, conversions to an integer type have names that
explicitly specify the rounding mode and treatment of inexactness.
Thus, instead of
    <float>_to_<int>
as listed above, operations converting to integer type have names of these forms:
    <float>_to_<int>_r_<round>
    <float>_to_<int>_rx_<round>
The <round> component is one of ‘near_even’, ‘near_maxMag’, ‘minMag’,
‘min’, or ‘max’, choosing the rounding mode.
Any other indication of rounding mode is ignored.
The operations with ‘_r_’ in their names never raise
the inexact exception, while those with ‘_rx_’
raise the inexact exception whenever the result is not exact.
TestFloat assumes that conversions from floating-point to an integer type should raise the invalid exception if the input cannot be rounded to an integer representable in the result format. In such a circumstance:
If the result type is an unsigned integer, TestFloat normally expects the result of the operation to be the type’s largest integer value. In the case that the input is a negative number (not a NaN), a zero result may also be accepted.
If the result type is a signed integer and the input is a number (not a NaN), TestFloat expects the result to be the largest-magnitude integer with the same sign as the input. When a NaN is converted to a signed integer type, TestFloat allows either the largest positive or largest-magnitude negative integer to be returned.
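The rules above for invalid conversions can be summarized in a short sketch. It is a restatement of those rules under stated assumptions, not TestFloat's own code: the function name accepted_invalid_results is hypothetical, the input is modeled as a Python float, and the caller is assumed to have already established that the conversion is invalid.

```python
import math

def accepted_invalid_results(int_type, value):
    """Return the set of integer results accepted for an invalid conversion.

    int_type is one of 'ui32', 'ui64', 'i32', 'i64'; value is the
    floating-point input (a NaN or a number that cannot be rounded to a
    representable integer).
    """
    signed = int_type.startswith("i")
    bits = int(int_type.lstrip("ui"))
    if signed:
        largest = 2 ** (bits - 1) - 1
        most_negative = -(2 ** (bits - 1))
        if math.isnan(value):
            # Either extreme is allowed for a NaN input.
            return {largest, most_negative}
        # Largest-magnitude integer with the same sign as the input.
        return {most_negative} if value < 0 else {largest}
    largest = 2 ** bits - 1
    if not math.isnan(value) and value < 0:
        # A zero result may also be accepted for negative inputs.
        return {largest, 0}
    return {largest}
```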
When converting to an integer, if the rounding mode is odd
(possible only when the rounding mode is not in the function name), TestFloat
expects the result to be rounded not to an odd integer but rather to
minimum magnitude, the same as when the rounding mode is minMag.
The following standard arithmetic operations can be tested:
    f16_add     f16_sub     f16_mul     f16_div     f16_sqrt
    f32_add     f32_sub     f32_mul     f32_div     f32_sqrt
    f64_add     f64_sub     f64_mul     f64_div     f64_sqrt
    extF80_add  extF80_sub  extF80_mul  extF80_div  extF80_sqrt
    f128_add    f128_sub    f128_mul    f128_div    f128_sqrt
The double-extended-precision (extF80) operations can be rounded
to reduced precision under rounding precision control.
For all floating-point formats except extF80, the fused multiply-add
operation can be tested:
    f16_mulAdd  f32_mulAdd  f64_mulAdd  f128_mulAdd
If one of the multiplication operands is infinite and the other is zero, TestFloat expects the fused multiply-add operation to raise the invalid exception even if the third operand is a quiet NaN.
For each format, TestFloat can test the IEEE Standard’s remainder operation. These operations are:
    f16_rem  f32_rem  f64_rem  extF80_rem  f128_rem
The remainder operations are always exact and so require no rounding.
For each format, TestFloat can test the IEEE Standard’s round-to-integer operation. For most TestFloat programs, these operations are:
f16_roundToInt f32_roundToInt f64_roundToInt extF80_roundToInt f128_roundToInt
Just as for conversions to integer types, the all-in-one testfloat
program is again an exception.
For testfloat only, the round-to-integer operations have names of
these forms:
    <float>_roundToInt_r_<round>
    <float>_roundToInt_x
For the ‘_r_’ versions, the inexact exception
is never raised, and the <round> component specifies
the rounding mode as one of ‘near_even’, ‘near_maxMag’, ‘minMag’,
‘min’, or ‘max’.
The usual indication of rounding mode is ignored.
In contrast, the ‘_x’ versions accept the usual
indication of rounding mode and raise the inexact exception whenever the
result is not exact.
This irregular system follows the IEEE Standard’s particular
specification for the round-to-integer operations.
If the rounding mode is odd (possible only when the rounding mode
is not in the function name), TestFloat expects the result to be rounded
not to an odd integer but rather to minimum magnitude, the same as
when the rounding mode is minMag.
The following floating-point comparison operations can be tested:
    f16_eq     f16_le     f16_lt
    f32_eq     f32_le     f32_lt
    f64_eq     f64_le     f64_lt
    extF80_eq  extF80_le  extF80_lt
    f128_eq    f128_le    f128_lt
The abbreviation eq stands for “equal” (=),
le stands for “less than or equal” (≤), and
lt stands for “less than” (<).
The IEEE Standard specifies that, by default, the less-than-or-equal and less-than comparisons raise the invalid exception if either input is any kind of NaN. The equality comparisons, on the other hand, are defined by default to raise the invalid exception only for signaling NaNs, not for quiet NaNs. For completeness, the following additional operations can be tested if supported:
    f16_eq_signaling     f16_le_quiet     f16_lt_quiet
    f32_eq_signaling     f32_le_quiet     f32_lt_quiet
    f64_eq_signaling     f64_le_quiet     f64_lt_quiet
    extF80_eq_signaling  extF80_le_quiet  extF80_lt_quiet
    f128_eq_signaling    f128_le_quiet    f128_lt_quiet
The signaling equality comparisons are identical to the standard
operations except that the invalid exception should be raised for any
NaN input.
Similarly, the quiet comparison operations should be identical to
their counterparts except that the invalid exception is not raised for
quiet NaNs.
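The six comparison variants differ only in when a NaN input raises the invalid exception. The following sketch restates that rule as a small predicate. The function name expects_invalid and its boolean parameters are hypothetical; classifying an input as a quiet or signaling NaN is left to the caller.

```python
# Comparison suffixes that raise invalid for any NaN input.
RAISE_ON_ANY_NAN = {"le", "lt", "eq_signaling"}
# Comparison suffixes that raise invalid only for signaling NaNs.
RAISE_ON_SIGNALING_ONLY = {"eq", "le_quiet", "lt_quiet"}

def expects_invalid(op, has_quiet_nan, has_signaling_nan):
    """Say whether the invalid exception is expected for a comparison.

    op is the part of the operation name after the format, such as
    'eq' or 'le_quiet'.
    """
    if op in RAISE_ON_ANY_NAN:
        return has_quiet_nan or has_signaling_nan
    if op in RAISE_ON_SIGNALING_ONLY:
        return has_signaling_nan
    raise ValueError(f"unknown comparison: {op}")
```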
Obviously, no comparison operations ever require rounding. Any rounding mode is ignored.
The “errors” reported by TestFloat programs may or may not really represent errors in the system being tested. For each test case tried, the results from the floating-point implementation being tested could differ from the expected results for several reasons:
For each reported error (or apparent error), a line of text is written to the default output. If a line would be longer than 79 characters, it is divided. The first part of each error line begins in the leftmost column, and any subsequent “continuation” lines are indented with a tab.
Each error reported is of the form:
    <inputs> => <observed-output> expected: <expected-output>
The <inputs> are the inputs to the operation.
Each output (observed or expected) is shown as a pair: the result value first,
followed by the exception flags.
For example, two typical error lines could be
    -00.7FFF00 -7F.000100 => +01.000000 ...ux expected: +01.000000 ....x
    +81.000004 +00.1FFFFF => +01.000000 ...ux expected: +01.000000 ....x
In the first line, the inputs are -00.7FFF00 and -7F.000100, and the
observed result is +01.000000 with flags ...ux.
The trusted emulation result is the same but with different flags, ....x.
Items such as -00.7FFF00, composed of a sign character (+/-), hexadecimal
digits, and a single period, represent floating-point values.
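When many errors are reported, it can help to split each error line into its parts mechanically. The following is a minimal sketch, assuming each error has already been joined into a single line (continuation lines merged) and that each output is exactly one value followed by one flags field. The function name parse_error_line is hypothetical.

```python
def parse_error_line(line):
    """Split '<inputs> => <observed> expected: <expected>' into its parts."""
    inputs_part, outputs_part = line.split("=>")
    observed_part, expected_part = outputs_part.split("expected:")
    observed_value, observed_flags = observed_part.split()
    expected_value, expected_flags = expected_part.split()
    return {
        "inputs": inputs_part.split(),
        "observed": (observed_value, observed_flags),
        "expected": (expected_value, expected_flags),
    }
```

Applied to the first example line above, the observed and expected values are equal and only the flags differ.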
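Aside from the exception flags, there are ten data types that may be
represented.
Five are floating-point types: f16, f32, f64, extF80, and f128.
The remaining five are the integer types ui32, ui64, i32, and i64, plus
Boolean values (the results of comparison operations).
A Boolean value is written as a single character, either a 0 (false) or a 1
(true).
Integers are written in hexadecimal.
For a 32-bit signed integer, FFFFFFFF is −1, and 7FFFFFFF
is the largest positive value.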
Floating-point values are written decomposed into their sign, encoded exponent,
and encoded significand.
First is the sign character (+ or -), followed by the encoded exponent in
hexadecimal, then a period (.), and lastly the encoded significand in
hexadecimal.
For 16-bit half-precision, notable values include:
    +00.000   +0
    +0F.000   1
    +10.000   2
    +1E.3FF   maximum finite value
    +1F.000   +infinity
    -00.000   −0
    -0F.000   −1
    -10.000   −2
    -1E.3FF   minimum finite value (largest magnitude, but negative)
    -1F.000   −infinity
Certain categories are easily distinguished (assuming the xs are
not all 0):
    +00.xxx   positive subnormal numbers
    +1F.xxx   positive NaNs
    -00.xxx   negative subnormal numbers
    -1F.xxx   negative NaNs
Likewise for other formats:
    32-bit single  64-bit double       128-bit quadruple
    +00.000000     +000.0000000000000  +0000.0000000000000000000000000000  +0
    +7F.000000     +3FF.0000000000000  +3FFF.0000000000000000000000000000  1
    +80.000000     +400.0000000000000  +4000.0000000000000000000000000000  2
    +FE.7FFFFF     +7FE.FFFFFFFFFFFFF  +7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF  maximum finite value
    +FF.000000     +7FF.0000000000000  +7FFF.0000000000000000000000000000  +infinity
    -00.000000     -000.0000000000000  -0000.0000000000000000000000000000  −0
    -7F.000000     -3FF.0000000000000  -3FFF.0000000000000000000000000000  −1
    -80.000000     -400.0000000000000  -4000.0000000000000000000000000000  −2
    -FE.7FFFFF     -7FE.FFFFFFFFFFFFF  -7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF  minimum finite value
    -FF.000000     -7FF.0000000000000  -7FFF.0000000000000000000000000000  −infinity
    +00.xxxxxx     +000.xxxxxxxxxxxxx  +0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx  positive subnormals
    +FF.xxxxxx     +7FF.xxxxxxxxxxxxx  +7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx  positive NaNs
    -00.xxxxxx     -000.xxxxxxxxxxxxx  -0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx  negative subnormals
    -FF.xxxxxx     -7FF.xxxxxxxxxxxxx  -7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx  negative NaNs
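To check a reported value by hand, the sign, exponent, and significand notation can be converted back to an ordinary number. The sketch below does this for f16, f32, and f64 using Python's struct module. The function name decode and the FORMATS table are hypothetical, and extF80 and f128 are left out because struct has no matching format codes.

```python
import struct

# Exponent width, fraction width, and struct format code for each format.
FORMATS = {
    "f16": (5, 10, ">e"),
    "f32": (8, 23, ">f"),
    "f64": (11, 52, ">d"),
}

def decode(text, fmt):
    """Convert text such as '+7F.000000' to the float it encodes."""
    exp_bits, frac_bits, code = FORMATS[fmt]
    sign = 1 if text[0] == "-" else 0
    exp_hex, frac_hex = text[1:].split(".")
    # Reassemble the raw bit pattern: sign, then exponent, then fraction.
    bits = (
        (sign << (exp_bits + frac_bits))
        | (int(exp_hex, 16) << frac_bits)
        | int(frac_hex, 16)
    )
    raw = bits.to_bytes((1 + exp_bits + frac_bits) // 8, "big")
    return struct.unpack(code, raw)[0]
```

For example, decode("+0F.000", "f16") and decode("+7F.000000", "f32") both give 1.0, matching the tables above.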
For extF80, the leading significand bit is stored explicitly rather than
hidden, so the same values appear as follows (note the leading 8 digit in
the significands):
    +0000.0000000000000000   +0
    +3FFF.8000000000000000   1
    +4000.8000000000000000   2
    +7FFE.FFFFFFFFFFFFFFFF   maximum finite value
    +7FFF.8000000000000000   +infinity
    -0000.0000000000000000   −0
    -3FFF.8000000000000000   −1
    -4000.8000000000000000   −2
    -7FFE.FFFFFFFFFFFFFFFF   minimum finite value
    -7FFF.8000000000000000   −infinity
Lastly, exception flag values are represented by five characters, one character
per flag.
Each flag is written as either a letter or a period (.) according
to whether the flag was set or not by the operation.
A period indicates the flag was not set.
The letter used to indicate a set flag depends on the flag:
    v   invalid exception
    i   infinite exception (“divide by zero”)
    o   overflow exception
    u   underflow exception
    x   inexact exception
For example, the notation ...ux indicates that the
underflow and inexact exception flags were set and that the other
three flags (invalid, infinite, and overflow) were not
set.
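The five-character flag notation is simple to decode mechanically. The following sketch assumes the flags always appear in the order v, i, o, u, x, as the table and the ...ux example suggest. The function name decode_flags is hypothetical.

```python
# Flag letters in their assumed positional order, with their meanings.
FLAG_LETTERS = "vioux"
FLAG_NAMES = {
    "v": "invalid",
    "i": "infinite",
    "o": "overflow",
    "u": "underflow",
    "x": "inexact",
}

def decode_flags(text):
    """Return the names of the flags that are set in text such as '...ux'."""
    if len(text) != len(FLAG_LETTERS):
        raise ValueError("expected five flag characters")
    names = []
    for char, letter in zip(text, FLAG_LETTERS):
        if char == letter:
            names.append(FLAG_NAMES[letter])
        elif char != ".":
            raise ValueError(f"unexpected flag character: {char}")
    return names
```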
The exception flags are always written following the value returned as the
result of the operation.
The IEEE Floating-Point Standard admits some variation among conforming implementations. Because TestFloat expects the two implementations being compared to deliver bit-for-bit identical results under most circumstances, this leeway in the standard can result in false errors being reported if the two implementations do not make the same choices everywhere the standard provides an option.
The standard specifies that the underflow exception flag is to be raised
when two conditions are met simultaneously: tininess and loss of accuracy.
A result is tiny when its magnitude is nonzero yet smaller than any normalized
floating-point number.
The standard allows tininess to be determined either before or after a result
is rounded to the destination precision.
If tininess is detected before rounding, some borderline cases will be flagged
as underflows even though the result after rounding actually lies within the
normal floating-point range.
By detecting tininess after rounding, a system can avoid some unnecessary
signaling of underflow.
All the TestFloat programs support options -tininessbefore and
-tininessafter to control whether TestFloat expects tininess on
underflow to be detected before or after rounding.
One or the other is selected as the default when TestFloat is compiled, but
these command options allow the default to be overridden.
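A borderline case shows how the two tininess conventions can disagree. The sketch below uses exact rational arithmetic to test a value just under the smallest normal f32 magnitude, 2^-126. It is an illustration only: it assumes round-to-nearest-even, rounds with an unbounded exponent range, and uses the hypothetical names round_to_precision and tininess.

```python
from fractions import Fraction

def round_to_precision(x, precision):
    """Round nonzero x to `precision` significant bits, ties to even,
    as though the exponent range were unbounded."""
    magnitude = abs(x)
    # Find exponent with 2**exponent <= magnitude < 2**(exponent + 1).
    exponent = magnitude.numerator.bit_length() - magnitude.denominator.bit_length()
    if Fraction(2) ** exponent > magnitude:
        exponent -= 1
    ulp = Fraction(2) ** (exponent - precision + 1)
    rounded = round(magnitude / ulp) * ulp  # round() on a Fraction ties to even
    return rounded if x > 0 else -rounded

def tininess(x, precision=24, min_normal_exponent=-126):
    """Return (tiny_before_rounding, tiny_after_rounding); defaults model f32."""
    if x == 0:
        return (False, False)
    min_normal = Fraction(2) ** min_normal_exponent
    before = 0 < abs(x) < min_normal
    after = 0 < abs(round_to_precision(x, precision)) < min_normal
    return (before, after)

# A result slightly below 2**-126 that rounds up to exactly 2**-126.
borderline = Fraction(2) ** -126 * (1 - Fraction(1, 2 ** 26))
print(tininess(borderline))
```

For this value the result is tiny before rounding but not after, so an implementation that detects tininess before rounding can raise underflow where one that detects it after rounding does not.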
Loss of accuracy occurs when the subnormal format is not sufficient to represent an underflowed result accurately. The original 1985 version of the IEEE Standard allowed loss of accuracy to be detected either as an inexact result or as a denormalization loss; however, few if any systems ever chose the latter. The latest standard requires that loss of accuracy be detected as an inexact result, and TestFloat can test only for this case.
The IEEE Standard gives the floating-point formats a large number of NaN encodings and specifies that NaNs are to be returned as results under certain conditions. However, the standard allows an implementation almost complete freedom over which NaN to return in each situation.
By default, TestFloat does not check the bit patterns of NaN results.
When the result of an operation should be a NaN, any NaN is considered as good
as another.
This laxness can be overridden with the -checkNaNs option of
programs testfloat_ver and testfloat.
In order for this option to be sensible, TestFloat must have been compiled so
that its internal floating-point implementation (SoftFloat) generates the
proper NaN results for the system being tested.
Conversion of a floating-point value to an integer format will fail if the source value is a NaN or if it is too large. The IEEE Standard does not specify what value should be returned as the integer result in these cases. Moreover, according to the standard, the invalid exception can be raised or an unspecified alternative mechanism may be used to signal such cases.
TestFloat assumes that conversions to integer will raise the invalid
exception if the source value cannot be rounded to a representable integer.
In such cases, TestFloat expects the result value to be the largest-magnitude
positive or negative integer or zero, as detailed earlier in Section 6.1,
Conversion Operations.
At the time of this writing, the most up-to-date information about TestFloat
and the latest release can be found at the Web page
http://www.jhauser.us/arithmetic/TestFloat.html