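The compiler must on occasion insert conversion instructions, introducing additional execution cycles. This is the case for functions operating on char or short, whose operands generally need to be converted to an int, and for double-precision floating-point constants (defined without any type suffix) used as input to single-precision floating-point computations.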
The latter case can be avoided by using single-precision floating-point constants, defined with an f suffix, such as 3.141592653589793f, 1.0f, and 0.5f. The suffix affects accuracy as well as performance; the effects on accuracy are discussed in ../chapters/chapter7.html. This distinction is particularly important to performance on devices of compute capability 2.x.
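The effect of the suffix can be seen in the type the language assigns to an expression. The following is a minimal sketch, written as plain C++ that should also compile unchanged in a .cu file; the function names are hypothetical and chosen only for illustration. In device code the same functions would carry a `__device__` qualifier.

```cpp
#include <type_traits>

// Unsuffixed constant: x is promoted to double, the multiply is carried
// out in double precision, and the result is converted back to float.
inline float scale_with_double_constant(float x)
{
    return x * 0.5;
}

// f-suffixed constant: the whole expression stays in single precision,
// so no conversion instructions are needed.
inline float scale_with_float_constant(float x)
{
    return x * 0.5f;
}

// The language rules make the difference visible at compile time.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "an unsuffixed constant promotes the product to double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "an f-suffixed constant keeps the product in float");

// Accuracy implication: the f-suffixed constant is rounded to the nearest
// float, so it no longer equals the double-precision constant.
static_assert(static_cast<double>(3.141592653589793f) != 3.141592653589793,
              "the f suffix rounds the constant to single precision");
```

Both functions return the same value for inputs such as 3.0f, where the product is exactly representable; the difference lies in the conversions the compiler has to emit and in the rounding of constants that are not exactly representable as a float.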
For single-precision code, use of the float type and the single-precision math functions is highly recommended. When compiling for devices without native double-precision support, such as devices of compute capability 1.2 and earlier, each double-precision floating-point variable is converted to single-precision floating-point format (but retains its size of 64 bits) and double-precision arithmetic is demoted to single-precision arithmetic.
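As an illustration of keeping a computation entirely in single precision, the sketch below contrasts sqrtf() with an explicitly double-precision version of the same calculation. It is plain C++ that should also compile in a .cu file; the function names are hypothetical, and the double-precision variant is included only for contrast.

```cpp
#include <math.h>
#include <type_traits>

// Single-precision throughout: float operands, float arithmetic, and the
// single-precision math function sqrtf().
inline float length_single(float x, float y)
{
    return sqrtf(x * x + y * y);
}

// Mixed version for contrast: the operands are widened to double, sqrt()
// runs in double precision, and the result is narrowed back to float.
inline float length_mixed(float x, float y)
{
    const double xd = x;
    const double yd = y;
    return static_cast<float>(sqrt(xd * xd + yd * yd));
}

// sqrtf() returns float, so no widening is introduced by the call itself.
static_assert(std::is_same<decltype(sqrtf(1.0f)), float>::value,
              "sqrtf returns float");
```

For single-precision data, the first form is the one the recommendation above points to; the second performs two widening conversions and one narrowing conversion around a double-precision square root.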
It should also be noted that the CUDA math library’s complementary error function, erfcf(), is particularly fast with full single-precision accuracy.
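One place where erfcf() could be used is the upper-tail probability of a standard normal distribution, Q(x) = 0.5 · erfc(x / √2). The sketch below is only an assumed example of such a use, not something taken from the guide; the function name is hypothetical, and the code is plain C++ that should also compile in a .cu file.

```cpp
#include <math.h>

// Upper-tail probability of the standard normal distribution,
// Q(x) = 0.5 * erfc(x / sqrt(2)), evaluated entirely in single precision.
inline float normal_upper_tail(float x)
{
    const float inv_sqrt2 = 0.70710678f;  // 1 / sqrt(2), rounded to float
    return 0.5f * erfcf(x * inv_sqrt2);
}
```

All constants carry the f suffix so that the argument passed to erfcf() and the final multiply stay in single precision.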