MATLAB results may vary depending on CPU

Posted on September 7, 2015

One project that I worked on at Metron involved prototyping a sonar processing system using MATLAB. I ran into a curious issue when sharing my code with a colleague: while my program ran successfully on my computer, it failed on my colleague’s computer due to an attempt to compute the Cholesky decomposition of a matrix that was not positive-definite.

Why was this matrix positive-definite on my computer but not my colleague’s? The offending matrix Sigma was defined as

Sigma = eye(N) + v * v'

, where v was a complex-valued column vector of length N.

Curiously, entries along the diagonal of Sigma were not exactly real-valued for my colleague, instead having imaginary components orders of magnitude smaller than the real components (being almost real-valued was not sufficient to satisfy MATLAB’s chol function). This meant that multiplying a complex number by its conjugate did not necessarily result in a real value (while on my computer, the result was necessarily real).

Given that a complex floating-point number $x$ is represented as $x = a + b i$ , where $a$ and $b$ are each floating-point values, then $x$ times its conjugate is $x x^* = (a^2 + b^2) + (ab - ba) i.$ Here we see the crux of the issue: for my colleague, $ab - ba \ne 0$ .

Those familiar with the IEEE floating point standard know that it (implicitly) specifies that multiplication should be commutative. It turns out that MATLAB does not strictly follow IEEE, instead choosing to enable IEEE-breaking optimizations for itself and its dependencies (e.g., BLAS and LAPACK). Included in these operations is the use of the fused multiply add (FMA) operation, which is a ternary operation which computes

FMA(x, y, z) = x * y + z

in one fell swoop, without any intermediate rounding. There are two potential advantages to using the FMA operation:

more precise computations are possible since there is no intermediate rounding
the operation can be implemented on a CPU to take the same amount of time as either an individual add or multiply operation, making it twice as fast as computing the result using a multiply and an add

For the imaginary component in our case, the computation was thus

FMA(a, b, -(b * a))

, which is not necessarily equal to zero, because the computation of b * a in the third argument to the FMA operation is rounded before being subtracted from a * b, which is not rounded. The result is very small but potentially nonzero.

My colleague had an Intel Haswell processor, which supports the FMA operation, causing him to run into this issue. My processor was older and did not support the FMA operation, so instead implemented the computation in strict IEEE fashion, meaning that I was guaranteed to have a result of zero. For this reason, MATLAB results may vary depending on the CPU.

To remedy the issue, we added extra operations to set the imaginary components of the diagonals of these troublesome matrices to zero. We empirically found that this seemed to work. Technically, we probably should have instead computed

Sigma = eye(N) + v * v' / 2 + conj(conj(v) * conj(v')) / 2

to ensure that the entire matrix would be identical on both CPUs.

Ironically, in situations such as this, the addition of a floating point operation designed to increase the speed and precision of computation improves neither, forcing you to sacrifice one in order to merely preserve the other.