Quadruple joins the fight!

This float-128 implementation beats the others at the same precision


A few days ago I wrote about benchmarking arbitrary-precision floating-point libraries in Java. I found that BigDecimal is not as slow as it is said to be, and that it beats Apfloat by a wide margin in most operations at the same precision level. However, for Gaia Sky I don’t need hundreds of significant digits. It turns out that 27 significant digits are enough to represent the whole observable universe with a precision of 1 meter.

The observable universe has a radius \(R\) of about \(4.4 \times 10^{26}\) meters. To cover that entire range at a resolution of 1 meter, we need the following number of significant digits \(d\):

$$ \begin{align} d &= \log_{10} \left(\frac{R}{\text{precision}}\right) \\ \\ &= \log_{10} \left(\frac{4.4 \times 10^{26}}{1}\right) \\ \\ &= \log_{10}(4.4 \times 10^{26}) \\ \\ &= \log_{10}(4.4) + \log_{10}(10^{26}) \\ \\ &\approx 0.643 + 26 \\ \\ &= 26.643 \end{align} $$

Rounding up, that is 27 significant digits.

In terms of bits, IEEE 754 double precision (64-bit) provides around 15–17 decimal digits of precision, which is enough for the Solar System but insufficient for the whole universe. In contrast, IEEE 754 quadruple precision (128-bit) has a 113-bit significand, which gives \(\log_{10}(2^{113}) \approx 34\) decimal digits. With 34 digits, the smallest length we can resolve at the scale of the observable universe is \(\approx \frac{4.4 \times 10^{26}}{10^{34}} = 4.4 \times 10^{-8}\) meters. This is 44 nanometers! As said, this is more than sufficient for our purposes.
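The arithmetic above is easy to reproduce in plain Java. The following is a minimal sketch; the class and method names are made up for illustration and are not part of Gaia Sky or of any library mentioned here.

```java
class PrecisionDigits {
    /** Decimal digits needed to cover a range at a given resolution: ceil(log10(range / precision)). */
    static int digitsNeeded(double range, double precision) {
        return (int) Math.ceil(Math.log10(range / precision));
    }

    /** Decimal digits carried by a binary significand of the given width. */
    static double decimalDigits(int significandBits) {
        return significandBits * Math.log10(2.0);
    }

    public static void main(String[] args) {
        double radius = 4.4e26; // radius of the observable universe in meters (value from the text)

        System.out.println("Digits needed for 1 m resolution: " + digitsNeeded(radius, 1.0));
        System.out.printf("double (53-bit significand): %.2f digits%n", decimalDigits(53));
        System.out.printf("quad (113-bit significand):  %.2f digits%n", decimalDigits(113));

        // Spacing between adjacent double values near the radius, in meters.
        System.out.printf("double spacing at R: %.3e m%n", Math.ulp(radius));
    }
}
```

The last line makes the limitation of double precision concrete: near \(4.4 \times 10^{26}\) meters, adjacent double values are \(2^{36}\) meters apart, which is about \(6.9 \times 10^{10}\) meters.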

Enter Quadruple

Browsing through GitHub I found the Quadruple library, which provides an implementation of 128-bit floating point numbers in Java. The implementation is very compact, and includes addition, subtraction, multiplication, division, and square root. I decided to put it to the test using my JMH benchmark.

I created a new benchmark called “ThreeWay”, which tests these operations (plus allocation) for Apfloat, BigDecimal, and Quadruple. For the two arbitrary-precision libraries I use 32 significant digits of precision instead of 34. I run one warm-up iteration of 1 second and 5 measurement iterations of 5 seconds each (see source).
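For reference, this is roughly how a precision of 32 significant digits is pinned down for BigDecimal. It is a sketch using only the standard library, not the benchmark source; the class and helper names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class FixedPrecisionBigDecimal {
    // 32 significant digits, the precision quoted in the text (default rounding mode: HALF_UP).
    static final MathContext MC = new MathContext(32);

    /** Division rounded to 32 significant digits. */
    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        return a.divide(b, MC);
    }

    /** Multiplication rounded to 32 significant digits. */
    static BigDecimal multiply(BigDecimal a, BigDecimal b) {
        return a.multiply(b, MC);
    }

    public static void main(String[] args) {
        BigDecimal third = divide(BigDecimal.ONE, new BigDecimal("3"));
        // Prints the quotient followed by its number of significant digits.
        System.out.println(third + " (" + third.precision() + " digits)");
    }
}
```

Passing the MathContext to every operation keeps the result from growing past the chosen precision, which is what makes the comparison with a fixed-width type meaningful.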

Results

Below are the specs of the system I used to run the tests, and the specific software versions used. This time around I ran the benchmarks on my laptop while it was plugged in. Only the CPU and the memory should play a significant role.

# JMH version: 1.37
# VM version: JDK 21.0.7, OpenJDK 64-Bit Server VM, 21.0.7+6

CPU: Intel(R) Core(TM) i7-8550U (8) @ 4.00 GHz
GPU 1: NVIDIA GeForce GTX 1070 [Discrete]
GPU: Intel UHD Graphics 620 @ 1.15 GHz [Integrated]
Memory: 16.00 GiB
Swap: 8.00 GiB

And here are the benchmark results.

Addition

[Figure: Three-way Addition results]

Of course, Quadruple is compact and only needs to care about 128 bits, while Apfloat and BigDecimal are generic to any precision, so we can expect Quadruple to be faster. And it is.

Subtraction

[Figure: Three-way Subtraction results]

Same with subtraction.

Multiplication

[Figure: Three-way Multiplication results]

And multiplication.

Division

[Figure: Three-way Division results]

Division is also faster with the newcomer.

Allocation (from string)

Finally, the allocation test. First, we test allocation from a string representation of a floating point number.

[Figure: Three-way Allocation results (from string)]

Surprising: this time Quadruple is much slower than the others. Let’s analyze this. We use JOL (Java Object Layout) to find out the instance size of each object.

  • Quadruple has an instance size of 40 bytes (2 longs, 1 int, 1 boolean, plus header).
  • BigDecimal also has an instance size of 40 bytes (2 ints, 1 long, 2 references to BigInteger and String, plus header).
  • Apfloat has an instance size of 24 bytes (3 references plus the object header).
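These numbers can be cross-checked with simple arithmetic. The sketch below does not use JOL. It assumes a typical 64-bit HotSpot configuration (12-byte object header, 4-byte compressed references, 8-byte object alignment) and ignores any padding between fields. The class and method names are made up for illustration.

```java
class InstanceSizeEstimate {
    static final int HEADER_BYTES = 12;   // assumption: 64-bit HotSpot with compressed class pointers
    static final int REFERENCE_BYTES = 4; // assumption: compressed object references
    static final int ALIGNMENT = 8;       // assumption: default object alignment

    /** Header plus field bytes, rounded up to the next multiple of the alignment. */
    static int instanceSize(int fieldBytes) {
        int raw = HEADER_BYTES + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        // Field lists taken from the bullet points above; a boolean occupies 1 byte.
        int quadruple = instanceSize(2 * Long.BYTES + Integer.BYTES + 1);
        int bigDecimal = instanceSize(2 * Integer.BYTES + Long.BYTES + 2 * REFERENCE_BYTES);
        int apfloat = instanceSize(3 * REFERENCE_BYTES);

        System.out.println("Quadruple:  " + quadruple + " bytes");
        System.out.println("BigDecimal: " + bigDecimal + " bytes");
        System.out.println("Apfloat:    " + apfloat + " bytes");
    }
}
```

Under those assumptions the estimate gives 40, 40, and 24 bytes, matching the sizes listed above. Note that this counts only the top-level object, not the objects it references.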

It is unlikely that the issue is the instance size. It most likely comes down to the code that converts the string into the internal representation of each type. This code seems to be much slower for Quadruple than it is for the others. Let’s see how it fares allocating from a double.

Allocation (from double)

[Figure: Three-way Allocation results (from double)]

The story is reversed. Quadruple is much faster than the others when allocating an object from a double. I never allocate from strings, so the slow string path is not much of a problem for me.
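The two allocation paths do quite different work: one parses decimal text, the other converts the binary value of a double. The sketch below shows the difference using BigDecimal from the standard library as a stand-in. It does not show Quadruple's constructors, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;

class AllocationPaths {
    /** Parses the decimal text; the result holds exactly the digits that were written. */
    static BigDecimal fromString(String text) {
        return new BigDecimal(text);
    }

    /** Converts the exact binary value stored in the double; no text parsing is involved. */
    static BigDecimal fromDouble(double value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        BigDecimal parsed = fromString("0.1");
        BigDecimal converted = fromDouble(0.1);

        System.out.println("from string: " + parsed);
        System.out.println("from double: " + converted);
        // Positive result: the double closest to 0.1 is slightly larger than 0.1.
        System.out.println("compare:     " + converted.compareTo(parsed));
    }
}
```

One thing to keep in mind with the double path is that the new object starts out with only the accuracy of the double it was built from, whatever the precision of the target type.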

Analysis

There’s not much to say. Quadruple is significantly faster than the others in all arithmetic operations and in allocation from a double. The only exception is allocation from a string. This is, of course, to be expected if we consider that Quadruple only deals with float-128 types and does not have to care about higher precisions. It may be enough for your purposes, like it is for mine. If this is the case, it may make sense to use it.

Caveats

There are a few important caveats to consider if you want to use Quadruple as it is now:

  • Only the basic operations are implemented (add, sub, div, mul, sqrt). If you need anything else, you are on your own.
  • Quadruple instances are mutable. This is a bad design decision in my opinion, and it would prevent the class from benefiting from upcoming Java features such as value types (Project Valhalla).
  • Instantiation from String is very slow.
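To illustrate why mutability is a concern, here is a minimal sketch of the aliasing problem it invites. MutableNumber is a hypothetical class written for this example, assuming in-place arithmetic that returns the same instance. It is not Quadruple's API. BigDecimal is shown next to it as the immutable counterpart.

```java
import java.math.BigDecimal;

class MutabilityHazard {
    /** Hypothetical mutable number, for illustration only. */
    static final class MutableNumber {
        double value;

        MutableNumber(double value) {
            this.value = value;
        }

        /** Adds in place and returns this same instance. */
        MutableNumber add(MutableNumber other) {
            value += other.value;
            return this;
        }
    }

    public static void main(String[] args) {
        MutableNumber origin = new MutableNumber(1.0);
        MutableNumber shifted = origin.add(new MutableNumber(2.0));
        // 'shifted' and 'origin' are the same object, so 'origin' has changed as well.
        System.out.println("same object: " + (shifted == origin) + ", origin = " + origin.value);

        BigDecimal a = BigDecimal.ONE;
        BigDecimal b = a.add(BigDecimal.valueOf(2));
        // The immutable type returns a new object and leaves 'a' untouched.
        System.out.println("a = " + a + ", b = " + b);
    }
}
```

With a mutable type, any value that is shared between objects has to be copied defensively before it is used in arithmetic, which is easy to forget.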

Content licensed under CC-BY-NC-SA 4.0.