Add instructions for approximate floating-point reciprocals

# Reciprocal approximations

The following instructions are of special interest, and could make it into the B format encoding space:

* FRECPE - Floating-point reciprocal estimate (estimate 1.0/x)
* FRSQRTE - Floating-point reciprocal square root estimate (estimate 1.0/sqrt(x))

The main question is how accurate the approximation should be. A common choice in other ISAs appears to be ~8 bits (the CRAY-1 produced 30 bits, though). We could use different levels of accuracy for different floating-point formats.

Here are a few different possibilities:

| Approximation | NR steps for float32 | NR steps for float16 | NR steps for float8 |
| --- | --- | --- | --- |
| 8 | 2 | 1 | 0 |
| 11 | 2 | 0 | 0 |
| 12-13 | 1 | 0 | 0 |

For the instructions to be valuable, they should offer a reasonable improvement over the classic hack:

```c++
float approximate_rsqrt(float x) {
    auto i = std::bit_cast<uint32_t>(x);
    i = 0x5f3759df - (i >> 1);
    return std::bit_cast<float>(i);
}
```
...which is very cheap on MRISC32 (only 4 integer instructions without any latencies):

```
        ldi     r2, #0x5f3759df  ; ldi + or
        lsr     r1, r1, #1
        sub     r1, r2, r1
```

# Reciprocal improvements

The approximation can be improved using Newton-Raphson iterations. These iterations can be implemented using regular floating-point arithmetic operations, but it's also possible to provide dedicated instructions that does the majority of the work of an iteration in a single instruction.

The main advantages of such instructions are:

* Reduce the latency of consecutive, dependent NR operations.
* Support 1/0 = Inf (during NR `Inf*0` must result in `Inf` rather than `NaN`, which would have been the case without a dedicated instruction).
* Possibly improved accuracy as the instruction performs a fused multiply-add.

These instructions would have to go into the format A encoding space.

## Reciprocal

One NR step for improving `y = 1 / x` is: `y = y * (2 - x * y)`.

A suitable instruction (based on FMA) should implement `2 - a * b`.

## Reciprocal square root

One NR step for improving `y = 1 / sqrt(x)` is: `y = y * (1.5 - (0.5 * x * y * y))`.

A suitable instruction (based on FMA) should implement `(3 - a * b) / 2`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add instructions for approximate floating-point reciprocals #140

Reciprocal approximations

Reciprocal improvements

Reciprocal

Reciprocal square root

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development