Subnormal blocks are not encoded correctly #119

Closed
lindstro opened this issue Jan 29, 2021 · 1 comment

Blocks whose largest nonzero value is subnormal can cause floating-point overflow during conversion to zfp's block-floating-point format. The overflow corrupts the mantissas, but the bug is otherwise benign: such blocks are still reconstructed as a collection of "random" subnormals, albeit with complete loss of precision.

Two potential solutions have been identified:

  • Perform the float-to-int normalization in two steps via two separate multipliers. This will, however, incur a performance penalty for all blocks.
  • Cap the smallest supported block exponent, which effectively flushes subnormals to zero (a strategy already employed by many processors). Given that the tolerances used with zfp often vastly exceed FLT_MIN, this should have a negligible effect in practice. Although this approach changes the compressed representation of all-subnormal blocks, current versions of zfp would still decompress such blocks correctly.
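
To illustrate the first option, here is a minimal sketch of why splitting one power-of-two multiplier into two can avoid overflow. It is not zfp's actual code: the function names and the exponent used in the usage note are hypothetical, chosen only so that the single multiplier exceeds the float range.

```c
#include <math.h>

/* Hypothetical helper: scale x by 2^k using a single multiplier.
   The multiplier itself becomes infinite when 2^k exceeds the
   largest finite float, so the product is no longer meaningful. */
static float scale_one_step(float x, int k)
{
  float s = ldexpf(1.0f, k);
  return s * x;
}

/* Hypothetical helper: scale x by 2^k using two multipliers whose
   exponents sum to k. Each factor stays finite even when 2^k does not,
   at the cost of one extra multiplication per value. */
static float scale_two_step(float x, int k)
{
  int k1 = k / 2;
  int k2 = k - k1;
  float s1 = ldexpf(1.0f, k1);
  float s2 = ldexpf(1.0f, k2);
  return (x * s1) * s2;
}
```

For example, with the smallest positive subnormal float, x = 2^-149, and an illustrative k = 170, `scale_one_step` returns infinity, whereas `scale_two_step` returns 2^21 exactly. The extra multiplication is the per-block performance penalty mentioned above.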

lindstro added the bug label Jan 29, 2021

lindstro commented Aug 3, 2021

Overflow is prevented when the magnitude, x, of the largest value in a block is at least 2^-127 = FLT_MIN / 2 for floats and at least 2^-1023 = DBL_MIN / 2 for doubles, i.e., when x is at least half of the smallest positive normal number. Rather than capping the exponent and encoding zero-valued coefficients, it is more efficient to simply treat blocks that fall below this threshold as all-zero blocks, which cost only one bit to encode. Such behavior is analogous to Intel's DAZ (denormals-are-zero) floating-point flag, which treats all subnormal inputs as zero. While we could in theory support values as small as {FLT,DBL}_MIN / 2 without causing overflow, for compatibility with DAZ it seems preferable to require that the largest value in a block have magnitude at least {FLT,DBL}_MIN. Note that some subnormals could still be reconstructed when the largest value in a block is normal.

This proposed behavior is invoked when the compile-time option ZFP_WITH_DAZ is enabled.
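
The proposed rule can be sketched as follows for single precision. This is a simplified illustration rather than zfp's implementation: the helper names are hypothetical, and only the threshold (largest magnitude at least FLT_MIN) is taken from the description above.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: largest magnitude among the n values of a block. */
static float block_max_magnitude(const float* block, size_t n)
{
  float max = 0.0f;
  size_t i;
  for (i = 0; i < n; i++) {
    float a = fabsf(block[i]);
    if (a > max)
      max = a;
  }
  return max;
}

/* Hypothetical helper: nonzero if the block would be treated as all
   zeros, i.e., if its largest magnitude is below the smallest positive
   normal float. Zero and all-subnormal blocks both qualify. */
static int block_is_treated_as_zero(const float* block, size_t n)
{
  return block_max_magnitude(block, n) < FLT_MIN;
}
```

A block such as { FLT_MIN / 4, 0, -FLT_MIN / 2, 0 } would be treated as all zeros under this rule, while { 1, FLT_MIN / 4, 0, 0 } would not, so its subnormal entry could still be reconstructed as a nonzero value. A double-precision variant would compare against DBL_MIN instead.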
