Mathematical utilities and types.
constexpr float EPS = 1e-6
The float (de)quantization code is taken from meshoptimizer (https://github.com/zeux/meshoptimizer).
All rights reserved by the original author; see the license in meshoptimizer.h.
◆ float2
◆ float3
◆ float4
◆ float4x4
◆ dequantizeFP16() [1/2]
float Foundation::Math::dequantizeFP16(uint16_t h)
Reverses the quantization of a half-precision (IEEE-754 fp16) floating-point value. Preserves Inf/NaN and flushes denormals to zero.
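The behavior described above can be illustrated with a minimal sketch. It is not the library's code: `dequantizeHalfSketch` is a hypothetical name, and the bit manipulation assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for illustration; not Foundation::Math::dequantizeFP16 itself.
inline float dequantizeHalfSketch(std::uint16_t h)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit IEEE-754 float");

    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t em = h & 0x7fffu; // exponent and mantissa fields

    // Rebias the exponent from 15 to 127 (difference 112) and widen the
    // mantissa from 10 to 23 bits by shifting left by 13.
    std::uint32_t bits = (em + (112u << 10)) << 13;

    // Exponent field 0 means zero or a denormal: flush to zero.
    if (em < (1u << 10))
        bits = 0u;

    // Exponent field 31 means Inf/NaN: apply the bias fixup a second time so
    // that 31 becomes 255. The mantissa (NaN payload) is kept as is.
    if (em >= (31u << 10))
        bits += (112u << 23);

    bits |= sign;

    float result;
    std::memcpy(&result, &bits, sizeof result); // well-defined type pun
    return result;
}
```

For example, `0x3C00` decodes to `1.0f`, `0xC000` to `-2.0f`, and the smallest denormal `0x0001` to zero.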
◆ dequantizeFP16() [2/2]
float Foundation::Math::dequantizeFP16(unsigned short h)
◆ dequantizeSnorm()
inline float Foundation::Math::dequantizeSnorm(int32_t q, int32_t Nbits)
◆ dequantizeSnormShifted()
inline float Foundation::Math::dequantizeSnormShifted(uint32_t q, int32_t Nbits)
◆ dequantizeUnorm()
inline float Foundation::Math::dequantizeUnorm(int32_t q, int32_t Nbits)
◆ packUnitOctahedral()
vec2 Foundation::Math::packUnitOctahedral(vec3 v)
Packs a unit vector into two components using octahedral encoding.
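A minimal sketch of the common octahedral mapping follows, to show what such packing typically looks like. Everything here is an assumption: `Vec2`, `Vec3`, `packOctSketch` and `unpackOctSketch` are hypothetical names, and the output is assumed to lie in [-1, 1] per component. The library's own convention (for example a [0, 1] output range) may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical vector types for this sketch; the library uses its own vec2/vec3.
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Sign function that treats zero as positive, so folds never collapse to 0.
inline float signNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Project the unit vector onto the octahedron |x| + |y| + |z| = 1, then fold
// the lower hemisphere (z < 0) over the diagonals of the [-1, 1]^2 square.
inline Vec2 packOctSketch(Vec3 n)
{
    const float l1 = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    assert(l1 > 0.0f); // the zero vector has no direction to encode

    float px = n.x / l1;
    float py = n.y / l1;
    if (n.z < 0.0f) {
        const float fx = (1.0f - std::fabs(py)) * signNotZero(px);
        const float fy = (1.0f - std::fabs(px)) * signNotZero(py);
        px = fx;
        py = fy;
    }
    return Vec2{px, py};
}

// Inverse mapping: rebuild z from the L1 constraint, undo the fold, renormalize.
inline Vec3 unpackOctSketch(Vec2 e)
{
    Vec3 v{e.x, e.y, 1.0f - std::fabs(e.x) - std::fabs(e.y)};
    if (v.z < 0.0f) {
        const float fx = (1.0f - std::fabs(v.y)) * signNotZero(v.x);
        const float fy = (1.0f - std::fabs(v.x)) * signNotZero(v.y);
        v.x = fx;
        v.y = fy;
    }
    const float invLen = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x * invLen, v.y * invLen, v.z * invLen};
}
```

In this sketch the two packed components could then be quantized separately, which is why the mapping is usually paired with unorm or snorm quantization.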
◆ quantizeFP16()
uint16_t Foundation::Math::quantizeFP16(float v)
Quantizes a float into a half-precision (IEEE-754 fp16) floating-point value. Generates ±inf on overflow, preserves NaN, flushes denormals to zero, and rounds to nearest.
Representable magnitude range: [6e-5; 65504].
Maximum relative reconstruction error: 5e-4.
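The rules listed above (overflow to infinity, NaN preserved, small values flushed to zero, round to nearest) can be sketched as follows. `quantizeHalfSketch` is a hypothetical name and the code is an illustration under the assumption of a 32-bit IEEE-754 `float`, not the library's implementation. Ties are rounded away from zero in this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for illustration; not Foundation::Math::quantizeFP16 itself.
inline std::uint16_t quantizeHalfSketch(float v)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit IEEE-754 float");

    std::uint32_t ui;
    std::memcpy(&ui, &v, sizeof ui);

    const std::uint32_t sign = (ui >> 16) & 0x8000u; // sign bit moved to bit 15
    const std::uint32_t em = ui & 0x7fffffffu;       // exponent and mantissa

    std::uint32_t h;
    if (em > (255u << 23)) {
        h = 0x7e00u; // NaN: emit a quiet NaN
    } else if (em >= (143u << 23)) {
        h = 0x7c00u; // magnitude >= 65536 (or infinity): overflow to infinity
    } else if (em < (113u << 23)) {
        h = 0u;      // magnitude < 2^-14, about 6.1e-5: flush to zero
    } else {
        // Rebias the exponent from 127 to 15 (difference 112), add half of the
        // 13 dropped mantissa bits to round to nearest, then drop them.
        h = (em - (112u << 23) + (1u << 12)) >> 13;
    }
    return static_cast<std::uint16_t>(sign | h);
}
```

With this sketch, `1.0f` encodes to `0x3C00`, the largest finite half `65504.0f` to `0x7BFF`, and `1e5f` to `0x7C00` (positive infinity).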
◆ quantizeFP32() [1/2]
float Foundation::Math::quantizeFP32(float v, int N)
◆ quantizeFP32() [2/2]
float Foundation::Math::quantizeFP32(float v, int32_t N)
Quantizes a float into a floating-point value with a limited number of significant mantissa bits, preserving the IEEE-754 fp32 binary representation. Generates ±inf on overflow, preserves NaN, flushes denormals to zero, and rounds to nearest.
Assumes N is in the valid mantissa precision range, which is 1..23.
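As an illustration of mantissa truncation with rounding, here is a minimal sketch. `quantizeFloatSketch` is a hypothetical name, a 32-bit IEEE-754 `float` is assumed, and flushed denormals become positive zero in this sketch. It is not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for illustration; not Foundation::Math::quantizeFP32 itself.
inline float quantizeFloatSketch(float v, int N)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit IEEE-754 float");
    assert(N >= 1 && N <= 23); // valid mantissa precision range

    std::uint32_t ui;
    std::memcpy(&ui, &v, sizeof ui);

    const std::uint32_t mask = (1u << (23 - N)) - 1u;  // mantissa bits to drop
    const std::uint32_t round = (1u << (23 - N)) >> 1; // half of the dropped range
    const std::uint32_t exponent = ui & 0x7f800000u;

    if (exponent == 0u) {
        ui = 0u; // zero or denormal: flush to (positive) zero
    } else if (exponent != 0x7f800000u) {
        // Finite normal value: round to nearest, then clear the low bits.
        // A carry out of the mantissa bumps the exponent, so the largest
        // finite values overflow to infinity.
        ui = (ui + round) & ~mask;
    }
    // Inf/NaN are left untouched so that rounding cannot corrupt them.

    float result;
    std::memcpy(&result, &ui, sizeof result);
    return result;
}
```

The result is still an ordinary `float`; only its low `23 - N` mantissa bits are zero, which tends to help general-purpose compressors applied afterwards.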
◆ quantizeSnorm()
inline int32_t Foundation::Math::quantizeSnorm(float v, int32_t N)
Maps the [-1, 1] range to the integer range [-((1 << (Nbits - 1)) - 1), 1 << (Nbits - 1)], e.g. Nbits = 10 -> [-511, 512].
For transport you may want to add 1 << (Nbits - 1) to the quantized value to shift it into the [0, 1 << Nbits) range. Otherwise you would be packing two's-complement int32 bits, and truncating them would result in a loss of precision. To do this, use quantizeSnormShifted and dequantizeSnormShifted.
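The shift described above can be sketched as follows. All names are hypothetical, and the sketch assumes the symmetric meshoptimizer-style scale of (1 << (Nbits - 1)) - 1, so for 10 bits it produces [-511, 511]. The library's actual bounds and rounding may differ from this assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not the Foundation::Math functions.
// Assumes a symmetric scale of (1 << (nbits - 1)) - 1. NaN input is not handled.
inline std::int32_t quantizeSnormSketch(float v, std::int32_t nbits)
{
    assert(nbits >= 2 && nbits <= 16);
    const float scale = static_cast<float>((1 << (nbits - 1)) - 1);
    v = v < -1.0f ? -1.0f : (v > 1.0f ? 1.0f : v); // clamp to [-1, 1]
    // Adding +-0.5 before truncation rounds to nearest.
    return static_cast<std::int32_t>(v * scale + (v >= 0.0f ? 0.5f : -0.5f));
}

// Shift the signed value into an unsigned range so it fits an nbits-wide
// field without relying on two's-complement sign bits.
inline std::uint32_t shiftSnormSketch(std::int32_t q, std::int32_t nbits)
{
    return static_cast<std::uint32_t>(q + (1 << (nbits - 1)));
}

// Undo the shift after unpacking.
inline std::int32_t unshiftSnormSketch(std::uint32_t u, std::int32_t nbits)
{
    return static_cast<std::int32_t>(u) - (1 << (nbits - 1));
}

// Map the signed integer back to [-1, 1] using the same scale.
inline float dequantizeSnormSketch(std::int32_t q, std::int32_t nbits)
{
    assert(nbits >= 2 && nbits <= 16);
    return static_cast<float>(q) / static_cast<float>((1 << (nbits - 1)) - 1);
}
```

Under these assumptions a 10-bit value of -511 is stored as 1 and 511 as 1023, so every shifted value fits in the low 10 bits of a packed word.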
◆ quantizeSnormShifted()
inline uint32_t Foundation::Math::quantizeSnormShifted(float v, int32_t Nbits)
◆ quantizeUnorm()
inline uint32_t Foundation::Math::quantizeUnorm(float v, int32_t N)
◆ unpackUnitOctahedral()
vec3 Foundation::Math::unpackUnitOctahedral(vec2 v)
◆ EPS
constexpr float Foundation::Math::EPS = 1e-6