Fundamentals
- single-precision floating point numbers have 1 sign bit, 8 exponent bits, and 23 mantissa bits
- double-precision numbers have 1, 11, and 52 bits, respectively
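- when rounding a floating point number to fit the mantissa (IEEE round to nearest), round up if the first truncated bit is 1 and down if it is 0; in the exact tie case (a 1 followed only by 0s), round so that the last kept bit is 0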
- relative error is the absolute error divided by the magnitude of the exact value (which must be nonzero)
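
The points above can be checked with a small Python sketch. The helper names (float32_fields, float64_fields, to_float32, relative_error) are made up for illustration; only the standard struct module is used, and the sketch assumes the usual IEEE 754 layout with exponent biases of 127 (single) and 1023 (double).

```python
import struct


def float32_fields(x):
    """Round x to single precision and return (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8 exponent bits (stored with a bias of 127)
    mantissa = bits & 0x7FFFFF         # 23 mantissa bits
    return sign, exponent, mantissa


def float64_fields(x):
    """Return (sign, exponent, mantissa) bit fields of a double precision number."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF    # 11 exponent bits (stored with a bias of 1023)
    mantissa = bits & ((1 << 52) - 1)  # 52 mantissa bits
    return sign, exponent, mantissa


def to_float32(x):
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def relative_error(approx, exact):
    """Absolute error divided by the magnitude of the exact (nonzero) value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    print(float32_fields(1.0))
    print(float64_fields(1.0))
    print([hex(f) for f in float32_fields(0.1)])
    # Treats the double 0.1 as the "exact" value for illustration.
    print(relative_error(to_float32(0.1), 0.1))
```

For 0.1, the binary expansion repeats the pattern 1100 forever. Single precision cuts it off at a point where the first dropped bit is 1, so the last kept bits round up from ...1100 to ...1101, which is why the mantissa field ends in the hex digit D. The resulting relative error stays below 2^-24, half the spacing between adjacent single-precision mantissas.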
Handling Loss of Significance