https://www.microsoft.com/en-us/research/publication/1-bit-s...
int4 (fixed point) is already popular for inference https://developer.nvidia.com/blog/int4-for-ai-inference/ and int3 has seen some use for LLaMA-at-home
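
For anyone who hasn't seen what int4 means in practice, here is a rough sketch of symmetric 4-bit quantization. It assumes one scale per tensor and round-to-nearest. That is my simplification, not necessarily the scheme in the NVIDIA post, and `quantize_int4` / `dequantize_int4` are made-up names.

```python
def quantize_int4(values):
    """Map floats to signed 4-bit integer codes in [-8, 7] with a shared scale."""
    max_abs = max(abs(v) for v in values)
    # Assumption: symmetric scaling so the largest magnitude maps to code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.1, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
```

Each weight is stored as one of 16 levels plus the shared scale, so the round-trip error per weight is at most about half a step (`scale / 2`). Real kernels also pack two codes per byte, which this sketch skips.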