After quantizing LLaMA2-7b, I noticed that the quantized model has around 1.1B parameters in total, while the original dense model has around 6.7B. It looks as if the code also prunes the LLM weights. Any idea why weights are being removed on top of the quantization?
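If the reported total comes from summing per-tensor element counts (for example `numel()` over the model's tensors), storage packing alone could shrink that number without removing any weights. Below is a rough arithmetic sketch for LLaMA2-7b. It assumes, hypothetically, that every decoder Linear weight is stored as 4-bit values packed eight per 32-bit integer, and that the embeddings, `lm_head`, and RMSNorm weights stay unquantized. The issue does not say which storage format the code uses, and quantization scales and zero points are ignored here.

```python
# Hypothetical sanity check: how a naive element count of LLaMA2-7b would
# change if each decoder Linear weight were stored as 4-bit values packed
# eight per 32-bit integer (an ASSUMED format, not confirmed for this code).
# Embeddings, lm_head and RMSNorm weights are assumed to stay unquantized;
# quantization scales / zero points are ignored.

HIDDEN = 4096          # LLaMA2-7b hidden size
INTERMEDIATE = 11008   # MLP intermediate size
LAYERS = 32            # number of decoder layers
VOCAB = 32000          # vocabulary size


def dense_counts():
    """Return (linear, other) parameter counts for the dense model."""
    attn = 4 * HIDDEN * HIDDEN              # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE         # gate, up, down projections
    linear = LAYERS * (attn + mlp)
    norms = LAYERS * 2 * HIDDEN + HIDDEN    # two RMSNorms per layer + final norm
    embed_and_head = 2 * VOCAB * HIDDEN     # untied input embedding and lm_head
    return linear, norms + embed_and_head


def packed_count(bits=4, word_bits=32):
    """Element count if Linear weights pack word_bits // bits values per element."""
    linear, other = dense_counts()
    values_per_word = word_bits // bits
    return linear // values_per_word + other


linear, other = dense_counts()
print(f"dense:  {(linear + other) / 1e9:.2f}B")  # dense:  6.74B
print(f"packed: {packed_count() / 1e9:.2f}B")    # packed: 1.07B
```

Under those assumptions the count drops from about 6.74B to about 1.07B, close to the observed ~1.1B, so comparing the dtypes and shapes of the quantized Linear weights against the dense ones would show whether weights were actually pruned or just packed.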
Thanks a lot!