EFFECTS OF QUANTIZATION ON LARGE LANGUAGE MODELS’ PERFORMANCE ON SENTIMENT ANALYSIS TASKS

Volume 8, Article e2026.01, 2026, Pages 1-19

Aydın Gasimov

Azerbaijan State Oil and Industry University, Baku, Azerbaijan

Abstract

This paper investigates the effects of weight-only quantization on large language models (LLMs) for sentiment classification. We evaluate three representative models (Gemma 3 4B, Microsoft Phi-4 Mini, and Liquid LFM2 1.2B) across multiple quantization levels (FP16, Q8, Q6, Q5, Q4, Q4-QAT, Q3) using the IMDB, SST-2, and Twitter Airline Sentiment datasets. Our results show that sentiment classification is unusually resilient to quantization: accuracy differences relative to Q8 are typically within a few percentage points, with significant degradation only at 3-bit precision. Notably, Gemma’s QAT-trained Q4 variant surpasses its higher-precision baselines, underscoring the promise of training-time adaptation. However, we also observe shifts in prediction behavior, including increased class polarization and diminished recognition of minority classes under lower precisions. From a systems perspective, quantization primarily yields storage and memory reductions on current GPUs lacking sub-8-bit execution, but delivers real runtime gains on processors with native INT4/INT2 or FP4 support, such as Qualcomm’s Snapdragon 8 Elite Gen 5, NVIDIA Blackwell, and AMD/Intel NPUs. These findings highlight that while compact LLMs at 4-bit precision already offer an attractive efficiency–accuracy trade-off for on-device deployment, the full benefits of aggressive quantization will only be realized as native low-bit hardware becomes pervasive.
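To make the idea of weight-only quantization at different bit widths concrete, the following is a minimal sketch of generic group-wise, symmetric round-to-nearest quantization. It is not the quantization scheme used in the paper, which the abstract does not specify. The group size of 32, the 16-bit scale per group, the synthetic Gaussian weights, and all function names are assumptions chosen for illustration.

```python
import random


def quantize_group(weights, bits):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns (integer codes, scale). Codes lie in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1]. Illustrative only.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp to guard against floating-point overshoot at the extremes.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def mean_abs_error(weights, bits, group_size=32):
    """Average reconstruction error after a quantize/dequantize round trip."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale = quantize_group(group, bits)
        restored = dequantize_group(codes, scale)
        total += sum(abs(w - r) for w, r in zip(group, restored))
    return total / len(weights)


def bits_per_weight(bits, group_size=32, scale_bits=16):
    """Effective storage cost: the code bits plus the amortized scale."""
    return bits + scale_bits / group_size


random.seed(0)
# Synthetic stand-in for one weight tensor (assumed Gaussian, std 0.02).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
errors = {b: mean_abs_error(weights, b) for b in (8, 6, 5, 4, 3)}

for b, err in errors.items():
    print(f"{b}-bit: mean abs error {err:.6f}, "
          f"~{bits_per_weight(b):.2f} bits/weight")
```

In this toy setting the reconstruction error grows as the bit width shrinks, while storage falls roughly in proportion to the bit width. Reconstruction error on weights is only a proxy; it does not by itself predict the task-level accuracy changes reported above.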

Keywords:

Sentiment Analysis, Large Language Models, Quantization, Zero-shot learning, Compact LLMs, Edge AI

DOI: doi.org/10.32010/26166127.2026.01

Reference

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.

Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149.

Team, G., Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ramé, A., Rivière, M., et al. (2025). Gemma 3 technical report. arXiv preprint arXiv:2503.19786.

Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R. J., Javaheripi, M., Kauffmann, P., et al. (2024). Phi-4 technical report. arXiv preprint arXiv:2412.08905.

Liquid AI. (2025). Introducing LFM2: The fastest on-device foundation models on the market. https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models

Zafrir, O., Boudoukh, G., Izsak, P., & Wasserblat, M. (2019). Q8BERT: Quantized 8-bit BERT. IEEE.

Shen, S., Dong, Z., Ye, J., Ma, L., Yao, Z., Gholami, A., Mahoney, M. W., & Keutzer, K. (2020). Q-BERT: Hessian based ultra low precision quantization of BERT. AAAI.

Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D. (2022). GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv.

Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., & Han, S. (2023). SmoothQuant. ICML.

Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.-M., Wang, W.-C., Xiao, G., Dang, X., Gan, C., & Han, S. (2024). AWQ. Proceedings of Machine Learning and Systems.

Dettmers, T., Svirschevski, R., Egiazarian, V., Kuznedelev, D., Frantar, E., Ashkboos, S., Borzunov, A., Hoefler, T., & Alistarh, D. (2023). SpQR. arXiv.

Kim, S., Hooper, C., Gholami, A., Dong, Z., Li, X., Shen, S., Mahoney, M. W., & Keutzer, K. (2023). SqueezeLLM. arXiv.

Zhu, X., Li, J., Liu, Y., Ma, C., & Wang, W. (2024). A survey on model compression for large language models. TACL.

Jin, R., Du, J., Huang, W., Liu, W., Luan, J., Wang, B., & Xiong, D. (2024). Evaluation of quantization strategies. ACL Findings.

Li, S., Ning, X., Wang, L., Liu, T., Shi, X., Yan, S., Dai, G., Yang, H., & Wang, Y. (2024). Evaluating quantized LLMs. arXiv.

Zhang, T., Yi, J., Xu, Z., & Shrivastava, A. (2024). KV cache quantization. NeurIPS.

Hooper, C., Kim, S., Mohammadzadeh, H., Mahoney, M. W., Shao, Y. S., Keutzer, K., & Gholami, A. (2024). KVQuant. NeurIPS.

Dettmers, T., Lewis, M., Belkada, Y., & Zettlemoyer, L. (2022). LLM.int8(). NeurIPS.

Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA. NeurIPS.

Hansen, C., et al. (2023). AutoAWQ. https://github.com/casper-hansen/AutoAWQ

Matthews, B. W. (1975). Comparison of predicted and observed structure. Biochimica et Biophysica Acta.

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE benchmark. arXiv.

Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. ACL.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive deep models. EMNLP.

Crowdflower. (2015). Twitter US airline sentiment dataset. https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

NVIDIA. (2024). NVIDIA Blackwell architecture in-depth. https://www.nvidia.com/en-us/data-center/blackwell-architecture/

NVIDIA. (2020). NVIDIA A100 tensor core GPU architecture. https://www.nvidia.com/en-us/data-center/a100/

NVIDIA. (2018). NVIDIA Turing architecture whitepaper. https://www.nvidia.com/en-us/data-center/turing-architecture/

Qualcomm. (2024). Snapdragon 8 Elite Gen 5 mobile platform product brief. https://www.qualcomm.com/products/mobile/snapdragon-8-elite-gen-5

Intel. (2024). Intel NPU acceleration library v1.2 release notes. https://github.com/intel/neural-compressor/releases

AMD. (2024). Ryzen AI 300 series with XDNA 2: Developer guide. https://www.amd.com/en/products/ryzen-ai

Arm. (2023). Arm Ethos-U85 NPU product brief. https://www.arm.com/products/silicon-ip-npu/ethos-u85
