A Comparative Analysis of Stemming Strategies in Arabic Broken Plural Classification Using Logistic Regression.

Authors

  • Yousuf A Maneetah, University of Benghazi, Benghazi, Libya

DOI:

https://doi.org/10.64516/kmkhyp28

Keywords:

Arabic Broken Plurals, Machine Learning, Logistic Regression, Stemming, Text Classification, Confusion Matrix

Abstract

This paper compares the effectiveness of machine learning models at identifying broken plurals in Arabic text. We compare Logistic Regression with and without stemming, using the Tashaphyne and ISRI stemmers. Evaluation is based on accuracy, precision, recall, and F1-score. Our results show that Logistic Regression without stemming performed best, suggesting that for this particular dataset the raw word features were sufficient for the model to learn good classification patterns. This indicates that stemming may have introduced noise or removed critical information, reducing the model's capacity to learn accurate classifications. A confusion matrix was used to further evaluate the Logistic Regression classifier. The results confirmed the superior performance of the Logistic Regression classifier on the Arabic broken plural identification task.
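As an illustration of how the evaluation metrics named in the abstract relate to a confusion matrix, the following minimal sketch derives accuracy, precision, recall, and F1-score from binary predictions. The labels and the helper names `confusion_counts` and `binary_metrics` are hypothetical and are not taken from the paper; the data is invented purely to show the arithmetic.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels (illustrative only): 1 = broken plural, 0 = other word.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```

In a comparison like the one described, the same function would be applied to the predictions of each configuration (no stemming, Tashaphyne, ISRI) on a shared test set, so that the scores are directly comparable.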

Author Biography

  • Yousuf A Maneetah, University of Benghazi, Benghazi, Libya

    Computer Science, University of Benghazi, Benghazi, Libya

Published

30-12-2025

Section

Articles

How to Cite

[1]
Y. A. Maneetah, “A Comparative Analysis of Stemming Strategies in Arabic Broken Plural Classification Using Logistic Regression,” TUJES, vol. 6, no. 2, pp. 166–176, Dec. 2025, doi: 10.64516/kmkhyp28.