What is it about?
This study looks at how computers can automatically find the most important keywords in Arabic scientific articles. Keywords are short terms that describe what a research paper is about, and they help other researchers quickly discover relevant work. While keyword extraction has improved a lot for English and other languages, Arabic presents unique challenges because of its complex grammar and the limited availability of high-quality data. To address this, we collected a large dataset of nearly 39,000 records from Arabic journals and used it to train BERT-based language models, a family of advanced AI models for understanding text. These models learn to recognize which words in a text should be treated as keywords. Our experiments showed that one model in particular, bertBaseQarib, performed especially well, identifying keywords with high accuracy. This research demonstrates how artificial intelligence can make Arabic research more visible and accessible worldwide.
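For readers curious what "learning to recognize keywords" looks like in practice, the sketch below shows one common way to frame the task: labeling each word (token) in a text as keyword or not-keyword. This is only an illustration, not the exact pipeline from the paper; it assumes bertBaseQarib corresponds to the publicly available qarib/bert-base-qarib checkpoint on Hugging Face, and the two-label scheme, example text, and variable names are placeholders.

```python
# Minimal sketch: keyword extraction framed as token classification.
# Assumptions: qarib/bert-base-qarib is the checkpoint behind "bertBaseQarib",
# and keywords are tagged with a simple 2-label scheme (0 = not keyword, 1 = keyword).
# The classification head below is untrained; in practice it would first be
# fine-tuned on a labeled dataset such as the one described in the study.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "qarib/bert-base-qarib"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=2)

text = "..."  # an Arabic title or abstract would go here
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, 2)
predictions = logits.argmax(dim=-1)[0]     # label 1 marks predicted keyword tokens

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
keywords = [tok for tok, label in zip(tokens, predictions) if label == 1]
print(keywords)
```

After fine-tuning, adjacent tokens labeled as keywords would typically be merged back into full keyword phrases before being reported.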
Why is it important?
This work is important because it fills a major gap in Arabic natural language processing, an area where resources and tools are still limited compared to other languages. By creating a large dataset and testing several BERT-based models, our study provides a strong foundation for future research on Arabic keyword extraction. This is timely as more Arabic scientific work is being published, and better keyword extraction will improve how easily this research can be found, indexed, and used by others. Ultimately, our findings can help increase the visibility of Arabic research in global databases, support more effective academic search engines, and contribute to the growth of AI applications tailored for the Arabic language.
Read the Original
This page is a summary of: BERT-based Models for Keyword Extraction from Arabic Scientific Articles, ACM Transactions on Asian and Low-Resource Language Information Processing, September 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3761805.