CNN-BiLSTM Model for Arabic Dialect Identification

Malek Hedhli; Ferihane Kboubi

doi:10.1007/978-3-031-41774-0_17

What is it about?

Dialectal Arabic presents a unique challenge for NLP systems due to its wide variation across 22 Arabic-speaking countries and the lack of balanced resources. In our latest research, we address this gap by proposing a novel approach for Arabic Dialect Identification (ADI).

Why is it important?

Key contributions of our work:
- Construction of a balanced dataset by merging and filtering 7 existing unbalanced datasets
- Introduction of a new ADI model combining CNN and BiLSTM architectures with AraVec embeddings
- Comparative evaluation against several machine learning and deep learning baselines
- An improvement of about 2% in accuracy over the best-performing baseline models

This work sets a new benchmark for Arabic Dialect Identification and shows promising potential for improving downstream tasks such as sentiment analysis, machine translation, and hate speech detection. It marks a significant step toward more accurate and interpretable Arabic NLP systems.
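To make the dataset-construction contribution more concrete, here is a minimal sketch of how several unbalanced (text, dialect) collections could be merged, filtered, and balanced. The filtering rule (dropping empty texts and exact duplicates), the downsampling strategy, the function name `build_balanced_dataset`, and the dialect labels are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict


def build_balanced_dataset(datasets, seed=0):
    """Merge (text, dialect) datasets and downsample to equal class sizes.

    Hypothetical helper: the filtering and balancing rules here are
    assumptions for illustration, not the authors' procedure.
    """
    by_dialect = defaultdict(list)
    seen = set()
    for dataset in datasets:
        for text, dialect in dataset:
            cleaned = text.strip()
            # Assumed filtering step: skip empty texts and exact duplicates.
            if not cleaned or cleaned in seen:
                continue
            seen.add(cleaned)
            by_dialect[dialect].append(cleaned)

    if not by_dialect:
        return []

    # Downsample every dialect to the size of the smallest class.
    target = min(len(texts) for texts in by_dialect.values())
    rng = random.Random(seed)
    balanced = []
    for dialect in sorted(by_dialect):
        for text in rng.sample(by_dialect[dialect], target):
            balanced.append((text, dialect))
    rng.shuffle(balanced)
    return balanced
```

Calling `build_balanced_dataset([corpus_a, corpus_b])` on lists of `(text, dialect)` pairs returns a shuffled list in which every dialect appears the same number of times. Downsampling to the smallest class is only one way to balance; the source does not say which strategy was used.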

Perspectives

Directions for future work:
- Extension to more dialects: Expand the current 5-dialect setup to include all 22 Arabic-speaking countries, or even minority dialects and cross-border varieties.
- Multimodal dialect identification: Incorporate audio or phonetic features alongside text for dialect detection in spoken language applications (e.g., voice assistants, transcription tools).
Ferihane Kboubi

This page is a summary of: CNN-BiLSTM Model for Arabic Dialect Identification, January 2023, Springer Science + Business Media, DOI: 10.1007/978-3-031-41774-0_17.

Contributors

The following have contributed to this page

Ferihane Kboubi
