What is it about?

Low-resource Machine Translation (MT) is characterized by scarce training data and/or a lack of standardized evaluation benchmarks. For Dialectal Arabic, recent work has introduced several evaluation benchmarks covering both Modern Standard Arabic (MSA) and dialects; however, these map mostly to a single Indo-European language, English. In this work, we introduce a multilingual corpus of 120,600 multi-parallel sentences in English, French, German, Greek, Spanish, and MSA, selected from the OpenSubtitles corpus and manually translated into North Levantine Arabic. Through a series of training and fine-tuning experiments, we explore how this new resource can contribute to research on Arabic MT.
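Because the corpus is multi-parallel, row i holds the same sentence in every language, so a bilingual training set for any language pair can be read off directly. The Python sketch below illustrates this idea under stated assumptions: the language labels, the in-memory dict layout, and the helper names (check_aligned, bilingual_pairs, language_pairs) are hypothetical and do not describe the format of the released corpus.

```python
from itertools import combinations

# Hypothetical labels for the seven languages named in the summary
# (assumed ISO 639-style codes; "arb" = MSA, "apc" = North Levantine Arabic).
LANGS = ["en", "fr", "de", "el", "es", "arb", "apc"]


def check_aligned(corpus: dict[str, list[str]]) -> int:
    """Return the row count, raising if the languages differ in length.

    In a multi-parallel corpus every language must have the same number of
    rows, because row i is the same sentence in each language.
    """
    sizes = {lang: len(sents) for lang, sents in corpus.items()}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"not multi-parallel: {sizes}")
    return next(iter(sizes.values()))


def bilingual_pairs(corpus: dict[str, list[str]], src: str, tgt: str) -> list[tuple[str, str]]:
    """Extract (source, target) sentence pairs for one language pair."""
    check_aligned(corpus)
    return list(zip(corpus[src], corpus[tgt]))


def language_pairs(langs: list[str]) -> list[tuple[str, str]]:
    """List every unordered language pair available from the corpus."""
    return list(combinations(langs, 2))


# Toy two-row example with placeholder strings (not real corpus data).
toy = {
    "en": ["Hello.", "Thank you."],
    "apc": ["<apc sentence 1>", "<apc sentence 2>"],
}
print(bilingual_pairs(toy, "en", "apc"))
print(len(language_pairs(LANGS)))
```

With seven languages there are 21 unordered language pairs, and in a fully multi-parallel corpus each of them would share the same 120,600 sentence rows.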


Why is it important?

It adds a new resource for a low-resource language: North Levantine (Syrian) Arabic.

Read the Original

This page is a summary of: Multi-Parallel Corpus of North Levantine Arabic, January 2023, Association for Computational Linguistics (ACL),
DOI: 10.18653/v1/2023.arabicnlp-1.34.
