What is it about?

We present a robust and accurate method for diacritizing highly cited texts by automatically "borrowing" diacritization from similar contexts. The method was tested on one book, "Riyad As-Salheen", for the purpose of morphological annotation of the Sunnah Arabic Corpus. The original source of Riyad is about 48.66% diacritized; after borrowing diacritization, this rises to 76.41% with a low diacritic error rate (DER = 0.004), compared with 61.73% (DER = 0.214) using the MADAMIRA toolkit and 67.68% (DER = 0.006) using the Farasa toolkit. More importantly, the method reduces word ambiguity from 4.83 diacritized forms per word to 1.91.
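
The borrowing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-word context window, and the majority-vote rule are all assumptions made for illustration. The sketch indexes diacritized words from a reference copy of a passage by their undiacritized context, then fills in undiacritized words in a target text whenever the same context is found.

```python
import re
from collections import Counter, defaultdict

# Arabic diacritic marks: tanween, short vowels, shadda, sukun, dagger alef.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word):
    """Return the word with all diacritic marks removed."""
    return DIACRITICS.sub("", word)


def _contexts(tokens, window):
    """Yield the undiacritized context (word plus neighbours) of each token."""
    bare = [strip_diacritics(t) for t in tokens]
    padded = [None] * window + bare + [None] * window
    for i in range(len(tokens)):
        yield tuple(padded[i:i + 2 * window + 1])


def build_context_index(reference_tokens, window=1):
    """Hypothetical helper: map each context to the diacritized forms seen in it."""
    index = defaultdict(Counter)
    for token, key in zip(reference_tokens, _contexts(reference_tokens, window)):
        if token != strip_diacritics(token):  # only index words that carry diacritics
            index[key][token] += 1
    return index


def borrow_diacritics(target_tokens, index, window=1):
    """Hypothetical helper: fill undiacritized words from matching contexts."""
    result = []
    for token, key in zip(target_tokens, _contexts(target_tokens, window)):
        candidates = index.get(key)
        if candidates and token == strip_diacritics(token):
            # Assumption: take the form seen most often in this context.
            result.append(candidates.most_common(1)[0][0])
        else:
            result.append(token)  # leave the word as it is
    return result


reference = "قَالَ رَسُولُ اللَّهِ".split()
target = [strip_diacritics(t) for t in reference]
borrowed = borrow_diacritics(target, build_context_index(reference))
```

Words whose context never appears in the reference are left untouched, so a sketch like this trades coverage for precision, which is consistent with the low error rate reported above.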

Why is it important?

Arabic words are highly ambiguous. One reason is that their phonological information is underspecified in writing, because diacritics are often omitted. This work helps restore that information by finding similar contexts.
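
To make the notion of ambiguity concrete, the sketch below counts how many distinct diacritized forms share one undiacritized spelling and averages that count over all spellings. The function name and the toy word list are assumptions for illustration; the paper's own ambiguity figures may be computed differently, for example from the output of a morphological analyzer.

```python
import re
from collections import defaultdict

# Arabic diacritic marks: tanween, short vowels, shadda, sukun, dagger alef.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def average_ambiguity(diacritized_forms):
    """Hypothetical metric: mean number of distinct diacritized forms per bare word."""
    forms_by_bare_word = defaultdict(set)
    for form in diacritized_forms:
        forms_by_bare_word[DIACRITICS.sub("", form)].add(form)
    if not forms_by_bare_word:
        return 0.0
    total_forms = sum(len(forms) for forms in forms_by_bare_word.values())
    return total_forms / len(forms_by_bare_word)


# Three readings of the same letters (he knew, knowledge, flag) plus one unambiguous word.
toy_forms = ["عَلِمَ", "عِلْم", "عَلَم", "كِتَاب"]
ambiguity = average_ambiguity(toy_forms)
```

In this toy list, three forms collapse to one spelling and the fourth has a spelling of its own, so the average is two forms per word.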

Read the Original

This page is a summary of: Diacritization of a Highly Cited Text: A Classical Arabic Book as a Case, March 2018, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/asar.2018.8480176.
