What is it about?

Kurdish is an Indo-European language spoken by 20 to 30 million people. Despite this considerable number of speakers, Kurdish has few processing tools and resources and is therefore considered a less-resourced language. In this article, we address one of the fundamental issues in Kurdish text mining. We present a rule-based approach to transliteration between the two most widely used Kurdish orthographies, that is, converting text written in one script into the other. We study the various transliteration challenges and present a transliteration tool called Wergor.
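To make the idea of rule-based transliteration concrete, here is a minimal sketch in Python. It is not Wergor's rule set: the direction (Latin-based to Arabic-based script), the tiny `RULES` table, the word-initial vowel rule, and the function names are all assumptions chosen for illustration, and they cover only a handful of letters.

```python
# Hypothetical, deliberately tiny rule table (Latin-based -> Arabic-based).
# A real system needs far more rules and must handle ambiguous letters.
RULES = {
    "a": "ا", "b": "ب", "d": "د", "k": "ک", "n": "ن",
    "r": "ر", "ş": "ش", "u": "و", "z": "ز",
}

# Assumed rule: a word-initial vowel is preceded by this carrier letter.
VOWELS = set("aeiouêîû")
INITIAL_VOWEL_MARK = "ئ"


def transliterate_word(word):
    """Apply the character rules to one word, left to right."""
    out = []
    for i, ch in enumerate(word.lower()):
        if i == 0 and ch in VOWELS:
            out.append(INITIAL_VOWEL_MARK)
        # Characters without a rule are passed through unchanged.
        out.append(RULES.get(ch, ch))
    return "".join(out)


def transliterate(text):
    """Transliterate whitespace-separated words independently."""
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("baran"))
print(transliterate("azad"))
```

Even this toy version shows why context matters: the same vowel is written differently at the start of a word than inside it, so a plain one-to-one character table is not enough.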

Why is it important?

The difference in orthographies naturally separates the textual sources produced in each script, widens the gap between the dialects, and thus scatters readers. In addition, text ambiguities pose various problems for processing text resources and building language technology tools. Our tool is open-source and may help future researchers tackle Kurdish text mining issues efficiently.

Perspectives

The outcomes of this paper can pave the way for future research on Kurdish text mining and for building natural language processing applications for Kurdish.

Sina Ahmadi
Insight Centre for Data Analytics

Read the Original

This page is a summary of: A Rule-Based Kurdish Text Transliteration System, ACM Transactions on Asian and Low-Resource Language Information Processing, June 2019, ACM (Association for Computing Machinery), DOI: 10.1145/3278623.
