A Rule-Based Kurdish Text Transliteration System

Sina Ahmadi

doi:10.1145/3278623

What is it about?

Kurdish is an Indo-European language spoken by 20 to 30 million people. Despite its considerable number of speakers, Kurdish has few processing tools and resources and is therefore considered a less-resourced language. In this article, we address one of the fundamental issues in Kurdish text mining. We present a rule-based approach for transliterating between the two most widely used Kurdish orthographies, that is, converting text written in one script into the other. We study the various transliteration challenges and present a transliteration tool called Wergor.
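As a rough illustration of what a rule-based transliterator can look like, here is a minimal Python sketch. It assumes the two orthographies are the Latin-based and Arabic-based Kurdish scripts and converts from the former to the latter. The letter table is a small illustrative subset, and the names LETTER_RULES, INITIAL_VOWEL_SEAT, transliterate_word and transliterate are hypothetical. This is not Wergor's rule set or code.

```python
# Illustrative subset of Latin-to-Arabic-script letter rules.
# Assumption: this table is for demonstration only, not Wergor's rules.
LETTER_RULES = {
    "a": "ا", "b": "ب", "d": "د", "ê": "ێ", "k": "ک",
    "n": "ن", "r": "ر", "s": "س", "ş": "ش", "v": "ڤ", "z": "ز",
}

# Vowels covered by the subset above.
VOWELS = {"a", "ê"}

# In the Arabic-based script, a word-initial vowel is written after this seat letter.
INITIAL_VOWEL_SEAT = "ئ"


def transliterate_word(word):
    """Apply the letter rules to one word, with one contextual rule."""
    out = []
    for position, char in enumerate(word.lower()):
        # Contextual rule: prefix the seat letter to a word-initial vowel.
        if position == 0 and char in VOWELS:
            out.append(INITIAL_VOWEL_SEAT)
        # Context-free rule: map the letter, or keep it if no rule exists.
        out.append(LETTER_RULES.get(char, char))
    return "".join(out)


def transliterate(text):
    """Transliterate whitespace-separated words one at a time."""
    return " ".join(transliterate_word(word) for word in text.split())


print(transliterate("bav"))
print(transliterate("av"))
```

Even this toy version shows why rules need context: the same vowel is rendered differently at the start of a word than inside it. A real system has to handle many more such cases, which is where the transliteration challenges mentioned above arise.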

Why is it important?

Having different orthographies naturally divides the textual sources produced in Kurdish, widens the gap between the dialects, and thus scatters readers. In addition, text ambiguities pose various problems for processing text resources and building language technology tools. Our tool is open-source and may help future researchers tackle Kurdish text mining issues efficiently.

Perspectives

The outcomes of this paper can pave the way for future research on Kurdish text mining and for building natural language processing applications for Kurdish.
Sina Ahmadi
Insight Centre for Data Analytics

This page is a summary of: A Rule-Based Kurdish Text Transliteration System, ACM Transactions on Asian and Low-Resource Language Information Processing, June 2019, ACM (Association for Computing Machinery), DOI: 10.1145/3278623.

Resources

Project
Wergor source code + datasets
The source code of Wergor, along with transliteration datasets and corpora, is available for use in your own development.

Contributors

The following have contributed to this page

Sina Ahmadi
Insight Centre for Data Analytics
