Corpus-Based Translation Induction in Indian Languages Using Auxiliary Language Corpora from Wikipedia

Goutham Tholpadi; Chiranjib Bhattacharyya; Shirish Shevade

doi:10.1145/3038295

What is it about?

Bilingual dictionaries are useful resources for many NLP applications, but they are unavailable for low-resource language pairs. We present a method for inducing such dictionaries from Wikipedia data. In addition to Wikipedia data in the two languages of the pair, we use Wikipedia data from other (auxiliary) languages, which significantly boosts the accuracy of the induced dictionaries.
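
As a rough illustration only, the general idea of inducing a dictionary from document-aligned Wikipedia articles can be sketched as below. This is not the method from the paper, whose details are not given in this summary. It assumes that articles in the two languages share a concept ID (for example via interlanguage links) and scores word pairs by how similarly they are distributed over those concepts. The function names, the toy tokens (s_water, t_water, and so on) and the data are all hypothetical, and the auxiliary-language step is not modeled.

```python
from collections import defaultdict
from math import sqrt


def occurrence_vectors(articles):
    """Map each word to the set of concept IDs whose article contains it."""
    vectors = defaultdict(set)
    for concept_id, text in articles.items():
        for word in set(text.split()):
            vectors[word].add(concept_id)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def induce(source_articles, target_articles, top_k=1):
    """Return, for each source word, its top_k most similar target words."""
    src = occurrence_vectors(source_articles)
    tgt = occurrence_vectors(target_articles)
    dictionary = {}
    for word, concepts in src.items():
        # Sort by descending similarity; break ties alphabetically.
        ranked = sorted(tgt, key=lambda t: (-cosine(concepts, tgt[t]), t))
        dictionary[word] = ranked[:top_k]
    return dictionary


# Hypothetical toy data: keys are shared concept IDs, values are article texts.
source = {1: "s_water s_river", 2: "s_water s_rain", 3: "s_sun"}
target = {1: "t_water t_river", 2: "t_water t_rain", 3: "t_sun"}

dictionary = induce(source, target)
for word in sorted(dictionary):
    print(word, dictionary[word])
```

In this toy setting each source token is paired with the target token that appears in the same aligned articles. Real Wikipedia data is far noisier, which is one reason additional evidence such as auxiliary-language corpora can help.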

Why is it important?

The method enables us to bootstrap bilingual dictionaries for low-resource language pairs. This can lead to the development of higher-quality dictionaries, and it can also enable NLP techniques that need such dictionaries as a resource.

Perspectives

This project involved a significant amount of careful empirical work, and it was a great learning experience. The importance of ensuring the reproducibility of the experiments emerged from this effort, and it was gratifying to see this recognized by the community. A part of this work was presented at the CICLing conference, where it won the Best Student Paper award and the Best Verifiability, Reproducibility, and Working Description Award.
Goutham Tholpadi
Indian Institute of Science

This page is a summary of: Corpus-Based Translation Induction in Indian Languages Using Auxiliary Language Corpora from Wikipedia, ACM Transactions on Asian and Low-Resource Language Information Processing, September 2017, ACM (Association for Computing Machinery), DOI: 10.1145/3038295.

Contributors

The following have contributed to this page

Goutham Tholpadi
Indian Institute of Science

Learning bilingual dictionaries between Indian languages using Wikipedia data
