What is it about?
Bilingual dictionaries are useful resources for many NLP applications, but they are unavailable for low-resource language pairs. We present a method for inducing such dictionaries from Wikipedia data. In addition to Wikipedia data in the two languages of the pair, we use Wikipedia data from other languages to significantly boost the accuracy of the induced dictionaries.
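To make the idea concrete, here is a minimal sketch of how an auxiliary language might help. It is not the method from the paper: it is a toy illustration that assumes articles are aligned across languages by a shared article ID (as Wikipedia interlanguage links would provide), represents each word by the articles it occurs in, and scores candidate translations by cosine similarity. The function names (`word_vectors`, `cosine`, `induce`), the pivot scoring rule, and the placeholder tokens are all hypothetical.

```python
from collections import Counter
from math import sqrt

def word_vectors(corpus):
    """Map each word to a Counter of {article_id: occurrence count}."""
    vectors = {}
    for article_id, text in corpus.items():
        for word in text.split():
            vectors.setdefault(word, Counter())[article_id] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def induce(source, target, auxiliary=None):
    """Pick, for each source word, the best-scoring target word.

    Assumed scoring rule (illustrative only): direct similarity plus the
    best similarity path through any single auxiliary-language word.
    """
    src, tgt = word_vectors(source), word_vectors(target)
    aux = word_vectors(auxiliary) if auxiliary else {}
    dictionary = {}
    for s, sv in src.items():
        scores = {}
        for t, tv in tgt.items():
            direct = cosine(sv, tv)
            pivot = max((cosine(sv, av) * cosine(av, tv) for av in aux.values()),
                        default=0.0)
            scores[t] = direct + pivot
        dictionary[s] = max(scores, key=scores.get)
    return dictionary

# Toy corpora keyed by shared article ID; tokens are placeholders, not real words.
# The source and target editions share no articles, so only the auxiliary
# edition (which covers all four articles) links them.
source = {1: "s_water", 2: "s_fire"}
target = {3: "t_water", 4: "t_fire"}
auxiliary = {1: "a_water", 2: "a_fire", 3: "a_water", 4: "a_fire"}

induced = induce(source, target, auxiliary)
```

In this toy setup the source and target corpora have no articles in common, so the direct similarity is zero for every pair and only the auxiliary corpus provides evidence for the mapping.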
Why is it important?
The method lets us bootstrap bilingual dictionaries for low-resource language pairs. This can lead to the development of higher-quality dictionaries and can enable the application of NLP techniques that need such dictionaries as a resource.
Perspectives
This project involved a significant amount of careful empirical work, and it was a great learning experience. The importance of ensuring the reproducibility of the experiments emerged from these efforts, and it was gratifying to see this recognized by the community. Part of this work was presented at the CICLing conference, where it won the Best Student Paper award and the Best Verifiability, Reproducibility, and Working Description Award.
Goutham Tholpadi
Indian Institute of Science
Read the Original
This page is a summary of: Corpus-Based Translation Induction in Indian Languages Using Auxiliary Language Corpora from Wikipedia, ACM Transactions on Asian and Low-Resource Language Information Processing, September 2017, ACM (Association for Computing Machinery),
DOI: 10.1145/3038295.