What is it about?

Recent machine learning (ML) translation models require a large, clean parallel corpus, and most of the time they still cannot handle rare words effectively. Because no large parallel corpus is available for Sanskrit, using these models to translate it is challenging. We nevertheless improve translation accuracy, even under zero-shot conditions, by using morphological patterns (such as Dhatu, Vibhakti, and compound words) and improved filtering heuristics.

Why is it important?

Much work remains to be done in applying ML to Sanskrit translation. Sanskrit is one of the oldest and richest languages known to the world, and both its morphological richness and its limited multilingual parallel corpus make translation difficult.

Perspectives

Improving translation models is, in my opinion, one of the best ways to preserve an ancient and rich language like Sanskrit, which was once known to the majority of the Indian population but is now spoken by only a few thousand people.

Piyush Jha
University of Waterloo

Read the Original

This page is a summary of: Filtering and Extended Vocabulary based Translation for Low-resource Language pair of Sanskrit-Hindi, ACM Transactions on Asian and Low-Resource Language Information Processing, January 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3580495.
