What is it about?
Factor-based machine translation uses linguistic information about each word during the translation process. Morphologically rich languages like Hindi derive multiple word forms from a single root or dictionary word, and these forms differ in morphological information such as part of speech (POS), affixes, number, and gender. Linguistic factors can therefore provide useful information when translating such languages, although different factors contribute in different ways. In this chapter, we study the significance of different linguistic factors in a phrase-based statistical machine translation (SMT) framework for Hindi-to-English translation. We performed experiments on the HindEnCorp and ILCI Hindi–English datasets. We find that POS+lemma+gender achieves the highest BLEU score (16.46) on HindEnCorp, and POS+lemma+number achieves the highest BLEU score (19.11) on ILCI.
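To make the idea of a factor combination concrete, the sketch below shows one way annotated tokens could be serialized before training. It assumes the pipe-separated token format used by factored SMT toolkits such as Moses; this summary does not name the toolkit used in the chapter. The `Token` class, the `to_factored` helper, the romanized example sentence, and its tags are all hypothetical and hand-made for illustration. They are not taken from HindEnCorp or ILCI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One source word with its linguistic factors (hypothetical schema)."""
    surface: str
    lemma: str
    pos: str
    gender: str
    number: str


def to_factored(tokens, factors=("surface", "pos", "lemma")):
    """Join the chosen factors of each token with '|' and tokens with spaces."""
    return " ".join(
        "|".join(getattr(tok, name) for name in factors) for tok in tokens
    )


# Hand-annotated, romanized example ("girls play"); tags are illustrative only.
sentence = [
    Token("ladkiyan", "ladki", "NN", "f", "pl"),
    Token("khelti", "khel", "VM", "f", "any"),
    Token("hain", "hai", "VAUX", "any", "pl"),
]

# Surface form plus the POS+lemma+gender combination.
pos_lemma_gender = to_factored(sentence, ("surface", "pos", "lemma", "gender"))

# Surface form plus the POS+lemma+number combination.
pos_lemma_number = to_factored(sentence, ("surface", "pos", "lemma", "number"))

print(pos_lemma_gender)
print(pos_lemma_number)
```

Changing the `factors` tuple is all it takes to switch between combinations, which is the kind of comparison the chapter reports for the two datasets.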
Why is it important?
This chapter highlights the importance of factor-based machine translation for morphologically rich languages like Hindi, where linguistic factors such as part of speech, lemma, gender, and number significantly affect translation quality. Its experiments show that incorporating these factors into a phrase-based statistical machine translation framework improves translation accuracy, as reflected in the best BLEU scores obtained on the HindEnCorp (16.46) and ILCI (19.11) datasets.
Perspectives
This research underscores the pivotal role of linguistic factors in translation quality for morphologically complex languages like Hindi. Incorporating factors such as part of speech, lemma, gender, and number leads to substantial improvements in translation accuracy, as shown by BLEU score comparisons on the HindEnCorp and ILCI datasets.
Dr. Debajyoty Banik
Read the Original
This page is a summary of: The Important Influencing Factors in Machine Translation, November 2022, Springer Science + Business Media, DOI: 10.1007/978-3-031-15175-0_10.