What is it about?

Gujarati is an Indic language and the official language of the Indian state of Gujarat, spoken by over 50 million people. It is morphologically rich: a single root word may have 6 to 15 inflectional forms. We show that the complexity of processing Gujarati text decreases when inflectional forms are reduced to their root words. Besides inflectional forms, single-letter words also increase the dictionary size of a corpus.
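The two preprocessing steps described above can be illustrated with a minimal Python sketch. This is not the article's implementation: the suffix list is a small illustrative set of common Gujarati case markers, the function names are hypothetical, and a "letter" is assumed to be any character that is not a Unicode combining mark, which treats a consonant plus its vowel sign as one letter but counts conjuncts as more than one.

```python
import unicodedata

# Illustrative case markers only (assumption); the article's actual
# reduction rules are not reproduced here.
ILLUSTRATIVE_SUFFIXES = ("માં", "થી", "નો", "ની", "નું", "ના", "ને")


def letter_count(token):
    """Count characters that are not combining marks (vowel signs, anusvara, virama)."""
    return sum(1 for ch in token if not unicodedata.category(ch).startswith("M"))


def strip_suffix(token, suffixes=ILLUSTRATIVE_SUFFIXES, min_stem_letters=2):
    """Remove the longest matching suffix if the remaining stem is long enough."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix):
            stem = token[: len(token) - len(suffix)]
            if letter_count(stem) >= min_stem_letters:
                return stem
    return token


def preprocess(tokens):
    """Drop single-letter words, then reduce the remaining tokens toward a root form."""
    kept = [t for t in tokens if letter_count(t) > 1]
    return [strip_suffix(t) for t in kept]


if __name__ == "__main__":
    root = "ઘર"  # "house"
    tokens = [root, root + "માં", root + "થી", "છે", "જ"]
    reduced = preprocess(tokens)
    print(len(set(tokens)), "distinct tokens before;", len(set(reduced)), "after")
```

In this toy input, the inflected forms collapse onto one root and the single-letter words drop out, which is the kind of dictionary-size reduction the summary refers to. Plain suffix stripping will over-stem or under-stem real text, so it should be read only as an illustration of the idea.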

Why is it important?

We performed an extensive set of experiments to measure how inflectional form reduction and single-letter word removal affect the time required for topic modeling and the interpretability of the resulting topics. The results show that topics become more meaningful and interpretable, and that modeling them takes remarkably less time. Other topic goodness criteria, such as word length and topic size, also improved.

Perspectives

I hope this article helps the NLP research community, particularly those working on the Gujarati language. I put wholehearted effort into every stage of preparing it: designing the hypothesis, designing the experiments, testing the hypothesis from different angles, and explaining the results. Although the research was very challenging, it was also exciting. I believe this work will help many researchers, because carrying out research on Gujarati text is always difficult: Gujarati is one of the low-resource Indic languages.

Dr. Uttam Chauhan
Vishwakarma Government Engineering College

Read the Original

This page is a summary of: Improving Semantic Coherence of Gujarati Text Topic Model Using Inflectional Forms Reduction and Single-letter Words Removal, ACM Transactions on Asian and Low-Resource Language Information Processing, January 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3447760.
