What is it about?

Recognizing the names of identifiable entities such as buildings, medicines, and products in unstructured text is crucial for many applications and services. We have developed a scalable framework, TAFSIL, to recognize a wide variety of entities across several languages spoken by more than 1.5 billion people. This paper demonstrates the efficacy of the proposed framework and the high quality of the generated dataset.

Why is it important?

Data-hungry AI systems often struggle with low-resource languages and with recognizing new, unseen entities. Our TAFSIL framework offers a two-pronged solution by enabling the creation of fine-grained entity recognition datasets in different taxonomies for six languages spoken across various South and Southeast Asian countries.

Perspectives

I hope this article and its resources will pave the way for advancing AI solutions, especially for low-resource languages. It is deeply inspiring that billions of people may benefit from this research.

Prachuryya Kaushik
Indian Institute of Technology Guwahati

Read the Original

This page is a summary of: TAFSIL: Taxonomy Adaptable Fine-grained Entity Recognition through Distant Supervision for Indian Languages, July 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3726302.3730341.
