TAFSIL: Taxonomy Adaptable Fine-grained Entity Recognition through Distant Supervision for Indian Languages

Prachuryya Kaushik; Shivansh Mishra; Ashish Anand

doi:10.1145/3726302.3730341

What is it about?

Recognizing the names of identifiable entities such as buildings, medicines, and products in unstructured text is crucial for many applications and services. We have developed a scalable framework, TAFSIL, to recognize a wide variety of entities across several languages spoken by more than 1.5 billion people. This paper demonstrates the efficacy of the proposed framework and the high quality of the generated dataset.


Why is it important?

Data-hungry AI systems often struggle with low-resource languages and with recognizing new, unseen entities. Our TAFSIL framework addresses both problems by enabling the creation of fine-grained entity recognition datasets in different taxonomies for six languages spoken across various South and Southeast Asian countries.

Perspectives

I hope this article and its resources will pave the way for advancing AI solutions, especially for low-resource languages. It is a great inspiration that billions of people may benefit from this research.
Prachuryya Kaushik
Indian Institute of Technology Guwahati

This page is a summary of: TAFSIL: Taxonomy Adaptable Fine-grained Entity Recognition through Distant Supervision for Indian Languages, July 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3726302.3730341.

Resources

Data
TAFSIL Dataset
The fine-grained entity recognition dataset TAFSIL is available for Hindi, Marathi, Sanskrit, Tamil, Telugu, and Urdu in the FIGER, OntoNotes, HAnDS, and MultiCoNER2 taxonomies.

Contributors

The following have contributed to this page

Prachuryya Kaushik
Indian Institute of Technology Guwahati
