What is it about?

The demand for LLMs in languages other than English is growing because multilingual LLMs often fall short in understanding local contexts. The gap is particularly acute for low-resource languages that lack comprehensive instruction sets. In a multilingual country like India, there is a pressing need for LLMs that support Indic languages so that generative AI and LLM-based technologies and services can reach its citizens. This work addresses that need by (i) generating a large Odia instruction set that includes domain-specific data suitable for LLM fine-tuning, and (ii) fine-tuning Llama 2 on this set for improved performance on Odia. The aim is to help researchers build instruction sets and LLMs, especially for Indic languages; a rough sketch of what such fine-tuning can look like follows below.
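The paper itself details the actual recipe; purely as an illustrative, non-authoritative sketch of what instruction fine-tuning Llama 2 on an Odia instruction set might look like, the snippet below uses Hugging Face transformers with LoRA adapters (PEFT). The base checkpoint, the file name odia_instructions.jsonl, the prompt template, and all hyperparameters are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: instruction fine-tuning Llama 2 with LoRA adapters.
# All names, paths, and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "meta-llama/Llama-2-7b-hf"          # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model)
# Train only low-rank adapter weights instead of all base parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         task_type="CAUSAL_LM"))

# Hypothetical JSONL file with "instruction" / "output" fields.
data = load_dataset("json", data_files="odia_instructions.jsonl")["train"]

def to_prompt(example):
    # Simple instruction -> response template; the real template may differ.
    text = (f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['output']}")
    return tokenizer(text, truncation=True, max_length=1024)

data = data.map(to_prompt, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="odia-llama2-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=3),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Using LoRA here is simply one common way to keep fine-tuning affordable for a 7B-parameter model; the adapters can later be merged into the base weights for deployment.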

Why is it important?

This work emphasizes the critical role of domain knowledge in developing LLMs that can accurately understand and respond to local contexts, cultures, and customs. By integrating domain-specific knowledge, an LLM becomes better at interpreting nuanced information, making it significantly more effective for a particular region and language.

Perspectives

This collaborative effort brought together a group of researchers dedicated to advancing LLMs for Indic languages. As a coauthor, I thoroughly enjoyed the entire process, from planning and data preparation to model building and evaluation. It's heartening to see the growing interest and motivation among researchers working to develop LLMs for low-resource languages.

Shantipriya Parida
Silo AI, Finland

Read the Original

This page is a summary of: Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set, October 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3639856.3639890.