Towards Development of New Language Resource for Urdu: The Large Vocabulary Word Embeddings

Fatima Tuz Zuhra; Khalid Saleem

doi:10.1145/3748308

What is it about?

This article reports a new language resource for Urdu, a resource-poor language. The resource is a set of word embeddings trained on a larger collection of Urdu text than other Urdu language researchers have used. The embeddings cover an Urdu vocabulary almost double the size of the state of the art for the language.

Why is it important?

Urdu is a resource-poor language. Word embeddings are a resource needed by researchers who use machine learning, deep learning, or neural network models for tasks in Urdu natural language processing (NLP). The embeddings can be used for NLP tasks such as parsing, machine translation, and sentiment analysis, as well as in the development of large language models (LLMs) and generative AI for Urdu.
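As a rough illustration of how word embeddings are typically consumed, the sketch below looks up vectors for a few Urdu words and compares them by cosine similarity. The three-dimensional vectors, the `toy_embeddings` table, and the helper names are invented for this example; they are not taken from the embeddings described in the article, which would be loaded from the published resource instead.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT values from the embeddings reported in the article.
BOOK = "کتاب"   # "book"
PEN = "قلم"     # "pen"
WATER = "پانی"  # "water"

toy_embeddings = {
    BOOK: [0.9, 0.1, 0.0],
    PEN: [0.8, 0.2, 0.1],
    WATER: [0.0, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return (neighbour, score) for the word closest to `word`."""
    query = embeddings[word]
    candidates = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    neighbour, score = most_similar(BOOK, toy_embeddings)
    print(neighbour, round(score, 3))
```

With real embeddings the same lookup-and-compare pattern applies, only with a much larger vocabulary and higher-dimensional vectors; downstream models usually take the vectors as input features rather than comparing them directly.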

Perspectives

This work reports an important resource for natural language processing of Urdu with state-of-the-art machine learning and neural network models. The resource covers a vocabulary almost double the combined vocabulary size of the word embeddings developed by other researchers of the language.
Fatima Tuz Zuhra
Quaid-i-Azam University

This page is a summary of: Towards Development of New Language Resource for Urdu: The Large Vocabulary Word Embeddings, ACM Transactions on Asian and Low-Resource Language Information Processing, August 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3748308.

Contributors

The following have contributed to this page

Fatima Tuz Zuhra
Quaid-i-Azam University
