What is it about?

The paper presents the complete design of Lipi Gnani, printed text recognition (OCR) software that works on any document in Kannada, Sanskrit, Tulu or Konkani printed in the Kannada script. This paper is a benchmark for Kannada document recognition and was recommended by a reviewer for the best paper award. The OCR has now been licensed to RaGaVeRa Indic Technologies by the Indian Institute of Science. Another startup, BhaShiNi Digitization Services, is using Lipi Gnani to digitize hundreds of Kannada books by famous authors into e-books that can be read on Amazon Kindle and similar devices. The Karnataka Startup Cell has funded this startup for the work, and it is successfully running the OCR on a Raspberry Pi processor!

Why is it important?

Printed text recognition software can convert physical libraries holding huge collections of printed text into digital libraries accessible from anywhere in the world. Lipi Gnani is the best of the OCRs for all Indian languages. It is three times as fast as, and marginally better than, Google's Tesseract OCR version 4.0.0.

Perspectives

Today, everyone talks about deep neural networks and end-to-end systems. This is a data- and computation-hungry approach, and hence it is not easily scalable to new problems. Our approach, by contrast, is based on support vector machines, and we have made use of a lot of domain knowledge to create a printed text recognition system for Kannada whose computational requirement is low for the user and even lower for the developer.

Ramakrishnan Angarai Ganesan
Indian Institute of Science

Read the Original

This page is a summary of: Lipi Gnani, ACM Transactions on Asian and Low-Resource Language Information Processing, July 2020, ACM (Association for Computing Machinery),
DOI: 10.1145/3387632.