What is it about?

This work stems from the urgent need to create systems and procedures for managing and sharing cultural heritage in both supranational and multi-literate contexts. We propose an innovative workflow and tool for automatic knowledge extraction and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). It combines several OCR, text processing and information extraction techniques to provide both highly accurate extracted text and rich metadata (including automatically identified cataloguing metadata).

Why is it important?

DigitalMaktaba focuses on innovative solutions for digital libraries, providing several techniques that support and automate many of the tasks (OCR, linguistic resource linking, metadata extraction, and so on) involved in the text sensing/knowledge extraction and cataloguing of documents in a multilingual context. The project's test case is the large collection of digital books made internally available by the “Giorgio La Pira” library in Palermo, a hub of the FSCIRE foundation dedicated to the history and doctrines of Islam. The techniques discussed and their rich metadata output will be the groundwork for the complete semi-automated cataloguing system we are aiming for. Future steps include intelligent, AI-based techniques that provide even greater assistance to the librarian and learn incrementally as the system is used.

Perspectives

Please contact us if you are interested in knowing more about this ongoing research or in participating in it!

Riccardo Martoglia
University of Modena and Reggio Emilia

Read the Original

This page is a summary of: Preserving and conserving culture, September 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3462203.3475927.
