Knowledge graphs generation from cultural heritage texts: combining LLMs and ontological engineering for scholarly debates

Andrea Schimmenti; Valentina Pasqual; Fabio Vitali; Marieke van Erp

doi:10.1108/jd-07-2025-0203

What is it about?

Cultural heritage institutions such as museums, libraries, and archives hold vast collections of texts containing rich scholarly knowledge about historical documents, artifacts, and debates about their authenticity. However, this knowledge is locked in unstructured text that computers cannot easily search or analyze. This creates a significant barrier for researchers and members of the public who are trying to access and understand complex scholarly discussions.

Our research introduces ATR4CH ("Adaptive Text-to-RDF for Cultural Heritage"), a systematic five-step methodology that uses artificial intelligence (specifically Large Language Models such as Llama) to automatically extract and structure knowledge from cultural heritage texts. We tested this approach on Wikipedia articles about disputed historical items: documents and artifacts whose authenticity scholars have debated over time. The method first analyzes the text to understand how scholarly knowledge is presented, then develops an annotation system that captures the essential elements of authenticity debates (who said what, what evidence they used, and what conclusions they reached), and finally builds an automated pipeline that can process new texts and generate structured knowledge graphs.

Our results show that the system extracts different types of information with different levels of accuracy: it performs excellently at identifying basic facts about historical items (96-99% accuracy), reasonably well at recognizing scholarly opinions and evidence (70-97% accuracy), and adequately at capturing the overall structure of scholarly debates (62% accuracy). Importantly, smaller AI models performed almost as well as larger, more expensive ones, making the approach accessible to institutions with limited budgets.
This work enables cultural heritage institutions to automatically convert their textual knowledge into searchable, computer-readable formats, making scholarly debates and evidence more accessible to researchers and the public while preserving the complexity and nuance of academic discourse.
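The annotation structure described above, which records who made a claim about an item, what evidence they cited, and what they concluded, maps naturally onto RDF triples. What follows is a minimal Python sketch under stated assumptions. The namespace `http://example.org/debate/`, the class `AuthenticityClaim`, and the properties `aboutItem`, `assertedBy`, `hasConclusion`, and `citesEvidence` are all hypothetical; they are not the ontology used by ATR4CH. The LLM extraction step is omitted, so the claim is filled in by hand, and the output is serialized as N-Triples using only the standard library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical namespace for illustration only; NOT the ATR4CH ontology.
EX = "http://example.org/debate/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class AuthenticityClaim:
    """One annotated scholarly position: who said what, on what evidence."""
    claim_id: str
    item: str          # the disputed document or artifact
    scholar: str       # who made the claim
    conclusion: str    # e.g. "authentic" or "forgery"
    evidence: List[str] = field(default_factory=list)


def iri(local: str) -> str:
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def claim_to_triples(c: AuthenticityClaim) -> List[Tuple[str, str, str]]:
    """Turn one claim into (subject, predicate, object) triples."""
    s = iri(c.claim_id)
    triples = [
        (s, f"<{RDF_TYPE}>", iri("AuthenticityClaim")),
        (s, iri("aboutItem"), iri(c.item)),
        (s, iri("assertedBy"), iri(c.scholar)),
        (s, iri("hasConclusion"), iri(c.conclusion)),
    ]
    # One triple per piece of cited evidence.
    for ev in c.evidence:
        triples.append((s, iri("citesEvidence"), iri(ev)))
    return triples


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Illustrative values (a well-known authenticity debate), not taken from
    # the paper's dataset. In a pipeline like the one described, these fields
    # would be filled by the LLM extraction step.
    claim = AuthenticityClaim(
        claim_id="claim1",
        item="Donation_of_Constantine",
        scholar="Lorenzo_Valla",
        conclusion="forgery",
        evidence=["anachronistic_Latin"],
    )
    print(to_ntriples(claim_to_triples(claim)))
```

Plain N-Triples strings keep the sketch dependency-free; a production pipeline would more likely use an RDF library and validate its output against the target ontology.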

This page is a summary of: Knowledge graphs generation from cultural heritage texts: combining LLMs and ontological engineering for scholarly debates, Journal of Documentation, March 2026, Emerald,
DOI: 10.1108/jd-07-2025-0203.