Using digital corpora for preserving and processing cultural heritage texts: a case study

  • Eleni Galiotou
  • Library Review, August 2014, Emerald
  • DOI: 10.1108/lr-11-2013-0142

Creation and exploitation of digitized historical corpora

What is it about?

Collections of manuscripts and early printed books kept in monasteries and other remote historical sites are difficult to access because of the deteriorated state of the items, the conditions under which the collections are kept, and the limited accessibility of the locations. In this paper we describe the creation and exploitation of a digitized historical corpus in an attempt to contribute to the preservation and availability of cultural heritage documents.

Why is it important?

The results of this undertaking can give useful insights into the creation of corpora of cultural heritage documents and into methods for processing and exploiting the digitized documents that take into account the language in which the documents are written.


Professor Eleni Galiotou
Technological Educational Institute of Athens

In the course of the project, novel methods of accessing the digitized corpus were developed. In addition, the Natural Language Processing tools presented in this paper have given insight into the characteristics of an early stage of Modern Greek (17th and 18th centuries), which incorporates elements from Ancient, Medieval and Modern Greek. They therefore constitute a first step towards the development of computational tools for the study of the diachronic evolution of the language.

The following have contributed to this page: Professor Eleni Galiotou