What is it about?
This article describes Monk, an image-based search engine for historical handwritten collections. It outlines the overall system architecture and examines the issue of data storage in more detail. The question was whether a rigorous POSIX-file approach to object storage, yielding a massive number of 'normal' files ranging from very small files to large images, would be feasible when it entails more than a billion objects. Using a parallel file system, GPFS, this turned out to be possible, with many benefits for back-office programming, where diverse software packages are used for e-Science.
Why is it important?
The big-data revolution has produced many new data storage solutions in software and hardware. However, not all of them offer the professional stability of traditional file systems and databases. The field is also changing rapidly, which makes it difficult to choose solutions that will remain a good investment in the future.
Perspectives
The article allowed me to give an overview of the Monk system and to address technical issues that are usually not covered in the machine-learning literature. As a research group, we are very content with the scalable POSIX solution presented in the paper. However, there are cost factors involved, which require us to continuously monitor new developments in storage software in order to keep this very large and growing system operational in the longer term.
prof. dr. Lambert Schomaker
Rijksuniversiteit Groningen
Read the Original
This page is a summary of: Design considerations for a large-scale image-based text search engine in historical manuscript collections, it - Information Technology, January 2016, De Gruyter,
DOI: 10.1515/itit-2015-0049.