Creating and Analyzing Literary Corpora

Michael Percillier
  • January 2017, Springer Science + Business Media
  • DOI: 10.1007/978-3-319-54499-1_4

How to make a corpus of literary texts from scratch

What is it about?

The chapter describes how to create a digital corpus from printed source texts and how to annotate and analyse it. Using the methodology of the research project "Representations of oral varieties of language in the literature of the English-speaking world" as a working example, it addresses three topics of interest to the field of Digital Humanities: (1) the creation of a digital corpus from printed source texts, (2) corpus annotation, and (3) qualitative and quantitative data analysis.
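
As a rough illustration of how manually annotated text can feed a quantitative analysis, the sketch below counts annotated features by category and normalises the counts per 1,000 words. It is not the chapter's actual workflow: the inline tag format `<f type="...">`, the category names, the function names, and the sample sentence are all assumptions invented for this example.

```python
import re
from collections import Counter

# Hypothetical inline annotation format (an assumption, not the chapter's scheme):
#   <f type="CATEGORY">annotated span</f>
TAG_PATTERN = re.compile(r'<f type="([^"]+)">(.*?)</f>', re.DOTALL)


def count_features(annotated_text):
    """Count annotated features by category."""
    return Counter(m.group(1) for m in TAG_PATTERN.finditer(annotated_text))


def strip_tags(annotated_text):
    """Remove annotation tags while keeping the annotated words themselves."""
    return TAG_PATTERN.sub(lambda m: m.group(2), annotated_text)


def per_thousand_words(annotated_text):
    """Normalise feature counts to occurrences per 1,000 words of running text."""
    counts = count_features(annotated_text)
    n_words = len(strip_tags(annotated_text).split())
    if n_words == 0:
        return {}
    return {category: 1000 * n / n_words for category, n in counts.items()}


# Invented sample sentence with two hypothetical feature categories.
sample = 'He <f type="grammar">done gone</f> to the <f type="lexis">kampong</f> already.'

print(count_features(sample))
print(per_thousand_words(sample))
```

Normalising by text length is what makes counts comparable across texts of different sizes; a real project would more likely rely on an established annotation format and a proper tokeniser than on whitespace splitting.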

Why is it important?

Applying a corpus-based approach to literary texts enables the discovery of patterns that might otherwise remain hidden when only impressionistic methods are used.


Dr Michael Percillier (Author)
Universität Mannheim

The approach described in this chapter is situated between two "extremes": impressionistic close reading, common in literary studies, and so-called distant reading, common in Digital Humanities, in which texts may not be read at all. Manually annotating texts for specific features is a form of close reading whose results can subsequently be analysed quantitatively, thereby combining the benefits of the Digital Humanities approach with the focus and expertise characteristic of traditional literary scholarship.


The following have contributed to this page: Dr Michael Percillier