What is it about?

The book "Natural Language Processing for Corpus Linguistics" by Dunn (2022) is a practical guide published in the Cambridge Elements in Corpus Linguistics series. With a strong focus on hands-on learning, it features 20 interactive Python labs covering text classification, similarity models, and validation/visualization techniques. The book introduces crucial NLP concepts, including vector space representations and word embeddings, while emphasizing ethical considerations in language analysis. While it simplifies certain underlying assumptions and aspects of language processing, the book serves as a valuable resource for both experts and students in corpus linguistics, providing practical insights into computational complexity and addressing real-world challenges in natural language processing.

Why is it important?

Reviews of books at the intersection of NLP and linguistics help readers who want to follow developments in textual AI/ML work out how best to approach these fields, contribute to them ethically, and understand their interdisciplinary connections. By assessing the book's practicality, accessibility, underlying assumptions, and ethical considerations, the review guides linguists and researchers in navigating the complexities of natural language processing. It also serves as a resource for those seeking to understand the societal implications of incorporating computational linguistics into their research and analytical practices.


I was very glad to see the strong practical focus and accessibility of Dunn's book, two features that are often lacking in more theoretical approaches to NLP or Corpus Linguistics. In particular, the chapters that detail, at a gentle pace, how calculations are carried out under the hood of large language models help demystify NLP approaches. Another key aspect is the reusability and adaptability of the code labs that come with the book. They provide useful practical starting points for researchers from backgrounds as diverse as Corpus-Based Sociolinguistics, Corpus Stylistics, Multilingualism, and Discourse Analysis, and for anyone else with an interest in exploring language using computational methods.

Hanna Schmueck
Lancaster University

Read the Original

This page is a summary of: Review of Dunn (2022): Natural Language Processing for Corpus Linguistics, International Journal of Corpus Linguistics, December 2023, John Benjamins,
DOI: 10.1075/ijcl.00057.sch.