Compiling computer-mediated spoken language corpora

Stefan Diemer; Marie-Louise Brunner; Selina Schmidt

doi:10.1075/ijcl.21.3.03die

What is it about?

This paper looks at key issues in the compilation of spoken language corpora in a computer-mediated communication (CMC) environment, using data from CASE (forthcoming), the Corpus of Academic Spoken English in an international context, which is currently being compiled at Saarland University, Germany, in cooperation with partners from different countries. Based on preliminary findings, new recommendations concerning data collection, treatment, compilation, and transcription are put forward to supplement existing best practice as presented in Wynne (2005). Our main general recommendation is the use of Skype as a suitable tool for collecting spoken data for linguistic analysis, since it moves the recording of spoken data out of a restricted laboratory setting. During anonymisation, special care has to be taken with the video component, while preserving multimodal features for analysis. We recommend adding a number of annotation elements as early as the transcription stage: the CMC-related discourse features of overlap, echo, interference, and pauses; the English as a Lingua Franca (ELF) features of non-standard language and code-switching; and prosodic, paralinguistic, and non-verbal annotation. Additionally, we propose a layered corpus design that allows researchers to focus on specific annotation features.

This page is a summary of: Compiling computer-mediated spoken language corpora, International Journal of Corpus Linguistics, September 2016, John Benjamins. DOI: 10.1075/ijcl.21.3.03die.

