What is it about?

The introduction gives the background to the activities of the media group in the FP6 IntUne project. Above all, it describes the corpus, or rather the corpora, since comparable corpora were built for the four working languages: English, French, Italian and Spanish.

Why is it important?

This is a model for comparable corpora. It describes how data was selected following precise, reproducible criteria and how the corpus was prepared using POS tagging and TEI-XML markup. Above all, it demonstrates how carefully crafted corpora can give useful answers to real questions, rather than being just a garbage bag of everything found on the web.

Perspectives

This was a cooperative exercise in corpus building. It produced a dataset of great quality and set a path that others should follow.

Professor Geoffrey Clive Williams
Universite de Bretagne-Sud

Read the Original

This page is a summary of: Introduction: Exploring the IntUne corpus, May 2012, Oxford University Press (OUP), DOI: 10.1093/acprof:oso/9780199602308.003.0001.
