Introduction: Exploring the IntUne corpus

Paul Bayley; Geoffrey Williams

doi:10.1093/acprof:oso/9780199602308.003.0001

What is it about?

The introduction gives background to the activities of the media group in the FP6 IntUne project. Above all, it describes the corpus, or rather corpora, since comparable corpora were built for the four working languages: English, French, Italian and Spanish.

Why is it important?

This is a model for building comparable corpora. It describes how the data were selected according to precise, reproducible criteria and how the corpus was prepared using part-of-speech (POS) tagging and TEI-XML markup. Above all, it demonstrates how carefully crafted corpora can provide useful answers to real questions, rather than being a garbage bag of everything found on the web.

Perspectives

This was a cooperative exercise in corpus building. It produced a high-quality dataset and set a path that others should follow.
Professor Geoffrey Clive Williams
Université de Bretagne-Sud

This page is a summary of: Introduction: Exploring the IntUne corpus, May 2012, Oxford University Press (OUP), DOI: 10.1093/acprof:oso/9780199602308.003.0001.

Contributors

The following have contributed to this page

Professor Geoffrey Clive Williams
Université de Bretagne-Sud

A state-of-the-art corpus
