An R package for automatically generating candidate correspondence tables between classifications

Martin Karlberg; Vasilis Chasiotis; Photis Stavropoulos; Christine Laaboudi; Mátyás Mészáros; Despoina-Avgerini Nasiopoulou

doi:10.3233/sji-230039

What is it about?

This paper explains why correspondence tables are important for classifications, and describes the newly developed "correspondenceTables" R package. It uses practical examples to show the package's strengths and weaknesses in reducing the workload of statistical classification experts, so that they can focus on tasks where their expertise is needed.

Why is it important?

It is already possible to create a correspondence table between two classifications automatically by means of a big "outer join" of intermediate correspondence tables. However, as is so often the case, the main challenge is input data quality: intermediate correspondence tables are typically set up for purposes other than automatic correspondence table creation, and simply feeding them into an "outer join" can produce candidate correspondence tables that are incomplete or contain misleading records. The main added value of the correspondenceTables R package presented in this paper is therefore its extensive quality control, including the flagging of problematic records. By applying the package (sometimes repeatedly, until all quality issues are fixed), statistical classification experts obtain a candidate correspondence table in which only the tricky cases (e.g. many-to-many) are highlighted, so that they can focus on the most challenging records instead of carrying out tasks of a more clerical nature.
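To make the "outer join" idea concrete, here is a minimal sketch in plain Python. It is not the correspondenceTables package and does not use its API: the function names `outer_join` and `flag`, the codes (A1, C1, B1, ...) and the labelling rules are all assumptions made up for illustration. It chains a hypothetical A:C table with a C:B table, keeps unmatched codes, and labels each candidate record so that the tricky ones stand out.

```python
from collections import Counter, defaultdict


def outer_join(a_to_c, c_to_b):
    """Link A codes to B codes through shared intermediate C codes.

    Codes without a partner are kept and paired with None, as in an outer join.
    """
    b_by_c = defaultdict(set)
    for c, b in c_to_b:
        b_by_c[c].add(b)

    pairs = set()
    for a, c in a_to_c:
        targets = b_by_c.get(c)
        if targets:
            pairs.update((a, b) for b in targets)
        else:
            pairs.add((a, None))  # A code that never reaches B

    c_in_a = {c for _, c in a_to_c}
    # B codes whose intermediate code is never used by A
    pairs.update((None, b) for c, b in c_to_b if c not in c_in_a)
    return sorted(pairs, key=lambda p: (p[0] or "", p[1] or ""))


def flag(pairs):
    """Label each candidate record by the kind of relationship it takes part in."""
    matched = [(a, b) for a, b in pairs if a is not None and b is not None]
    a_count = Counter(a for a, _ in matched)
    b_count = Counter(b for _, b in matched)
    labels = {}
    for a, b in pairs:
        if a is None or b is None:
            labels[(a, b)] = "unmatched"
        elif a_count[a] > 1 and b_count[b] > 1:
            labels[(a, b)] = "many-to-many"
        elif a_count[a] > 1:
            labels[(a, b)] = "one-to-many"
        elif b_count[b] > 1:
            labels[(a, b)] = "many-to-one"
        else:
            labels[(a, b)] = "one-to-one"
    return labels


# Hypothetical intermediate tables (all codes are invented for this sketch)
a_to_c = [("A1", "C1"), ("A2", "C2"), ("A3", "C2"), ("A3", "C3"), ("A4", "C9")]
c_to_b = [("C1", "B1"), ("C2", "B2"), ("C3", "B2"), ("C3", "B3"), ("C7", "B4")]

candidates = outer_join(a_to_c, c_to_b)
labels = flag(candidates)
for pair in candidates:
    print(pair, labels[pair])
```

In this toy input, the record A3:B2 comes out as many-to-many, while A4 and B4 are left unmatched because their intermediate codes (C9 and C7) appear in only one of the two tables. These are the kinds of records an expert would want to have flagged rather than silently passed through.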

Perspectives

The practical examples show both situations where the package excels (parallel merges in two classifications) and situations where it generates a lot of "Cartesian noise" (parallel splits in two classifications). Like many great discoveries (not saying that this is one of them...), this is quite obvious with some hindsight. A practical approach for tackling this (isolating the parallel splits) is presented.
Dr Martin Karlberg
Eurostat
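The contrast between parallel merges and parallel splits can be sketched with a small join. This is only one plausible reading: it assumes that both classifications are linked through a shared intermediate ("pivot") classification, and the paper's own examples may be set up differently. The function `candidate_pairs` and all codes (P1, L1, R1, ...) are hypothetical.

```python
def candidate_pairs(pivot_to_left, pivot_to_right):
    """Join two tables on the pivot code and return the distinct (left, right) pairs."""
    return sorted({
        (left, right)
        for p1, left in pivot_to_left
        for p2, right in pivot_to_right
        if p1 == p2
    })


# Parallel merges: pivot codes P1 and P2 merge into L1 on one side and into R1 on the other.
merged = candidate_pairs(
    [("P1", "L1"), ("P2", "L1")],
    [("P1", "R1"), ("P2", "R1")],
)

# Parallel splits: pivot code P1 splits into L1, L2 on one side and into R1, R2 on the other.
split = candidate_pairs(
    [("P1", "L1"), ("P1", "L2")],
    [("P1", "R1"), ("P1", "R2")],
)

print(merged)  # [('L1', 'R1')]
print(split)   # [('L1', 'R1'), ('L1', 'R2'), ('L2', 'R1'), ('L2', 'R2')]
```

The merge case collapses to a single clean record. The split case yields every combination of the split codes, because the join alone has no information about which left-hand part corresponds to which right-hand part. Under this reading, that is the "Cartesian noise".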

This page is a summary of: An R package for automatically generating candidate correspondence tables between classifications, Statistical Journal of the IAOS, December 2023, IOS Press, DOI: 10.3233/sji-230039.
Resources

The correspondenceTables R package on CRAN
The stable version of the correspondenceTables R package (on CRAN).

Contributors

The following have contributed to this page

Dr Martin Karlberg
Eurostat

The "correspondenceTables" R package helps classification experts by taking care of grunt work