What is it about?

Sparse-rated data are common in operational performance-based language tests because each examinee response is scored by only a fraction of the available raters. The current study investigates the precision of two generalizability-theory (G-theory) methods designed to estimate score reliability from sparse-rated data: the rating method and the subdividing method. Results suggest that when a mixture of novice and experienced raters is deployed in a rating session, the subdividing method is recommended because it yields more precise reliability estimates. When all raters can be treated as interchangeable, the two methods are equally precise, and the rating method is recommended for operational use because it is easier to implement in practice.
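For readers unfamiliar with G-theory, the short sketch below is a generic illustration, not the paper's rating or subdividing procedure, of how a generalizability coefficient is computed once variance components have been estimated for a crossed persons-by-raters design. The function name and the variance-component values are hypothetical placeholders; the two methods studied in the paper differ in how such components are estimated from sparse data, which is not shown here.

    def generalizability_coefficient(var_person, var_residual, n_raters):
        """E-rho^2 = var(p) / (var(p) + var(pr,e) / n_r) for a crossed p x r design."""
        return var_person / (var_person + var_residual / n_raters)

    # Hypothetical variance components (e.g., from ANOVA or REML estimation).
    sigma2_p = 0.60     # person (true-score) variance
    sigma2_pr_e = 0.40  # person-by-rater interaction confounded with residual error

    for n_r in (1, 2, 4):
        print(f"raters per response = {n_r}: "
              f"E-rho^2 = {generalizability_coefficient(sigma2_p, sigma2_pr_e, n_r):.3f}")

As the toy numbers show, the estimated coefficient rises as more raters score each response, which is why reliability estimation and rater allocation are so closely linked in operational testing.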

Why is it important?

Examining the precision of reliability estimates is important because the utility of any performance-based language test depends on its reliability. In addition to investigating the precision of the two G-theory estimation methods, the current study demonstrates a step-by-step analysis of score reliability for sparse-rated data from a large-scale English speaking proficiency test. Implications for operational performance-based language tests are discussed.

Read the Original

This page is a summary of: Working with sparse data in rated language tests: Generalizability theory applications, Language Testing, August 2016, SAGE Publications. DOI: 10.1177/0265532216638890.
