What is it about?

Estimating the degree of agreement between different raters is of crucial importance in the medical and social sciences, and many approaches for doing so have been proposed in the literature. In this article, we focus on inter-rater agreement measures for ordinal variables. The ordinal nature of the variable makes the estimation task more complicated. Although modified versions of inter-rater agreement measures exist for ordinal tables, there is no consensus on which approach to use. We conduct an extensive Monte Carlo simulation study to evaluate and compare the accuracy of the mainstream inter-rater agreement measures for ordinal tables and to determine how different table structures affect their accuracy. Our results provide detailed guidance on which measure to use with a given table structure to obtain the most reliable inferences about the degree of agreement between two raters. Based on the simulation, we recommend Gwet’s AC2 and Brennan-Prediger’s κ when agreement among raters is high. It should be noted, however, that these coefficients overstate the extent of agreement when there is no agreement and the data are unbalanced.
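
To make the quantities concrete, here is a minimal sketch (not the authors' code) of how the two recommended coefficients can be computed for a single R × R table of counts from two raters, using quadratic agreement weights. All function and variable names are illustrative assumptions:

    import numpy as np

    def quadratic_weights(R):
        """Quadratic agreement weights: 1 on the diagonal, decreasing
        with the squared distance between ordinal categories."""
        idx = np.arange(R)
        return 1.0 - (idx[:, None] - idx[None, :]) ** 2 / (R - 1) ** 2

    def weighted_coefficients(table, weights):
        """Weighted Brennan-Prediger kappa and Gwet's AC2 for a
        two-rater R x R table of counts."""
        p = table / table.sum()              # cell proportions
        R = p.shape[0]
        pa = (weights * p).sum()             # weighted observed agreement
        # Brennan-Prediger: uniform chance agreement over all R^2 cells
        pe_bp = weights.sum() / R ** 2
        kappa_bp = (pa - pe_bp) / (1.0 - pe_bp)
        # Gwet's AC2: chance agreement built from the average marginal
        # classification propensities of the two raters
        pi = (p.sum(axis=0) + p.sum(axis=1)) / 2.0
        pe_ac2 = weights.sum() / (R * (R - 1)) * (pi * (1.0 - pi)).sum()
        ac2 = (pa - pe_ac2) / (1.0 - pe_ac2)
        return kappa_bp, ac2

    # Example: a 3 x 3 ordinal table with strong diagonal agreement
    table = np.array([[40, 5, 1],
                      [6, 30, 4],
                      [2, 3, 9]], dtype=float)
    print(weighted_coefficients(table, quadratic_weights(3)))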

Why is it important?

Using the results of this simulation, we identify which combinations of inter-rater agreement measure and weighting scheme have the least bias, and how their bias is affected by the degree of true inter-rater agreement, the structure of the R × R table, the number of ordinal rating categories, and the total sample size.
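
The following hedged sketch illustrates the general shape of such a bias comparison: tables are repeatedly sampled from a fixed matrix of true cell probabilities, a coefficient is computed on each sample, and the average estimate is compared with the coefficient's value at the true probabilities. The probability matrix, weights, and replication count below are illustrative assumptions, not the paper's actual design:

    import numpy as np

    rng = np.random.default_rng(0)

    def bp_kappa(p, w):
        """Weighted Brennan-Prediger kappa for cell proportions p."""
        pa = (w * p).sum()
        pe = w.sum() / p.shape[0] ** 2
        return (pa - pe) / (1.0 - pe)

    def simulate_bias(prob, n, reps, weights, coef):
        """Mean bias of `coef` over `reps` multinomial samples of
        size `n` drawn from the true cell probabilities `prob`."""
        truth = coef(prob, weights)      # coefficient at the true table
        R = prob.shape[0]
        est = np.empty(reps)
        for r in range(reps):
            counts = rng.multinomial(n, prob.ravel()).reshape(R, R)
            est[r] = coef(counts / n, weights)
        return est.mean() - truth

    # Example: an unbalanced 3 x 3 table with medium agreement
    prob = np.array([[0.50, 0.05, 0.01],
                     [0.06, 0.20, 0.03],
                     [0.02, 0.03, 0.10]])
    idx = np.arange(3)
    w = 1.0 - np.abs(idx[:, None] - idx[None, :]) / 2  # linear weights
    print(simulate_bias(prob, n=50, reps=2000, weights=w, coef=bp_kappa))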

Perspectives

Overall, the accuracy of the measures is sensitive to the choice of weights when the table of interest is clearly unbalanced and the true agreement is not low. All measures perform similarly for balanced R × R table structures, but they diverge increasingly as the table becomes more unbalanced. Specifically, we recommend Gwet’s AC2 and Brennan-Prediger’s κ for unbalanced tables at medium and high agreement levels. It should be noted, however, that these coefficients overstate the extent of agreement among raters when there is no agreement and the data are unbalanced.

Quoc Duyet Tran
RMIT University

Read the Original

This page is a summary of: Weighted inter-rater agreement measures for ordinal outcomes, Communications in Statistics - Simulation and Computation, October 2018, Taylor & Francis,
DOI: 10.1080/03610918.2018.1490428.