Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements

A. Martín Andrés, M. Álvarez Hernández

DOI: https://doi.org/10.1007/s11634-024-00581-x

2024-03-08

Advances in Data Analysis and Classification

Abstract: To measure the degree of agreement between R observers who independently classify n subjects within K categories, various kappa-type coefficients are often used. When R = 2, it is common to use Cohen's kappa, Scott's pi, Gwet's AC1/2, and Krippendorff's alpha coefficients (weighted or not). When R > 2, some pairwise version based on the aforementioned coefficients is normally used; in the same order as above: Hubert's kappa, Fleiss's kappa, Gwet's AC1/2, and Krippendorff's alpha. However, all these statistics are based on biased estimators of the expected index of agreements, since they estimate the product of two population proportions through the product of their sample estimators. This article has three aims. First, to provide statistics based on unbiased estimators of the expected index of agreements and determine their variance based on the variance of the original statistic. Second, to make pairwise extensions of some measures. And third, to show that the old and new estimators of Cohen's kappa and Hubert's kappa coefficients match the well-known estimators of the concordance and intraclass correlation coefficients, if the former are defined by assuming quadratic weights. The article shows that the new estimators are always greater than or equal to the classic ones, except for the case of Gwet, where it is the other way around, although these differences are only relevant with small sample sizes (e.g. n ≤ 30).
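
The bias described in the abstract can be illustrated for the simplest case, unweighted Cohen's kappa with R = 2. The sketch below is not taken from the paper; it assumes the usual multinomial model (n subjects sampled independently), under which the product of sample marginals satisfies E[Î_e] = I_e + (I_o − I_e)/n, so that (n·Î_e − Î_o)/(n − 1) is an unbiased estimator of the expected index of agreement. The function name `cohen_kappa_pair` and the example ratings are hypothetical, and the paper's own statistics may differ in detail.

```python
from collections import Counter


def cohen_kappa_pair(r1, r2):
    """Return (classic kappa, kappa built on an unbiased expected agreement).

    Illustrative sketch for two raters and unweighted agreement only.
    Assumes multinomial sampling of the n subjects (an assumption of this
    example, not a statement of the paper's exact formulas).
    """
    n = len(r1)
    if n != len(r2) or n < 2:
        raise ValueError("need two rating lists of equal length n >= 2")

    # Observed index of agreement: share of subjects rated identically.
    i_o = sum(a == b for a, b in zip(r1, r2)) / n

    # Classic (biased) expected agreement: product of sample marginals.
    c1, c2 = Counter(r1), Counter(r2)
    i_e = sum(c1[k] * c2[k] for k in c1) / n**2

    # Bias-corrected expected agreement, from E[i_e] = I_e + (I_o - I_e)/n.
    i_e_u = (n * i_e - i_o) / (n - 1)

    # Both ratios are undefined when the expected agreement equals 1
    # (e.g. every rating falls in a single category).
    kappa = (i_o - i_e) / (1 - i_e)
    kappa_u = (i_o - i_e_u) / (1 - i_e_u)
    return kappa, kappa_u


# Hypothetical data: 10 subjects, 2 categories, 8 agreements.
rater_a = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
rater_b = [1, 1, 1, 1, 2, 2, 2, 2, 2, 1]
kappa, kappa_u = cohen_kappa_pair(rater_a, rater_b)
print(kappa, kappa_u)  # approximately 0.6 and 0.625
```

With these toy data the corrected value is slightly larger than the classic one, and under this sketch the two are related by κ_U = nκ/(n − 1 + κ), so the gap shrinks quickly as n grows.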

statistics & probability

What problem does this paper attempt to address?

The paper addresses how to measure agreement when several observers classify the same set of subjects. Specifically, the authors focus on improving the estimators of several existing kappa-type coefficients (Cohen's kappa, Scott's pi, Gwet's AC1/2, and Krippendorff's alpha, together with their pairwise multi-rater versions). The main objectives of the paper are: 1. Providing versions of these coefficients based on unbiased estimators of the expected index of agreement, and deriving the variance of each new estimator from the variance of the original statistic. 2. Making pairwise extensions of some of the measures. 3. Showing that the old and new estimators of Cohen's kappa and Hubert's kappa, when defined with quadratic weights, match the well-known estimators of the concordance and intraclass correlation coefficients (ICC). With these changes, the authors aim to improve the accuracy of agreement assessment, especially for small sample sizes (e.g. n ≤ 30).
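
The link between quadratic weights and concordance can be checked numerically in the classic two-rater case. The sketch below is an illustration, not the paper's derivation: it assumes ordinal categories coded as numbers and compares the classic quadratic-weighted Cohen's kappa with the concordance correlation coefficient computed with divisor n for the variances and covariance. The function names and the example ratings are hypothetical.

```python
def quadratic_weighted_kappa(x, y):
    """Classic Cohen's kappa with quadratic weights for two raters.

    Written as 1 - D_o / D_e, where D_o is the observed mean squared
    disagreement and D_e is its value expected from the product of the
    sample marginals. The usual (K - 1)^2 scaling cancels in the ratio.
    """
    n = len(x)
    d_obs = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    d_exp = sum((a - b) ** 2 for a in x for b in y) / n**2
    return 1 - d_obs / d_exp


def concordance_correlation(x, y):
    """Concordance correlation coefficient using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical ordinal ratings of 8 subjects on a 4-point scale.
rater_a = [1, 2, 3, 3, 2, 1, 4, 4]
rater_b = [1, 2, 3, 2, 2, 1, 4, 3]
qwk = quadratic_weighted_kappa(rater_a, rater_b)
ccc = concordance_correlation(rater_a, rater_b)
print(qwk, ccc)  # the two values coincide up to rounding error
```

The equality holds because the expected squared disagreement under independent marginals equals s_x² + s_y² + (x̄ − ȳ)², while the observed one subtracts 2·s_xy from that quantity. The sketch covers only the classic estimator for R = 2; the bias-corrected and multi-rater (Hubert, ICC) cases treated in the paper are not reproduced here.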

New variances for various kappa coefficients based on the unbiased estimator of the expected index of agreements

High Agreement and High Prevalence: The Paradox of Cohen’s Kappa

Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters

Measures of Agreement with Multiple Raters: Fréchet Variances and Inference

Toward improved inference for Krippendorff's Alpha agreement coefficient

Statistical inference for agreement between multiple raters on a binary scale

Measuring agreement among several raters classifying subjects into one-or-more (hierarchical) nominal categories. A generalisation of Fleiss' kappa

Interrater agreement statistics under the two-rater dichotomous-response case with correlated decisions

Assessing agreement on classification tasks: the kappa statistic

Liberal-Conservative Hierarchies of Intercoder Reliability Estimators

Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements

Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification

The Kappa Paradox Explained

Resampling-based inference methods for comparing two coefficient alpha

Relationships of Cohen's Kappa, Sensitivity, and Specificity for Unbiased Annotations

Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise

Miettinen and Nurminen score statistics revisited

Why Cohen’s Kappa should be avoided as performance measure in classification

Asymptotic Confidence Interval, Sample Size Formulas and Comparison Test for the Agreement Intra-Class Correlation Coefficient in Inter-Rater Reliability Studies