
Volume 46, Issue 1, Pages 141-142 (January 2009)



Interrater reliability and the kappa statistic: A comment on Morris et al. (2008)

Jan Kottner

Received 26 March 2008; received in revised form 1 April 2008; accepted 1 April 2008.


Establishing the interrater reliability of instruments is an important issue in nursing research and practice. The paper by Morris et al. (2008) highlights the problem of choosing an appropriate statistical approach for analysing interrater reliability data. The authors also raise the important and relevant question of how to interpret kappa-like statistics such as Cohen's kappa (κ) or weighted kappa (κw).

It is true that κ, often called ‘chance-corrected’, has frequently been criticised because its value depends on the prevalence of the rated trait in the sample (the ‘base rate problem’). Consequently, even if two raters agree nearly or exactly, κ-coefficients are close to or equal to 0 if the prevalence of the rated characteristic is very high or very low. This contradicts the intuitive expectation that interrater reliability must then be high as well. However, this is neither a limitation nor a “main drawback” (p. 646). In fact, it is a desirable property, because κ-coefficients are classical interrater reliability coefficients (Dunn, 2004; Kraemer et al., 2002; Landis and Koch, 1975). In classical test theory, reliability is defined as the ratio of the variability between subjects (or targets) to the total variability. The total variability is the sum of the subject (target) variability and the measurement error (Dunn, 2004; Streiner and Norman, 2003). Consequently, if the variance between subjects is very small or even zero, the reliability coefficient will be near zero as well. Reliability coefficients therefore reflect not only the degree of agreement between raters but also the degree to which a measurement instrument can differentiate among individuals. The only reason to apply a particular instrument in a measurement situation is to differentiate between individuals (Streiner and Norman, 2003). If an instrument is unable to detect any differences, one should question the instrument rather than the statistic (Shrout, 1998). Morris et al. (2008) were unable to calculate κw for 18 of 63 variables because of the “large number of ‘constants’ in the data” (p. 646). In other words, when nearly one-third of the I-NMDS variables were applied, the two participating experienced staff nurses were unable to detect any variability among 30 clients. This is an important finding.
Users must be careful when applying these 18 I-NMDS items, because their degree of interrater reliability is still unknown. In other words, no degree of relative precision could be demonstrated for these items. It is questionable to apply such a measure when it was already expected that the variability of the I-NMDS variables would be very low (p. 647).
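The following sketch makes the base rate problem concrete. It computes the observed proportion of agreement p_o and Cohen's κ = (p_o − p_e)/(1 − p_e), where p_e is the agreement expected by chance from the two raters' marginal proportions. It does this for three 2×2 tables of two raters scoring 30 clients on a binary item. The counts and the function name `agreement_and_kappa` are illustrative assumptions, not data or code from Morris et al. (2008).

```python
import math


def agreement_and_kappa(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Observed agreement and Cohen's kappa for a 2x2 two-rater table.

    a = both raters 'present', d = both raters 'absent',
    b = rater 1 'present' / rater 2 'absent', c = the reverse.
    """
    n = a + b + c + d
    p_o = (a + d) / n                     # observed proportion of agreement
    r1 = (a + b) / n                      # rater 1: proportion rated 'present'
    r2 = (a + c) / n                      # rater 2: proportion rated 'present'
    p_e = r1 * r2 + (1 - r1) * (1 - r2)   # chance agreement implied by the marginals
    if math.isclose(p_e, 1.0):
        return p_o, math.nan              # no variability in the ratings: kappa undefined
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical tables for 30 clients (illustrative counts, not study data).
tables = {
    "balanced prevalence": (14, 1, 1, 14),
    "very low prevalence": (0, 1, 1, 28),
    "constant ratings": (0, 0, 0, 30),
}

for label, cells in tables.items():
    p_o, kappa = agreement_and_kappa(*cells)
    print(f"{label}: p_o = {p_o:.3f}, kappa = {kappa:.3f}")
```

Both non-constant tables show 93% observed agreement. Even so, κ falls from about 0.87 when prevalence is balanced to about −0.03 when the rated trait is rare. When the ratings are constant, p_e = 1 and κ is undefined, which mirrors the ‘constants’ for which κw could not be computed.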

Every interrater reliability coefficient is unavoidably linked to the population to which the instrument is applied (Kraemer, 1979; Streiner and Norman, 2003). Low interrater reliability coefficients are caused either by a lack of agreement between raters or by a failure to identify differences between the rated subjects. Nevertheless, the strength of interrater reliability coefficients is that they indicate the quality and clinical value of observations characterising individuals or subjects (Kraemer, 1979; Shrout, 1998).
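The classical-test-theory definition of reliability can be written as a formula. Here σ²_S denotes the between-subject (target) variance and σ²_E denotes the measurement-error variance. These symbols and the numbers in the worked example are illustrative choices, not values from Morris et al. (2008).

$$
R = \frac{\sigma^2_S}{\sigma^2_S + \sigma^2_E}
$$

$$
\sigma^2_E = 1: \qquad
\sigma^2_S = 9 \Rightarrow R = \frac{9}{10} = 0.90, \qquad
\sigma^2_S = 0.25 \Rightarrow R = \frac{0.25}{1.25} = 0.20, \qquad
\sigma^2_S = 0 \Rightarrow R = 0
$$

With the error variance held fixed, the same raters obtain a much lower coefficient in a more homogeneous population. In this sense the coefficient is tied to the population being rated.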

For further discussion of this topic, it would be helpful to differentiate strictly between interrater reliability and interrater agreement. High proportions of interrater agreement are important, but kappa-like statistics provide information about the clinical value of the ratings. Finally, calculated proportions of overall agreement also depend on sample characteristics. If the prevalence of a category is very high or very low, the overall proportion of agreement is likely to be inflated. To overcome this ‘limitation’, indices of specific agreement can be very helpful (Fleiss et al., 2003; Szklo and Nieto, 2007).
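One common pair of specific agreement indices is the proportion of positive agreement, 2a/(2a + b + c), and the proportion of negative agreement, 2d/(2d + b + c). Both are computed from the cells of a 2×2 table. The minimal sketch below uses hypothetical counts for 30 clients and a hypothetical function name, `specific_agreement`. It shows how these indices separate agreement on a rare category from agreement on the common one.

```python
import math


def specific_agreement(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Overall, positive and negative agreement for a 2x2 two-rater table.

    a = both raters 'present', d = both raters 'absent',
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_overall = (a + d) / n
    # Specific positive agreement: joint 'present' ratings relative to all 'present' ratings.
    pos_den = 2 * a + b + c
    p_pos = 2 * a / pos_den if pos_den > 0 else math.nan
    # Specific negative agreement: joint 'absent' ratings relative to all 'absent' ratings.
    neg_den = 2 * d + b + c
    p_neg = 2 * d / neg_den if neg_den > 0 else math.nan
    return p_overall, p_pos, p_neg


# Hypothetical tables for 30 clients (illustrative counts, not study data).
for label, cells in {
    "balanced prevalence": (14, 1, 1, 14),
    "very low prevalence": (0, 1, 1, 28),
}.items():
    p_o, p_pos, p_neg = specific_agreement(*cells)
    print(f"{label}: overall = {p_o:.3f}, positive = {p_pos:.3f}, negative = {p_neg:.3f}")
```

For the low-prevalence table, overall agreement is 0.93 and negative agreement is about 0.97, but positive agreement is 0. The raters never agreed on the rare category, and the overall proportion conceals this.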

Conflict of interest 


None.

References 


Dunn G. Statistical Evaluation of Measurement Errors: Design and Analysis of Reliability Studies. 2nd ed. London: Hodder Arnold; 2004.

Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. New Jersey: Wiley; 2003.

Kraemer HC. Ramifications of a population model for κ as a coefficient of reliability. Psychometrika. 1979;44(4):461–472.

Kraemer HC, Periyakoil VS, Noda A. Kappa coefficients in medical research. Stat. Med. 2002;21(14):2109–2129.

Landis JR, Koch GG. A review of statistical methods in the analysis of data arising from observer reliability studies (part I). Statistica Neerlandica. 1975;29:101–123.

Morris R, MacNeela P, Scott A, Treacy P, Hyde A, O’Brien J, et al. Ambiguities and conflicting results: the limitations of the kappa statistic in establishing the interrater reliability of the Irish nursing minimum data set for mental health: a discussion paper. Int. J. Nurs. Stud. 2008;45(4):645–647.

Shrout PE. Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 1998;7(3):301–317.

Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 3rd ed. Oxford: Oxford University Press; 2003.

Szklo M, Nieto FJ. Epidemiology Beyond the Basics. 2nd ed. Sudbury: Jones and Bartlett Publishers; 2007.

Centre for Humanities and Health Sciences, Department of Nursing Science, Charité Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany

Corresponding author. Tel.: +49 30 450 529 054; fax: +49 30 450 529 900.

PII: S0020-7489(08)00088-6

doi:10.1016/j.ijnurstu.2008.04.001

