Levels of Evidence at the AAOS Meeting: Can Authors Rate Their Own Submissions, and Do Other Raters Agree?

Andrew H Schmidt, Guofen Zhao, Charles Turkelson
DOI: https://doi.org/10.2106/jbjs.g.01233
2009-04-01
Abstract:
BACKGROUND: A hierarchy of levels of evidence is commonly used to categorize the methodology of scientific studies in order to assist in their critical analysis. Organizers of large scientific meetings are faced with the problem of whether and how to assign levels of evidence to studies that are presented. The present study was performed to investigate two hypotheses: (1) that session moderators and others can consistently assign a level of evidence to papers presented at national meetings, and (2) that there is no difference between the level of evidence provided by the author of a paper and the level of evidence assigned by independent third parties (e.g., members of the Program Committee).
METHODS: A subset of papers accepted for presentation at the 2007 American Academy of Orthopaedic Surgeons (AAOS) Annual Meeting was used to evaluate differences in the levels of evidence assigned by the authors, volunteer graders who had access to only the abstract, and session moderators who had access to the full paper. The approved AAOS levels of evidence were used. Statistical tests of interrater correlation were done to compare the various raters to each other, with significance appropriately adjusted for multiple comparisons.
RESULTS: Interrater agreement was better than chance for most comparisons between different graders; however, the level of agreement ranged from slight to moderate (kappa=0.16 to 0.46), a finding confirmed by agreement coefficient statistics. In general, raters had difficulty in agreeing whether a study comprised Level-I or Level-II evidence, and authors graded the level of evidence of their own work more favorably than did others who graded the abstract.
CONCLUSIONS: When abstracts submitted to the AAOS Annual Meeting were rated, there was substantial inconsistency in the assignments of the level of evidence to a given study by different observers and there was some evidence that authors may not rate their own work the same as independent reviewers. This has important implications for the use of levels of evidence in scientific meetings.
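The kappa values reported in the results measure agreement between raters after correcting for the agreement expected by chance. As a rough illustration only, the sketch below computes an unweighted Cohen's kappa for two raters. The abstract does not say which kappa variant the authors used, so this choice is an assumption, and the function name `cohen_kappa` and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same level by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each level, the product of the two raters'
    # marginal proportions, summed over all levels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical levels of evidence (1 = Level I ... 4 = Level IV) for ten abstracts.
author_levels = [1, 1, 2, 2, 2, 3, 3, 4, 4, 1]
grader_levels = [2, 1, 2, 3, 2, 3, 4, 4, 4, 2]

kappa = cohen_kappa(author_levels, grader_levels)
print(f"kappa = {kappa:.2f}")
```

In this made-up data every disagreement has the author assigning a stronger level of evidence than the grader, which mirrors the direction of the difference described in the results. Kappa alone does not capture that direction; it summarizes only how often the two raters agree beyond chance.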