Improving the efficiency of stroke trials: feasibility and efficacy of group adjudication of functional end points

Kate S McArthur, Paul C D Johnson, Terence J Quinn, Peter Higgins, Peter Langhorne, Matthew R Walters, Christopher J Weir, Jesse Dawson, Kennedy R Lees, CARS investigators
DOI: https://doi.org/10.1161/STROKEAHA.113.002266
IF: 10.17
Stroke
Abstract:
Background and purpose: Use of the modified Rankin scale (mRS) in multicenter trials may be limited by interobserver variability. We assessed the effect of this on trial power and developed a novel group adjudication approach.

Methods: We generated power and sample size estimates from simulated trials modeled with varying mRS reliability. We conducted a virtual acute stroke trial across 14 UK sites to develop a group adjudication approach. Traditional mRS interviews, performed at local sites, were digitally recorded and scored by an adjudication committee. We assessed the effect of translation by comparing scores in translated mRS interviews, originally conducted in English and Mandarin. Agreement was measured using κ and weighted κ (κw) statistics and the intraclass correlation coefficient.

Results: Statistical simulations suggest that improving mRS reliability from κ=0.25 to κ=0.5 or 0.7 may allow reductions in sample size of n=386 or 490, respectively, in a typical n=2000 study. Our virtual acute stroke trial included 370 participants and 563 mRS video assessments. We adjudicated mRS in 538 of 563 (96%) study visits. At 30 and 90 days, 161 of 280 (57.5%) and 131 of 258 (50.8%) clips showed interobserver disagreement. Agreement within the adjudication committee was good (30-day κw=0.85 [95% confidence interval, 0.81-0.86]; 90-day κw=0.86 [95% confidence interval, 0.82-0.88]) without significant or systematic bias in mRS scoring compared with the local mRS. Interobserver reliability of translated mRS assessments was similar to native language clips (native [n=69] κw=0.91 [95% confidence interval, 0.94-0.99]; translated [n=89] κw=0.90 [95% confidence interval, 0.83-0.96]).

Conclusions: Achievable improvements in interobserver reliability may substantially reduce study sample size, with associated financial benefits. Central adjudication of mRS assessments is feasible (including across international centers), valid and reliable despite the challenges of mRS assessment in large clinical trials.
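
The abstract reports agreement as weighted κ (κw) but does not state the weighting scheme. As an illustration only, the following is a minimal sketch of Cohen's weighted κ for two raters scoring the seven-level mRS (0 to 6), assuming quadratic weights by default. The function name `weighted_kappa`, its parameters, and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories=7, weights="quadratic"):
    """Cohen's weighted kappa for two raters scoring on 0..n_categories-1.

    The weighting scheme is an assumption: "quadratic" penalises a
    disagreement by the squared distance between scores, anything else
    is treated as linear (absolute distance).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    n = len(rater_a)
    power = 2 if weights == "quadratic" else 1
    max_dist = (n_categories - 1) ** power

    # Joint counts of (score_a, score_b) pairs and each rater's marginal counts.
    joint = Counter(zip(rater_a, rater_b))
    marg_a = Counter(rater_a)
    marg_b = Counter(rater_b)

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) ** power / max_dist
            observed += w * joint[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)

    if expected == 0:
        # Both raters used one identical category throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical ratings for illustration, not study data.
local_scores = [0, 1, 2, 3, 3, 4, 5, 6]
committee_scores = [0, 1, 2, 3, 4, 4, 5, 6]
kappa_w = weighted_kappa(local_scores, committee_scores)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Quadratic weights treat a one-point difference on the mRS as far less serious than a three-point difference, which is why κw is usually higher than unweighted κ on ordinal scales.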