Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis

Yin Wu, Brooke Levis, Kira E Riehm, Nazanin Saadat, Alexander W Levis, Marleine Azar, Danielle B Rice, Jill Boruff, Pim Cuijpers, Simon Gilbody, John P A Ioannidis, Lorie A Kloda, Dean McMillan, Scott B Patten, Ian Shrier, Roy C Ziegelstein, Dickens H Akena, Bruce Arroll, Liat Ayalon, Hamid R Baradaran, Murray Baron, Charles H Bombardier, Peter Butterworth, Gregory Carter, Marcos H Chagas, Juliana C N Chan, Rushina Cholera, Yeates Conwell, Janneke M de Man-van Ginkel, Jesse R Fann, Felix H Fischer, Daniel Fung, Bizu Gelaye, Felicity Goodyear-Smith, Catherine G Greeno, Brian J Hall, Patricia A Harrison, Martin Härter, Ulrich Hegerl, Leanne Hides, Stevan E Hobfoll, Marie Hudson, Thomas Hyphantis, Masatoshi Inagaki, Nathalie Jetté, Mohammad E Khamseh, Kim M Kiely, Yunxin Kwan, Femke Lamers, Shen-Ing Liu, Manote Lotrakul, Sonia R Loureiro, Bernd Löwe, Anthony McGuire, Sherina Mohd-Sidik, Tiago N Munhoz, Kumiko Muramatsu, Flávia L Osório, Vikram Patel, Brian W Pence, Philippe Persoons, Angelo Picardi, Katrin Reuter, Alasdair G Rooney, Iná S Santos, Juwita Shaaban, Abbey Sidebottom, Adam Simning, Lesley Stafford, Sharon Sung, Pei Lin Lynnette Tan, Alyna Turner, Henk C van Weert, Jennifer White, Mary A Whooley, Kirsty Winkley, Mitsuhiko Yamada, Andrea Benedetti, Brett D Thombs
DOI: https://doi.org/10.1017/S0033291719001314
Abstract: Background: Item 9 of the Patient Health Questionnaire-9 (PHQ-9) queries about thoughts of death and self-harm, but not suicidality. Although it is sometimes used to assess suicide risk, most positive responses are not associated with suicidality. The PHQ-8, which omits Item 9, is thus increasingly used in research. We assessed equivalency of total score correlations and the diagnostic accuracy to detect major depression of the PHQ-8 and PHQ-9. Methods: We conducted an individual patient data meta-analysis. We fit bivariate random-effects models to assess diagnostic accuracy. Results: 16 742 participants (2097 major depression cases) from 54 studies were included. The correlation between PHQ-8 and PHQ-9 scores was 0.996 (95% confidence interval 0.996 to 0.996). The standard cutoff score of 10 for the PHQ-9 maximized sensitivity + specificity for the PHQ-8 among studies that used a semi-structured diagnostic interview reference standard (N = 27). At cutoff 10, the PHQ-8 was less sensitive by 0.02 (-0.06 to 0.00) and more specific by 0.01 (0.00 to 0.01) among those studies (N = 27), with similar results for studies that used other types of interviews (N = 27). For all 54 primary studies combined, across all cutoffs, the PHQ-8 was less sensitive than the PHQ-9 by 0.00 to 0.05 (0.03 at cutoff 10), and specificity was within 0.01 for all cutoffs (0.00 to 0.01). Conclusions: PHQ-8 and PHQ-9 total scores were similar. Sensitivity may be minimally reduced with the PHQ-8, but specificity is similar.
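As a rough illustration of the comparison described in the abstract, the sketch below scores the PHQ-8 and PHQ-9 from the same nine item responses and computes sensitivity and specificity at the cutoff of 10. It is only a minimal sketch: the function names (`phq_scores`, `sens_spec`) and the six toy participants are hypothetical and deliberately exaggerated, and the simple pooled calculation shown here is not the bivariate random-effects model the authors fit.

```python
from typing import Sequence, Tuple


def phq_scores(items: Sequence[int]) -> Tuple[int, int]:
    """Return (PHQ-8, PHQ-9) totals from nine item responses, each scored 0-3."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected nine item scores, each between 0 and 3")
    phq9 = sum(items)
    phq8 = phq9 - items[8]  # the PHQ-8 omits Item 9
    return phq8, phq9


def sens_spec(scores: Sequence[int], depressed: Sequence[bool],
              cutoff: int = 10) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive screen."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (NOT from the study): item responses plus
# reference-standard major depression status for six participants.
responses = [
    [2, 2, 1, 1, 1, 1, 1, 0, 1],
    [3, 3, 2, 2, 2, 2, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [2, 2, 2, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 2],
]
depressed = [True, True, False, False, False, False]

phq8_scores = [phq_scores(r)[0] for r in responses]
phq9_scores = [phq_scores(r)[1] for r in responses]

print("PHQ-9 (sensitivity, specificity):", sens_spec(phq9_scores, depressed))
print("PHQ-8 (sensitivity, specificity):", sens_spec(phq8_scores, depressed))
```

Because dropping Item 9 can only lower or preserve a total score, at any fixed cutoff the PHQ-8 can be no more sensitive and no less specific than the PHQ-9 on the same participants, which matches the direction of the differences reported in the abstract.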