THE DEPRESSION INVENTORY DEVELOPMENT SCALE: Assessment of Psychometric Properties Using Classical and Modern Measurement Theory in a CAN-BIND Trial
Anthony L Vaccarino, Amir H Kalali, Pierre Blier, Susan Gilbert Evans, Nina Engelhardt, Jane A Foster, Benicio N Frey, John H Greist, Kenneth A Kobak, Raymond W Lam, Glenda MacQueen, Roumen Milev, Daniel J Müller, Sagar V Parikh, Franca M Placenza, Sakina J Rizvi, Susan Rotzinger, David V Sheehan, Terrence Sills, Claudio N Soares, Gustavo Turecki, Rudolph Uher, Janet B W Williams, Sidney H Kennedy, Kenneth R Evans
2020-07-01
Abstract: Objective: The goal of the Depression Inventory Development (DID) project is to develop a comprehensive and psychometrically sound rating scale for major depressive disorder (MDD) that reflects current diagnostic criteria and conceptualizations of depression. We report here the evaluation of the current DID item bank using Classical Test Theory (CTT), Item Response Theory (IRT), and Rasch Measurement Theory (RMT). Methods: The present study was part of a larger multisite, open-label study conducted by the Canadian Biomarker Integration Network in Depression (ClinicalTrials.gov: NCT01655706). Trained raters administered the 32 DID items at each of two visits (MDD: baseline, n=211 and Week 8, n=177; healthy participants: baseline, n=112 and Week 8, n=104). The DID's "grid" structure operationalizes the intensity and frequency of each item, supported by clear symptom definitions and a structured interview guide; the current iteration assesses symptoms related to anhedonia, cognition, fatigue, general malaise, motivation, anxiety, negative thinking, pain, and appetite. Participants were also administered the Montgomery-Åsberg Depression Rating Scale (MADRS) and the Quick Inventory of Depressive Symptomatology-Self-Report (QIDS-SR), which allowed DID items to be evaluated against existing "benchmark" items. CTT was used to assess data quality and reliability (i.e., missing data, skewness, scoring frequency, internal consistency); IRT to assess individual item performance by modelling each item's ability to discriminate levels of depressive severity (as assessed by the MADRS); and RMT to assess how the items perform together as a scale to capture a range of depressive severity (item targeting). Together, these analyses provided the empirical evidence on which to base decisions about which DID items to remove, modify, or advance.
Results: Of the 32 DID items evaluated, eight items were identified by CTT as problematic, displaying low variability in the range of responses, floor effects, and/or skewness, and four items were identified by IRT as showing poor discriminative properties that would limit their clinical utility. Five additional items were deemed to be redundant. The remaining 15 DID items all fit the Rasch model, with person and item difficulty estimates indicating satisfactory item targeting, but with lower precision in participants with mild levels of depression. These 15 DID items also showed good internal consistency (alpha=0.95 and inter-item correlations ranging from r=0.49 to r=0.84), and all items were sensitive to change following antidepressant treatment (baseline vs. Week 8). RMT revealed problematic item targeting for the MADRS and QIDS-SR, including an absence of MADRS items targeting participants with mild/moderate depression and an absence of QIDS-SR items targeting participants with mild or severe depression. Conclusion: The present study applied CTT, IRT, and RMT to assess the measurement properties of the DID items and identify those that should be advanced, modified, or removed. Of the 32 items evaluated, 15 items showed good measurement properties. These items (along with previously evaluated items) will provide the basis for validation of a penultimate DID scale assessing anhedonia, cognitive slowing, concentration, executive function, recent memory, drive, emotional fatigue, guilt, self-esteem, hopelessness, tension, rumination, irritability, reduced appetite, insomnia, sadness, worry, suicidality, and depressed mood. The strategies adopted by the DID process provide a framework for rating scale development and validation.
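As an illustration of the CTT screening statistics named in the abstract (internal consistency, floor effects, skewness), the following is a minimal sketch and not the study's actual analysis code. The function names, the 0-4 scoring range, and the synthetic data are assumptions introduced here for illustration only; the abstract does not state the DID scoring range or the software used.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (participants x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


def floor_proportion(scores, floor=0):
    """Share of participants scoring at the floor value, per item."""
    return (np.asarray(scores) == floor).mean(axis=0)


def skewness(scores):
    """Moment-based skewness per item (undefined for constant items)."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)
    m2 = (centered ** 2).mean(axis=0)
    m3 = (centered ** 3).mean(axis=0)
    return m3 / m2 ** 1.5


# Synthetic example only (NOT study data): 200 hypothetical participants,
# 5 hypothetical items scored 0-4, all driven by one latent severity value.
rng = np.random.default_rng(0)
severity = rng.normal(size=200)
noise = rng.normal(scale=0.7, size=(200, 5))
items = np.clip(np.round(2 + severity[:, None] + noise), 0, 4)

alpha = cronbach_alpha(items)
floors = floor_proportion(items)
skews = skewness(items)
```

In a screen of this kind, items with a large floor proportion or strong skew would be flagged for review, while alpha summarizes how consistently the retained items vary together. Any cut-off values would need to be chosen by the analyst; none are given in the abstract.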