Bridging Big Data: Procedures for Combining Non-equivalent Cognitive Measures from the ENIGMA Consortium

Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Pui-Wa Lei, Mary Jo-Pugh, Maheen Adamson, Martin Alda, Silvia Alonso-Lana, Sonia Ambrogi, Tim J Anderson, Celso Arango, Robert F Asarnow, Mihai Avram, Rosa Ayesa-Arriola, Talin Babikian, Nerisa Banaj, Laura J Bird, Stefan Borgwardt, Amy Brodtmann, Katharina Brosch, Karen Caeyenberghs, Vince D Calhoun, Nancy D Chiaravalloti, David X Cifu, Benedicto Crespo-Facorro, John C Dalrymple-Alford, Kristen Dams-O'Connor, Udo Dannlowski, David Darby, Nicholas Davenport, John DeLuca, Covadonga M Diaz-Caneja, Seth G Disner, Ekaterina Dobryakova, Stefan Ehrlich, Carrie Esopenko, Fabio Ferrarelli, Lea E Frank, Carol Franz, Paola Fuentes-Claramonte, Helen Genova, Christopher C Giza, Janik Goltermann, Dominik Grotegerd, Marius Gruber, Alfonso Gutierrez-Zotes, Minji Ha, Jan Haavik, Charles Hinkin, Kristen R Hoskinson, Daniela Hubl, Andrei Irimia, Andreas Jansen, Michael Kaess, Xiaojian Kang, Kimbra Kenney, Barbora Keřková, Mohamed Salah Khlif, Minah Kim, Jochen Kindler, Tilo Kircher, Karolina Knížková, Knut K Kolskår, Denise Krch, William S Kremen, Taylor Kuhn, Veena Kumari, Jun Soo Kwon, Roberto Langella, Sarah Laskowitz, Jungha Lee, Jean Lengenfelder, Spencer W Liebel, Victoria Liou-Johnson, Sara M Lippa, Marianne Løvstad, Astri Lundervold, Cassandra Marotta, Craig A Marquardt, Paulo Mattos, Ahmad Mayeli, Carrie R McDonald, Susanne Meinert, Tracy R Melzer, Jessica Merchán-Naranjo, Chantal Michel, Rajendra A Morey, Benson Mwangi, Daniel J Myall, Igor Nenadić, Mary R Newsome, Abraham Nunes, Terence O'Brien, Viola Oertel, John Ollinger, Alexander Olsen, Victor Ortiz García de la Foz, Mustafa Ozmen, Heath Pardoe, Marise Parent, Fabrizio Piras, Federica Piras, Edith Pomarol-Clotet, Jonathan Repple, Geneviève Richard, Jonathan Rodriguez, Mabel Rodriguez, Kelly Rootes-Murdy, Jared Rowland, Nicholas P Ryan, Raymond Salvador, Anne-Marthe Sanders, Andre Schmidt, Jair C Soares, Gianfranco Spalleta, Filip Španiel, Alena Stasenko, Frederike Stein, Benjamin Straube, April Thames, Florian Thomas-Odenthal, Sophia I Thomopoulos, Erin Tone, Ivan Torres, Maya Troyanskaya, Jessica A Turner, Kristine M Ulrichsen, Guillermo Umpierrez, Elisabet Vilella, Lucy Vivash, William C Walker, Emilio Werden, Lars T Westlye, Krista Wild, Adrian Wroblewski, Mon-Ju Wu, Glenn R Wylie, Lakshmi N Yatham, Giovana B Zunta-Soares, Paul M Thompson, David F Tate, Frank G Hillary, Emily L Dennis, Elisabeth A Wilde
DOI: https://doi.org/10.1101/2023.01.16.524331
2023-04-07
bioRxiv
Abstract: Investigators in neuroscience have turned to Big Data to address replication and reliability issues by increasing sample sizes, statistical power, and the representativeness of data. These efforts raise new questions about how to integrate data arising from distinct sources and instruments. We focus on memory testing, the most frequently assessed cognitive domain, and demonstrate a process for reliable data harmonization across three common measures. We aggregated global raw data from 53 studies totaling N = 10,505 individuals. A mega-analysis was conducted using empirical Bayes harmonization to remove site effects, followed by linear models adjusting for common covariates. A continuous item response theory (IRT) model estimated each individual's latent verbal learning ability while accounting for item difficulties. Harmonization significantly reduced inter-site variance while preserving covariate effects, and our conversion tool is freely available online. This demonstrates that large-scale data sharing and harmonization initiatives can address reproducibility and integration challenges across the behavioral sciences.
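To make the harmonization step concrete, the following is a minimal Python sketch of a location-only empirical Bayes site adjustment. It is not the authors' pipeline or their conversion tool: the function name `eb_site_adjust`, its arguments, and the method-of-moments shrinkage are assumptions chosen for illustration, and the sketch omits the scale (variance) adjustment that fuller empirical Bayes harmonization methods include.

```python
import numpy as np

def eb_site_adjust(scores, site, covariates):
    """Remove shrunken site offsets from scores (hypothetical helper).

    Simplified, location-only illustration of empirical Bayes
    harmonization; covariate effects are estimated jointly with the
    site terms so that they are left in the adjusted scores.
    """
    scores = np.asarray(scores, dtype=float)
    site = np.asarray(site)
    sites = np.unique(site)
    k = len(sites)

    # One indicator column per site, followed by the covariate columns.
    onehot = (site[:, None] == sites[None, :]).astype(float)
    design = np.column_stack([onehot, covariates])

    # Joint least-squares fit of site intercepts and covariate slopes.
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    gamma = coef[:k]
    n = onehot.sum(axis=0)

    # Raw site offsets relative to the sample-size-weighted grand intercept.
    raw = gamma - np.average(gamma, weights=n)

    # Residual (within-site) variance after site and covariate terms.
    resid = scores - design @ coef
    sigma2 = resid @ resid / (len(scores) - design.shape[1])

    # Method-of-moments estimate of the between-site variance; sigma2 / n
    # is used as an approximate sampling variance of each site offset.
    tau2 = max(raw.var(ddof=1) - np.mean(sigma2 / n), 0.0)

    # Normal-normal shrinkage: small or noisy sites are pulled toward zero.
    shrink = tau2 / (tau2 + sigma2 / n)

    # Subtract each participant's shrunken site offset.
    return scores - onehot @ (shrink * raw)
```

Under these assumptions, a call such as `eb_site_adjust(scores, site, age)` returns scores with the estimated site offsets removed, after which ordinary linear models on the covariates can be fitted to the pooled data.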