Abstract:

Background: Health research that significantly influences global clinical practice and policy is often published in medical journals with a high impact factor (IF). These outlets play a pivotal role in the worldwide dissemination of novel medical knowledge. However, researchers identifying as women and those affiliated with institutions in low- and middle-income countries (LMIC) have been largely underrepresented in high-IF journals across multiple fields of medicine. To evaluate disparities in gender and geographical representation among authors who have published in any of five top general medical journals, we conducted scientometric analyses using a large-scale dataset extracted from the New England Journal of Medicine (NEJM), Journal of the American Medical Association (JAMA), The British Medical Journal (BMJ), The Lancet, and Nature Medicine.

Methods: Author metadata from all articles published in the selected journals between 2007 and 2022 were collected using the DimensionsAI platform. The Genderize.io API was then used to infer each author's likely gender from their extracted first name. The World Bank country classification was used to map the countries associated with researcher affiliations to either the LMIC or the high-income country (HIC) category. We characterized the overall representation of gender and country income category across the medical journals. In addition, we computed article-level diversity metrics and compared their distributions across the journals.

Findings: We studied 151,536 authors across 49,764 articles published in five top medical journals over a period spanning 15 years. On average, approximately one-third (33.1%) of the authors of a given paper were inferred to be women; this result was consistent across the journals we studied. Further, 86.6% of the teams were composed exclusively of HIC authors; in contrast, only 3.9% were composed exclusively of LMIC authors. The probability of serving as the first or last author was significantly higher if the author was inferred to be a man (18.1% versus 16.8%, P < .01) or was affiliated with an institution in a HIC (16.9% versus 15.5%, P < .01). Our primary finding is that having a diverse team promotes further diversity, both within the same dimension (i.e., gender or geography) and across dimensions. Notably, papers with at least one woman among the authors were more likely to also involve at least two LMIC authors (11.7% versus 10.4% at baseline, P < .001; based on inferred gender); conversely, papers with at least one LMIC author were more likely to also involve at least two women (49.4% versus 37.6%, P < .001; based on inferred gender).

Conclusion: We provide a scientometric framework to assess authorship diversity. Our research suggests that the inclusiveness of high-impact medical journals is limited in terms of both gender and geography. We advocate for medical journals to adopt policies and practices that promote greater diversity and collaborative research. In addition, our findings offer a first step towards understanding the composition of teams conducting medical research globally, and an opportunity for individual authors to reflect on their own collaborative research practices and on ways to cultivate more diverse partnerships in their work.
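The article-level diversity metrics and the conditional comparisons reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Author` record, the function names `article_metrics` and `conditional_share`, and the four toy papers are all hypothetical, and the gender and income labels are assumed to have been assigned already (in the study, via Genderize.io and the World Bank classification).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    """Hypothetical author record with labels assigned upstream."""
    inferred_gender: str  # "woman" or "man" (inferred, not self-reported)
    income_group: str     # "HIC" or "LMIC"


def article_metrics(authors):
    """Compute simple diversity metrics for one paper's author list."""
    n = len(authors)
    n_women = sum(a.inferred_gender == "woman" for a in authors)
    n_lmic = sum(a.income_group == "LMIC" for a in authors)
    return {
        "share_women": n_women / n,
        "n_women": n_women,
        "n_lmic": n_lmic,
        "all_hic": n_lmic == 0,
        "all_lmic": n_lmic == n,
    }


def conditional_share(metrics, condition, outcome):
    """Share of papers satisfying `outcome` among those satisfying `condition`."""
    selected = [m for m in metrics if condition(m)]
    if not selected:
        return float("nan")
    return sum(outcome(m) for m in selected) / len(selected)


# Toy data (made up for illustration; not drawn from the study).
papers = [
    [Author("woman", "HIC"), Author("man", "HIC"), Author("man", "HIC")],
    [Author("woman", "LMIC"), Author("woman", "LMIC")],
    [Author("man", "HIC"), Author("man", "HIC")],
    [Author("woman", "HIC"), Author("woman", "LMIC"),
     Author("man", "LMIC"), Author("man", "HIC")],
]

metrics = [article_metrics(p) for p in papers]

# Average per-paper share of authors inferred to be women.
mean_share_women = sum(m["share_women"] for m in metrics) / len(metrics)

# Shares of teams composed exclusively of HIC or LMIC authors.
share_all_hic = sum(m["all_hic"] for m in metrics) / len(metrics)
share_all_lmic = sum(m["all_lmic"] for m in metrics) / len(metrics)

# Cross-dimension comparison: papers with at least two LMIC authors,
# overall (baseline) versus among papers with at least one woman.
baseline_two_lmic = conditional_share(
    metrics, lambda m: True, lambda m: m["n_lmic"] >= 2)
two_lmic_given_woman = conditional_share(
    metrics, lambda m: m["n_women"] >= 1, lambda m: m["n_lmic"] >= 2)
```

Comparing `two_lmic_given_woman` with `baseline_two_lmic` mirrors the structure of the reported contrast (11.7% versus 10.4%). The sketch omits the significance test, which the abstract does not specify.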

Diversity and inclusion: A hidden additional benefit of Open Data

Addressing bias in big data and AI for health care: A call for open science

Availability of information needed to evaluate algorithmic fairness - A systematic review of publicly accessible critical care databases

The value of standards for health datasets in artificial intelligence-based applications

Does diversity beget diversity? A scientometric analysis of over 150,000 studies and 49,000 authors published in high-impact medical journals between 2007 and 2022

A scientometric analysis of fairness in health AI literature

Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review

Who Owns the Data? Open Data for Healthcare

Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development—a systematic review

Implicit bias in Critical Care Data: Factors affecting sampling frequencies and missingness patterns of clinical and biological variables in ICU Patients

Data and model bias in artificial intelligence for healthcare applications in New Zealand

The Role of the ACR Data Science Institute in Advancing Health Equity in Radiology

Bias and Non-Diversity of Big Data in Artificial Intelligence: Focus on Retinal Diseases

The Privacy and Security Implications of Open Data in Healthcare

Big data and AI for gender equality in health: bias is a big challenge

Embracing diversity and inclusivity in an academic setting: Insights from the Organization for Human Brain Mapping

The impact of commercial health datasets on medical research and health-care algorithms

A guide to sharing open healthcare data under the General Data Protection Regulation

Big Data Analytics and the Struggle for Equity in Health Care: The Promise and Perils

Opportunity and accessibility: an environmental scan of publicly available data repositories to address disparities in healthcare decision-making

Gender disparity in critical care publications: a novel Female First Author Index