Abstract

Background: Health science findings are primarily disseminated through published journal articles. Information subsidies, such as press releases, are used to communicate newsworthy findings to journalists in an effort to earn mass media coverage and further disseminate health science research to mass audiences. Journal editors decide which articles receive press releases, and news journalists then select which stories receive coverage and, thus, public attention.

Objective: This study aims to identify attributes of published health science articles that correlate with (1) journal editors' issuance of press releases and (2) mainstream media coverage.

Methods: We constructed four novel datasets to identify factors that correlate with press release issuance and media coverage. These corpora include thousands of published articles, subsets of which received press releases or mainstream media coverage. We used statistical machine learning methods to identify correlations between words in the article abstracts and both press release issuance and media coverage. We also used a topic modeling-based machine learning approach to uncover latent topics predictive of the perceived newsworthiness of science articles.

Results: Both press release issuance and media coverage of health science articles are predictable from the content of the corresponding journal articles. For press release prediction, we achieved mean areas under the curve (AUCs) of 0.666 (SD 0.019) and 0.882 (SD 0.018) on two separate datasets comprising 3024 and 10,760 articles, respectively. For media coverage prediction, models achieved mean AUCs of 0.591 (SD 0.044) and 0.783 (SD 0.022) on two datasets containing 422 and 28,910 pairs, respectively. We report the words and topics most predictive of press release issuance or news coverage.

Conclusions: We have presented a novel data-driven characterization of content that renders health science “newsworthy.” The analysis provides new insights into the news coverage selection process. For example, epidemiological papers concerning common behaviors (eg, alcohol consumption) appear to attract media attention.
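The Methods describe learning which abstract words correlate with press release issuance and scoring the resulting models by AUC. The following is a minimal sketch of one way to do that. It assumes a binary bag-of-words representation and L2-regularized logistic regression. The abstract does not name the exact classifier, so that choice, the toy abstracts, and every function name here (tokenize, train_logreg, cross_validated_auc, top_words) are hypothetical. The topic-model variant is not reproduced.

```python
import math
import re


def tokenize(text):
    """Lowercase an abstract and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs):
    """Map every word seen in the training abstracts to a feature index."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def featurize(doc, vocab):
    """Binary bag of words: indices of vocabulary words present in the abstract."""
    return sorted({vocab[w] for w in tokenize(doc) if w in vocab})


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, n_features, lr=0.5, l2=0.01, epochs=200):
    """Assumption: L2-regularized logistic regression (full-batch gradient
    descent) stands in for the study's unspecified statistical ML method."""
    w, b, n = [0.0] * n_features, 0.0, len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(w[j] for j in xi)) - yi  # dLoss/dlogit
            for j in xi:
                grad_w[j] += err / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, xi):
    """Predicted probability that the article receives a press release."""
    return sigmoid(b + sum(w[j] for j in xi))


def auc(labels, scores):
    """ROC AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(docs, labels, k=3):
    """Pool out-of-fold scores from k folds, then compute a single AUC."""
    oof = [0.0] * len(docs)
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        vocab = build_vocab([docs[i] for i in train])  # vocabulary from train folds only
        X = [featurize(docs[i], vocab) for i in train]
        w, b = train_logreg(X, [labels[i] for i in train], len(vocab))
        for i in range(len(docs)):
            if i % k == fold:
                oof[i] = score(w, b, featurize(docs[i], vocab))
    return auc(labels, oof)


def top_words(w, vocab, n=3):
    """Vocabulary words with the largest positive weights."""
    inv = {i: t for t, i in vocab.items()}
    return [inv[j] for j in sorted(range(len(w)), key=lambda j: -w[j])[:n]]


# Hypothetical toy abstracts (1 = received a press release); not the study's data.
docs = [
    "alcohol consumption and mortality in a large cohort",
    "crystal structure of a bacterial enzyme complex",
    "coffee drinking and lower risk of death in adults",
    "kinetics of enzyme inhibition in vitro",
    "daily alcohol intake and cancer risk among women",
    "protein folding kinetics of a membrane transporter",
]
labels = [1, 0, 1, 0, 1, 0]

vocab = build_vocab(docs)
w, b = train_logreg([featurize(d, vocab) for d in docs], labels, len(vocab))
print(top_words(w, vocab))
print(round(cross_validated_auc(docs, labels), 3))
```

Positive weights mark words associated with press releases in the toy data. On this input, function words such as "and" may rank highly, so a real pipeline would likely remove stop words. The abstract reports mean (SD) AUC, which suggests computing one AUC per fold or run and summarizing. This sketch instead pools out-of-fold scores into a single AUC for simplicity.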

Understanding Fine-grained Distortions in Reports of Scientific Findings

Can Large Language Models Detect Misinformation in Scientific News Reporting?

Quantifying Data Distortion in Bar Graphs in Biological Research

Scientific research in news media: a case study of misrepresentation, sensationalism and harmful recommendations

From impact metrics and open science to communicating research: Journalists' awareness of academic controversies

Hidden: A Baker's Dozen Ways in Which Research Reporting is Less Transparent than it Could be and Suggestions for Implementing Einstein's Dictum

The (im-)moral scientist? Measurement and framing effects shape the association between scientists and immorality

ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media

Characterizing the (perceived) Newsworthiness of Health Science Articles: A Data-Driven Approach

FakeNewsLab: Experimental Study on Biases and Pitfalls Preventing us from Distinguishing True from False News

Enable people to identify science news based on retracted articles on social media

Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions

Fine-Tuning Language Models for Scientific Writing Support

Towards an understanding and explanation for mixed-initiative artificial scientific text detection

Modeling Information Change in Science Communication with Semantically Matched Paraphrases

'Don't Get Too Technical with Me': A Discourse Structure-Based Framework for Science Journalism

Expressions of uncertainty in online science communication hinder information diffusion

Reporting Gaps Between News Media and Scientific Papers on Outdoor Air Pollution-Related Health Outcomes: A Content Analysis

Predicting Factuality of Reporting and Bias of News Media Sources

Science in the News: A Study of Reporting Genomics

How the public evaluates media representations of uncertain science: An integrated explanatory framework