Abstract:

Background: Digital just-in-time adaptive interventions can reduce binge-drinking events (BDEs; consuming ≥4 drinks for women and ≥5 drinks for men per occasion) in young adults but need to be optimized for timing and content. Delivering just-in-time support messages in the hours prior to BDEs could improve intervention impact.

Objective: We aimed to determine the feasibility of developing a machine learning (ML) model that uses smartphone sensor data to accurately predict future (that is, same-day) BDEs 1 to 6 hours before they occur. We also aimed to identify the phone sensor features most informative of BDEs on weekends and weekdays, and thereby the key features that explain prediction model performance.

Methods: We collected phone sensor data from 75 young adults (aged 21 to 25 years; mean 22.4, SD 1.9 years) with risky drinking behavior who reported their drinking behavior over 14 weeks. The participants in this secondary analysis were enrolled in a clinical trial. We developed ML models, testing different algorithms (eg, extreme gradient boosting [XGBoost] and decision tree), to predict same-day BDEs (vs low-risk drinking events and non-drinking periods) using smartphone sensor data (eg, accelerometer and GPS). We tested various "prediction distance" time windows from drinking onset (most proximal: 1 hour; most distant: 6 hours). We also tested various analysis time windows (ie, the amount of data to be analyzed), ranging from 1 to 12 hours prior to drinking onset, because this determines the amount of data that needs to be stored on the phone to compute the model. Explainable artificial intelligence was used to explore interactions among the most informative phone sensor features contributing to the prediction of BDEs.

Results: The XGBoost model performed the best in predicting imminent same-day BDEs, with 95% accuracy on weekends and 94.3% accuracy on weekdays (F1-score=0.95 and 0.94, respectively). To predict same-day BDEs, this XGBoost model needed 12 hours of phone sensor data at a 3-hour prediction distance from the onset of drinking on weekends, and 9 hours of data at a 6-hour prediction distance on weekdays. The most informative phone sensor features for BDE prediction were time (eg, time of day) and GPS-derived features, such as the radius of gyration (an indicator of travel). Interactions among key features (eg, time of day and GPS-derived features) contributed to the prediction of same-day BDEs.

Conclusions: We demonstrated the feasibility and potential use of smartphone sensor data and ML for accurately predicting imminent (same-day) BDEs in young adults. The prediction model provides "windows of opportunity," and with the adoption of explainable artificial intelligence, we identified "key contributing features" that could trigger a just-in-time adaptive intervention prior to the onset of BDEs, which has the potential to reduce the likelihood of BDEs in young adults.

Trial Registration: ClinicalTrials.gov NCT02918565; https://clinicaltrials.gov/ct2/show/NCT02918565
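The relationship between "prediction distance" and "analysis time window" described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sample layout (timestamp, value pairs), and the example date are hypothetical, and the sketch only assumes that the analysis window ends at the moment the prediction is issued.

```python
from datetime import datetime, timedelta


def analysis_window(drinking_onset, prediction_distance_h, analysis_window_h):
    """Return (start, end) of the sensor data span used for one prediction.

    Hypothetical helper: assumes the prediction is issued
    `prediction_distance_h` hours before drinking onset and looks back
    over the preceding `analysis_window_h` hours of data.
    """
    # No data recorded after the prediction moment may be used.
    end = drinking_onset - timedelta(hours=prediction_distance_h)
    start = end - timedelta(hours=analysis_window_h)
    return start, end


def select_samples(samples, start, end):
    """Keep (timestamp, value) pairs with start <= timestamp < end."""
    return [(t, v) for t, v in samples if start <= t < end]


# Hypothetical weekend case using the abstract's weekend setting:
# a 3-hour prediction distance with a 12-hour analysis window.
onset = datetime(2024, 6, 1, 21, 0)  # a Saturday, drinking onset at 21:00
start, end = analysis_window(onset, prediction_distance_h=3, analysis_window_h=12)

# One made-up sensor reading per hour of that day.
samples = [(datetime(2024, 6, 1, h), float(h)) for h in range(24)]
window = select_samples(samples, start, end)
print(start, end, len(window))
```

Under these assumptions, a longer analysis window means more hours of sensor data must be retained on the phone, which is the storage trade-off the Methods paragraph refers to.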

Machine Learning Approaches to Predict Alcohol Consumption from Biomarkers in the UK Biobank

Machine learning prediction of blood alcohol concentration: a digital signature of smart-breathalyzer behavior

Leveraging Mobile Phone Sensors, Machine Learning, and Explainable Artificial Intelligence to Predict Imminent Same-Day Binge-drinking Events to Support Just-in-time Adaptive Interventions: Algorithm Development and Validation Study

A machine learning model for the prediction of unhealthy alcohol use among women of childbearing age in Alabama

Machine‐learning prediction of adolescent alcohol use: a cross‐study, cross‐cultural validation

Application of Machine Learning Techniques to the Prediction of Onset and Persistence of Binge Eating: A Prospective Study

Predicting Alcohol Consumption Patterns for Individuals with a User-Friendly Parsimonious Statistical Model

Person-specific and pooled prediction models for binge eating, alcohol use and binge drinking in bulimia nervosa and alcohol use disorder

Improving Cardiovascular Risk Prediction Through Machine Learning Modelling of Irregularly Repeated Electronic Health Records

A physiologically-based digital twin for alcohol consumption—predicting real-life drinking responses and long-term plasma PEth

Epigenetic and Proteomic Biomarkers of Elevated Alcohol Use Predict Epigenetic Aging and Cell-Type variation Better Than Self-Report

Machine learning across multiple imaging and biomarker modalities in the UK Biobank improves genetic discovery for liver fat accumulation

Identification of integrated proteomics and transcriptomics signature of alcohol-associated liver disease using machine learning

Prediction of atrial fibrillation and stroke using machine learning models in UK Biobank

Blood-based DNA methylation study of alcohol consumption

Evaluating the performance of personal, social, health-related, biomarker and genetic data for predicting an individual's future health using machine learning: A longitudinal analysis

Patterns of high-risk drinking among medical students: A web-based survey with machine learning

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

Enhancing Selection of Alcohol Consumption Associated Genes by Random Forest

Leveraging Genetic Data for Predicting Consumer Choices of Alcoholic Products