Abstract: Microbiomes play a crucial role in various biological processes, ranging from human and animal health to the functioning of the soil and marine ecosystems that support food production and biodiversity. Understanding how perturbations of these communities can impact their respective environments is essential for making new scientific discoveries and developing practical solutions to improve both human well-being and the health of our planet. However, encapsulating the sheer diversity of microbial communities and the intricate web of interactions they establish with other organisms results in vast and complex datasets. Traditional statistical methods often fall short in capturing both the nuances and the global summary of these interactions. With its ability to process large datasets and identify intricate patterns, machine learning (ML) provides a powerful solution. Techniques such as neural networks and ensemble learning models are particularly well suited for this task, enabling researchers to make sense of the multi-layered structures inherent in microbiome data. Nevertheless, the integration of ML in microbiome research faces challenges, including input data standardization; heterogeneous, noisy, and high-dimensional data; and the interpretability of ML models. Addressing these challenges requires a concerted effort from biologists, data scientists, and computational experts, fostering a collaborative environment where knowledge and techniques can be shared and refined. This is exactly what we carried out as part of the COST Action ML4Microbiome (CA18131), which is best summarised by the publications in the "Microbiome and Machine Learning" volumes in Frontiers in Microbiology. This second volume represents a significant step forward in harnessing the power of artificial intelligence to decode the complex world of microbiomes. The key achievements of ML4Microbiome are summarised in D'Elia et al.
In this article, the authors also underscore the importance of ethical considerations when deploying machine learning in microbiome research. Ensuring data privacy, avoiding biases in algorithmic predictions, and promoting transparency in model development are essential to maintaining public trust and maximizing the societal benefits of these technologies. Papoutsoglou et al. subsequently detail the technical complexity of applying ML to microbiome research. Their review identifies and addresses challenges in preprocessing, feature selection, predictive modeling, performance estimation, and model interpretation, and closes with a set of recommendations on algorithm selection, pipeline creation, and evaluation to aid decision-making in microbiome research. An in-depth exploration of data preprocessing methods is provided by Ibrahimi et al. This paper aims to guide both established researchers and those new to the field in selecting appropriate transformation methods based on their research questions, objectives, and data characteristics. To provide researchers with insights into specific ML resources facilitating microbiome analysis, Marcos-Zambrano et al. categorized ML tools based on the type of analysis they are designed for and the ML algorithms they employ. The focus spans various software tools for feature generation, taxonomic assignment, clustering, binning, and disease classification. Kumar et al. emphasize the crucial role of metadata in interpreting and comparing microbiome datasets and highlight the need for standardized metadata protocols to fully leverage the potential of metagenomic data. In this paper, microbiome data are classified into five types based on the methodology used for their production: shotgun sequencing, amplicon sequencing, metatranscriptomic sequencing, metabolomic measurements, and metaproteomic expression analysis.
The significance of metadata in data interpretation and comparison and the challenges in collecting standardized metadata are thoroughly explored. In the clinical domain, Chang et al. investigated the classification and predictive power of four different ML methods for diagnostic screening in myasthenia gravis (MG) using gut microbiome data. The proposed ML model may yield biomarkers for clinical use and could be applied to non-invasive screening of MG. Zhang et al. present a study that provides valuable insights into the potential impact of gut microbiota on carcinoid syndrome (CS). The paper investigates the causal relationship between gut microbiota abundance and CS through a bidirectional Mendelian randomization study. Murovec et al. present a study that compared the microbiome profiles of patients with colorectal cancer (CRC) and colorectal adenomas (CRA) with those of healthy participants using metagenomic data. The methodology involved extensive analysis using the MetaBakery pipeline, integrating data matrices like microbial taxonomy, functional genes, enzymatic reactions, metaboli -Abstract Truncated-
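
As an illustration of the kind of transformation choice discussed under data preprocessing, the sketch below applies a centered log-ratio (CLR) transform to a single sample of taxon counts. CLR is one commonly used option for compositional data; it is shown here as an assumption and is not necessarily the method recommended by Ibrahimi et al. The function name `clr_transform`, the default pseudocount of 0.5, and the example counts are all hypothetical.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount (an illustrative choice, not a recommendation) is added
    to every count so that zero counts do not produce log(0).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    logs = [math.log(c + pseudocount) for c in counts]
    # Subtracting the mean log equals dividing by the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in one sample.
sample = [120, 0, 30, 850]
clr_values = clr_transform(sample)
print([round(v, 3) for v in clr_values])
```

The transformed values sum to zero within each sample, and without a pseudocount the result does not change when all counts are multiplied by a constant, which is why transforms of this type are often considered when sequencing depth differs between samples.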

Multi-class boosting for the analysis of multiple incomplete views on microbiome data

Machine learning approaches in microbiome research: challenges and best practices

Microbiome-based disease prediction with multimodal variational information bottlenecks

Towards multi-label classification: Next step of machine learning for microbiome research

MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification

Multimodal deep learning applied to classify healthy and disease states of human microbiome

Multimodal weakly supervised learning to identify disease-specific changes in single-cell atlases

P070 Machine learning approaches to identify IBD biomarkers from longitudinal microbiome data

Editorial: Microbiome and machine learning, volume II

Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action

Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

Toward Multilabel Classification for Multiple Disease Prediction Using Gut Microbiota Profiles

Machine learning for data integration in human gut microbiome

A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions

Establishment and evaluation of prediction model for multiple disease classification based on gut microbial data

MVKTrans: Multi-View Knowledge Transfer for Robust Multiomics Classification

Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data

Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making

A toolbox of machine learning software to support microbiome analysis

Longitudinal Microbiome-based Interpretable Machine Learning for Identification of Time-Varying Biomarkers in Early Prediction of Disease Outcomes

Using machine learning approaches for multi-omics data analysis: A review