EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications

Nisha Pillai, Athish Ram Das, Moses Ayoola, Ganga Gireesan, Bindu Nanduri, Mahalingam Ramkumar

2024-03-27

Abstract: Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve life scientists face in understanding and using programming languages. An open-source, user-friendly interface for AI models that does not require programming skills to analyze complex biological data will be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in different 'omics' studies, the number of biological datasets being generated has increased, and analyzing these high-throughput datasets is computationally demanding. The majority of AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets, including images, languages, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics.

Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the biomedical research community's need for more user-friendly and interpretable artificial intelligence (AI) tools to extract insights from increasingly large and complex biological data. Applying machine learning to multidimensional biological data currently requires specialized programming skills, which limits adoption by researchers who are not professional bioinformaticians or computational biologists. The paper therefore proposes a web-based end-to-end pipeline that automates the preprocessing, training, evaluation, and visualization of machine learning models without manual intervention or coding expertise. The open-source platform is designed to help biologists without programming backgrounds use the predictive and pattern-recognition capabilities of AI to accelerate biomedical research, particularly in drug discovery, pathogen classification, and medical diagnosis.

The tool, called EndToEndML, covers the machine learning workflow (data preprocessing, model training, performance evaluation, and interactive visualization) through an intuitive graphical interface. Because it integrates traditional machine learning and deep neural network models, the library can handle several data modalities, such as images, language, and one-dimensional numerical data. Its visualizations are intended to help users understand the patterns and relationships the models detect. Compared with existing machine learning libraries such as Weka, Orange3, Scikit-learn, TensorFlow, and PyTorch, EndToEndML puts more emphasis on usability and a shorter learning curve, which makes it especially suitable for life scientists without programming or database knowledge. The paper reviews related work and describes EndToEndML's architecture, supported features, and two use cases.

Overall, the paper aims to make AI easier to apply to complex multimodal data in biomedical research by simplifying the machine learning process for users with varying skill levels.
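The sequence of stages such a tool automates (split the data, preprocess it, train a model, evaluate it) can be illustrated with a small, dependency-free Python sketch. This is not EndToEndML's code or API: every name here (`run_pipeline`, `standardize`, `NearestCentroid`, `train_test_split`) is hypothetical, and a nearest-centroid classifier stands in for the traditional ML models the tool offers. The toy data are synthetic feature vectors made up for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and split into train/test partitions (hypothetical helper)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [X[i] for i in te], [y[i] for i in tr], [y[i] for i in te]


def standardize(X_train, X_test):
    """Z-score each feature using statistics from the training split only (avoids leakage)."""
    cols = list(zip(*X_train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

    return scale(X_train), scale(X_test)


class NearestCentroid:
    """Stand-in 'traditional ML' classifier: predict the class with the closest mean vector."""

    def fit(self, X, y):
        self.centroids = {
            label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
            for label in sorted(set(y))
        }
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c])) for x in X]


def run_pipeline(X, y, model=None):
    """Hypothetical end-to-end run: split -> preprocess -> train -> evaluate."""
    model = model or NearestCentroid()
    X_tr, X_te, y_tr, y_te = train_test_split(X, y)
    X_tr, X_te = standardize(X_tr, X_te)
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    accuracy = sum(p == t for p, t in zip(y_pred, y_te)) / len(y_te)
    confusion = Counter(zip(y_te, y_pred))  # (true label, predicted label) -> count
    return {"accuracy": accuracy, "confusion": dict(confusion)}


if __name__ == "__main__":
    # Synthetic numerical feature vectors: two well-separated classes (made-up data).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)] + \
        [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(40)]
    y = ["A"] * 40 + ["B"] * 40
    print(run_pipeline(X, y))
```

Computing the scaling statistics on the training split alone keeps test data from leaking into preprocessing. A graphical tool would presumably run comparable stages behind the interface once a user uploads a dataset and selects a model, then pass the evaluation results to its visualization layer.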

Understanding Biology in the Age of Artificial Intelligence

Artificial Intelligence, Physiological Genomics, and Precision Medicine

EHR-ML: A generalisable pipeline for reproducible clinical outcomes using electronic health records

Advances in AI and machine learning for predictive medicine

Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning–driven data analysis

Interpretable Machine Learning for Genomics

Potential Role of Machine Learning in Oncology

AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria

BioMANIA: Simplifying bioinformatics data analysis through conversation

Democratizing Artificial Intelligence Imaging Analysis With Automated Machine Learning: Tutorial

Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development

AutoXAI4Omics: an Automated Explainable AI tool for Omics and tabular data

Computational Biology and Chemistry with AI and ML

Machine learning for catalysing the integration of noncoding RNA in research and clinical practice

Pharm-AutoML: An open-source, end-to-end automated machine learning package for clinical outcome prediction

Opportunities and obstacles for deep learning in biology and medicine

Deep learning tools for advancing drug discovery and development

AI in biomedical research: unleashing the potential of a transformative partnership