EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications

Nisha Pillai, Athish Ram Das, Moses Ayoola, Ganga Gireesan, Bindu Nanduri, Mahalingam Ramkumar
2024-03-27
Abstract: Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve life scientists face in understanding and using programming languages. An open-source, user-friendly interface for AI models that does not require programming skills to analyze complex biological data would be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in various 'omics' studies, the number of biological datasets being generated has grown, and analyzing these high-throughput datasets is computationally demanding. Most AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that can preprocess data and train, evaluate, and visualize machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting across a wide range of multi-modal, multi-sensor datasets, including images, language, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the demand in the biomedical research community for more user-friendly and interpretable artificial intelligence (AI) tools that can extract insights from increasingly large and complex biological data. Currently, applying machine learning to multidimensional biological data requires specialized programming skills, which limits adoption by researchers other than professional bioinformaticians and computational biologists. The paper therefore proposes a web-based end-to-end pipeline that automates the preprocessing, training, evaluation, and visualization of machine learning models without manual intervention or coding expertise. This open-source platform is designed to help biologists without programming backgrounds use the predictive and pattern-recognition capabilities of AI to accelerate biomedical research, particularly in drug discovery, pathogen classification, and medical diagnosis. The tool, called EndToEndML, simplifies the machine learning workflow (data preprocessing, model training, performance evaluation, and interactive visualization) through an intuitive graphical interface. By integrating traditional machine learning and deep neural network models, the library can handle several data modalities, such as images, language, and one-dimensional numerical data. It also provides visualizations that help users understand the patterns and relationships the models detect. Compared with existing machine learning libraries such as Weka, Orange3, scikit-learn, TensorFlow, and PyTorch, EndToEndML places more emphasis on usability and a shorter learning curve, making it especially suitable for life scientists without programming or database knowledge. The paper reviews related work and describes EndToEndML's architecture, supported features, and two use cases.
Its goal is to advance biomedical research by simplifying the machine learning process for users of varying skill levels, making it easier to apply AI to complex multimodal data.
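To make concrete what "preprocessing, training, and evaluation without manual intervention" involves, here is a minimal, hypothetical sketch in plain Python of the stages such a pipeline chains together. It is not EndToEndML's code or interface. The names (`fit_standardizer`, `NearestCentroid`, `run_pipeline`) are illustrative assumptions, and a simple nearest-centroid classifier stands in for the traditional ML and deep learning models the tool supports. Visualization is omitted.

```python
"""Hypothetical sketch of the stages an end-to-end ML pipeline automates.
Not EndToEndML's API; all names here are illustrative assumptions."""
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_standardizer(rows: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Preprocessing: per-feature mean and population std (0 std becomes 1)."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def standardize(rows: Sequence[Vector], means: Vector, stds: Vector) -> List[Vector]:
    """Apply z-score scaling using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


@dataclass
class NearestCentroid:
    """Training: one centroid per class label (a stand-in model)."""
    centroids: Dict[str, Vector]

    @classmethod
    def fit(cls, rows: Sequence[Vector], labels: Sequence[str]) -> "NearestCentroid":
        groups: Dict[str, List[Vector]] = {}
        for r, y in zip(rows, labels):
            groups.setdefault(y, []).append(r)
        return cls({y: [mean(c) for c in zip(*g)] for y, g in groups.items()})

    def predict(self, row: Vector) -> str:
        # Pick the label whose centroid has the smallest squared distance.
        def sqdist(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(row, c))
        return min(self.centroids, key=lambda y: sqdist(self.centroids[y]))


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Evaluation: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(train_x, train_y, test_x, test_y):
    """Chain preprocessing -> training -> evaluation with no manual steps."""
    means, stds = fit_standardizer(train_x)  # fit on training data only
    model = NearestCentroid.fit(standardize(train_x, means, stds), train_y)
    preds = [model.predict(r) for r in standardize(test_x, means, stds)]
    return preds, accuracy(test_y, preds)


if __name__ == "__main__":
    # Toy two-feature data with features on very different scales.
    train_x = [[1.0, 200.0], [1.2, 210.0], [3.0, 400.0], [3.1, 390.0]]
    train_y = ["a", "a", "b", "b"]
    preds, acc = run_pipeline(train_x, train_y, [[1.1, 205.0], [2.9, 395.0]], ["a", "b"])
    print(preds, acc)  # → ['a', 'b'] 1.0
```

One design choice worth noting: the scaling statistics are fitted on the training split only and then reused on the test split. This avoids leaking test-set information into preprocessing, which is the kind of detail an automated pipeline can handle for users who never see the code.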