CLASSify: A Web-Based Tool for Machine Learning

Aaron D. Mullen, Samuel E. Armstrong, Jeff Talbert, V.K. Cody Bumgardner
2023-10-05
Abstract: Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.
Machine Learning; Distributed, Parallel, and Cluster Computing; Human-Computer Interaction; Software Engineering
What problem does this paper attempt to address?
This paper aims to simplify machine-learning classification tasks, especially for researchers and medical professionals without a strong technical background. It introduces an automated tool named CLASSify, which aims to:

1. **Simplify model training**: An easy-to-use interface lets non-expert users perform model training, optimization, and inference without in-depth knowledge of the technical details of machine learning.
2. **Support multiple classification tasks**: The tool handles both binary and multiclass classification problems and offers multiple machine-learning models and methods to choose from.
3. **Generate synthetic data**: Synthetic data can be generated to fill in missing values, balance class labels, or create entirely new datasets, which is especially useful in the medical field, where real data may be protected or imbalanced.
4. **Evaluate and explain features**: SHapley Additive exPlanations (SHAP) scores help users understand which features have the greatest impact on model predictions, improving the interpretability of the model.
5. **Visualize results**: Multiple charts help users understand model performance and feature importance more intuitively.

Through these functions, CLASSify aims to lower the barrier to using machine-learning techniques, enabling more researchers and medical professionals to apply them to practical problems.
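To make the idea behind the SHAP scores mentioned above more concrete, here is a minimal sketch of exact Shapley values for a single prediction. It is not CLASSify's implementation and does not use the SHAP library, which relies on faster estimators; it enumerates every feature coalition by brute force, so it is only practical for a handful of features. The function `shapley_values`, the scoring function `toy_model`, and the feature names `age`, `marker`, and `noise` are all hypothetical and chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (brute-force sketch).

    Features outside a coalition are replaced by their baseline value,
    which is one common (assumed) way to represent a "missing" feature.
    """
    n = len(x)

    def value(coalition):
        # Use the real value for features in the coalition, baseline otherwise.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical scoring function: two informative features, one ignored."""
    age, marker, noise = features
    return 0.5 * age + 2.0 * marker + 1.0 * age * marker


x = [1.0, 1.0, 1.0]          # the instance being explained
baseline = [0.0, 0.0, 0.0]   # assumed reference input
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # approximately [1.0, 2.5, 0.0]
```

In this toy case the interaction term is split evenly between `age` and `marker`, the ignored `noise` feature receives a score of zero, and the scores sum to the difference between the prediction for `x` and the prediction for the baseline. A feature-importance ranking of the kind described above follows from sorting features by the magnitude of these scores.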