Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping

Fabi Prezja
2024-09-15
Abstract: Machine learning (ML) research and application often involve time-consuming steps such as model architecture prototyping, feature selection, and dataset preparation. To support these tasks, we introduce the Deep Fast Machine Learning Utils (DFMLU) library, which provides tools designed to automate and enhance aspects of these processes. Compatible with frameworks like TensorFlow, Keras, and Scikit-learn, DFMLU offers functionalities that support model development and data handling. The library includes methods for dense neural network search, advanced feature selection, and utilities for data management and visualization of training outcomes. This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
Machine Learning
What problem does this paper attempt to address?
This paper targets several time-consuming tasks in machine learning (ML) research and applications: model architecture prototyping, feature selection, and dataset preparation. Specifically:

1. **Model Architecture Prototyping**: As model architectures grow more complex and datasets grow larger, manually designing and adjusting neural network structures becomes increasingly time-consuming and difficult.
2. **Feature Selection**: Selecting the features that contribute most to model performance from a large candidate set is a complex and important task, especially for high-dimensional data.
3. **Dataset Preparation**: This includes steps such as data splitting, subsampling, and visualization, which are especially cumbersome for large-scale datasets.

To address these problems, the paper introduces a Python library named **Deep Fast Machine Learning Utils (DFMLU)**. DFMLU provides a set of tools that automate and enhance these processes, improving the efficiency of the machine learning workflow. Its functionality includes:

- **Dense Neural Network Search**: Automatically designs dense neural networks through the Principal Component Cascade Dense Neural Architecture Search (PCCDNAS) method.
- **Advanced Feature Selection**: Provides multiple feature selection methods, such as Adaptive Variance Threshold (AVT), Rank Aggregated Feature Selection (RAFS), and Chained Feature Selection (ChainedFS).
- **Data Management Tools**: Includes a Dataset Splitter and a Data Sub Sampler, which simplify dataset splitting and subsampling.
- **Training Result Visualization**: Provides functions for plotting validation curves (plot_history_curves) and generating confusion matrices (plot_confusion_matrix) to help diagnose model performance.
Through these tools, DFMLU aims to simplify and accelerate the development and optimization of machine learning models, enabling researchers and developers to run experiments and debug more efficiently.
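
To make the feature selection idea more concrete, the following is a minimal sketch of a variance-based filter whose threshold adapts to the data. It does not use the DFMLU API. It assumes that "adaptive" means the threshold is taken from a percentile of the per-feature variances, and it is built only on NumPy and scikit-learn's `VarianceThreshold`. The function name `percentile_variance_select` and its `percentile` parameter are hypothetical, chosen for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold


def percentile_variance_select(X, percentile=10.0):
    """Keep features whose variance exceeds a percentile of all feature variances.

    Hypothetical helper for illustration; not part of DFMLU.
    """
    # Per-feature variance, computed over the samples (rows).
    variances = np.var(X, axis=0)
    # The threshold adapts to the data: it is a percentile of the variances.
    threshold = np.percentile(variances, percentile)
    # VarianceThreshold keeps only features with variance strictly above it.
    selector = VarianceThreshold(threshold=threshold)
    X_reduced = selector.fit_transform(X)
    kept = selector.get_support(indices=True)
    return X_reduced, kept, threshold


# Synthetic data: five features with very different scales (assumed values).
rng = np.random.default_rng(0)
scales = np.array([0.01, 0.1, 1.0, 5.0, 10.0])
X = rng.normal(size=(200, 5)) * scales

X_reduced, kept, threshold = percentile_variance_select(X, percentile=40.0)
print("threshold:", threshold)
print("kept feature indices:", kept.tolist())
print("reduced shape:", X_reduced.shape)
```

With five features and `percentile=40.0`, the threshold falls between the second and third smallest variances, so the two lowest-variance columns are dropped. Tying the cutoff to a percentile rather than a fixed value means the same setting can be reused across datasets with different feature scales.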