Abstract: This paper studies the automatic selection and parameter configuration of the core components of Big Data software. To solve the problem of component selection in Big Data application development, we establish standardized requirement indicators and select components automatically using a retention model together with a decision tree model. The indicators are derived from three demands supplied by the user: storage, computation, and analysis. To address the undetectable packet loss that occurs during data transmission on existing IoT and Web service platforms, we also propose an intermediate data transmission platform with bidirectional data detection. Its data communication module lets IoT smart terminals and cloud platforms monitor and check each other's data interactions. The retention model is built separately to realize the automatic selection of Big Data components. For parameter configuration, we start from several mainstream distributed storage systems and use Cassandra as the example for experiments and tests. We use multiple regression fitting to build a performance model over the hardware parameters, take user requirements as input, and use the performance model to configure the system's hardware parameters. By studying the system's principles, architecture, features, and application scenarios, we build a knowledge base that guides software parameter configuration. Together, these steps address the difficulty of selecting, deploying, and configuring parameters for Big Data applications.
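The hardware performance model described above can be illustrated with a minimal sketch. It assumes a plain linear multiple regression fitted by least squares; the abstract does not state the exact regression form, so this is an assumption. The sample data are synthetic and generated from an assumed linear relation, not measurements from the paper, and the names `fit_performance_model`, `predict`, and `cheapest_config` are hypothetical.

```python
import numpy as np

# Synthetic, hypothetical samples (NOT results from the paper).
# Columns: CPU cores, memory in GB, disk bandwidth in MB/s.
hardware = np.array([
    [2, 4, 100],
    [4, 8, 100],
    [4, 16, 200],
    [8, 16, 200],
    [8, 32, 400],
    [16, 32, 400],
], dtype=float)

# Assumed ground-truth coefficients used only to generate checkable data:
# intercept, per-core, per-GB, per-MB/s contribution to throughput (ops/s).
true_coeffs = np.array([500.0, 300.0, 50.0, 2.0])
throughput = np.column_stack([np.ones(len(hardware)), hardware]) @ true_coeffs


def fit_performance_model(x, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk."""
    design = np.column_stack([np.ones(len(x)), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, config):
    """Predicted throughput for one hardware configuration."""
    return coeffs[0] + np.dot(coeffs[1:], config)


def cheapest_config(coeffs, candidates, required):
    """Return the first candidate whose predicted throughput meets the
    requirement, or None. Candidates are assumed to be ordered by cost."""
    for config in candidates:
        if predict(coeffs, config) >= required:
            return config
    return None


coeffs = fit_performance_model(hardware, throughput)
choice = cheapest_config(coeffs, hardware, required=4000.0)
print("fitted coefficients:", np.round(coeffs, 3))
print("selected configuration:", choice)
```

The user requirement (here, a target throughput) is the input, and the fitted model is used to pick a hardware configuration that satisfies it. A real deployment would fit the model on measured benchmark runs and would likely need interaction or nonlinear terms.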
