Sequential Model-Based Optimization for Natural Language Processing Data Pipeline Selection and Optimization

Piyadanai Arntong, Worapol Alex Pongpech
DOI: https://doi.org/10.1007/978-3-030-73280-6_24
2021-01-01
Abstract: Natural language processing (NLP) aims to analyze large amounts of natural language data. An NLP system processes textual data through a set of data processing elements that are connected sequentially to form a data pipeline. For a given set of textual data, several candidate pipelines exist, each yielding a different degree of model accuracy. Instead of trying all possible paths, as grid search does, or sampling paths blindly, as random search does, we used Bayesian optimization to search the space of pipelines along with their hyperparameters. In this study, we propose a data pipeline selection method for NLP based on Sequential Model-Based Optimization (SMBO). We implemented SMBO for the NLP data pipeline using the Hyperparameter Optimization (Hyperopt) library, with the Tree of Parzen Estimators (TPE) and Adaptive Tree of Parzen Estimators (A-TPE) as surrogate models and expected improvement (EI) as the acquisition function.
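To make the SMBO loop described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use Hyperopt. It assumes a purely categorical search space and replaces the Parzen estimators with Laplace-smoothed frequency counts. The names `SPACE`, `toy_loss`, `density`, `suggest`, and `smbo`, the pipeline components, and the penalty values are all hypothetical and chosen only for illustration. As in TPE, observed pipelines are split into a "good" and a "bad" group, and the next pipeline is the sampled candidate with the highest density ratio l(x)/g(x), which is the quantity TPE maximizes as a proxy for expected improvement.

```python
import random
from collections import Counter

# Hypothetical search space: each pipeline step has a few candidate components.
SPACE = {
    "tokenizer": ["whitespace", "wordpiece"],
    "normalizer": ["none", "lowercase", "stem"],
    "vectorizer": ["bow", "tfidf"],
}

def toy_loss(pipeline):
    """Stand-in for (1 - validation accuracy); a real objective would train a model."""
    penalty = {"whitespace": 0.10, "wordpiece": 0.02,
               "none": 0.08, "lowercase": 0.03, "stem": 0.05,
               "bow": 0.09, "tfidf": 0.01}  # made-up values
    return sum(penalty[component] for component in pipeline.values())

def density(observations, step, choice):
    """Laplace-smoothed frequency of `choice` at `step` among observed pipelines."""
    counts = Counter(p[step] for p in observations)
    return (counts[choice] + 1.0) / (len(observations) + len(SPACE[step]))

def suggest(history, rng, gamma=0.25, n_candidates=24):
    """TPE-style proposal: sample from the 'good' density l, keep the max l(x)/g(x)."""
    ranked = sorted(history, key=lambda trial: trial[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [p for p, _ in ranked[:n_good]]
    bad = [p for p, _ in ranked[n_good:]]

    def sample_from_good():
        return {
            step: rng.choices(
                choices, weights=[density(good, step, c) for c in choices]
            )[0]
            for step, choices in SPACE.items()
        }

    def ratio(pipeline):
        r = 1.0
        for step in SPACE:
            r *= density(good, step, pipeline[step]) / density(bad, step, pipeline[step])
        return r

    candidates = [sample_from_good() for _ in range(n_candidates)]
    return max(candidates, key=ratio)

def smbo(n_startup=5, n_iter=20, seed=0):
    """Run the loop: random startup trials, then model-guided suggestions."""
    rng = random.Random(seed)
    history = []  # list of (pipeline, loss) pairs
    for i in range(n_iter):
        if i < n_startup:
            pipeline = {step: rng.choice(choices) for step, choices in SPACE.items()}
        else:
            pipeline = suggest(history, rng)
        history.append((pipeline, toy_loss(pipeline)))
    return min(history, key=lambda trial: trial[1])

best_pipeline, best_loss = smbo()
print(best_pipeline, round(best_loss, 3))
```

In an actual Hyperopt-based setup, the search space, surrogate model, and suggestion step would be provided by the library rather than written by hand; the sketch only shows how each evaluated pipeline feeds back into the choice of the next one.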