Evolving Gaussian Process kernels from elementary mathematical expressions

Ibai Roman, Roberto Santana, Alexander Mendiburu, Jose A. Lozano
DOI: https://doi.org/10.48550/arXiv.1910.05173
2019-10-14
Abstract: Choosing the most adequate kernel is crucial in many Machine Learning applications. Gaussian Process is a state-of-the-art technique for regression and classification that heavily relies on a kernel function. However, in the Gaussian Process literature, kernels have usually been either ad hoc designed, selected from a predefined set, or searched for in a space of compositions of kernels which have been defined a priori. In this paper, we propose a Genetic-Programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. By means of this representation, a wider set of kernels can be modeled, where potentially better solutions can be found, although new challenges also arise. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. This method has been tested in several real-world time-series extrapolation problems, improving the state-of-the-art results while reducing the complexity of the kernels.
Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses kernel function selection in Gaussian Processes (GPs). Specifically, the authors propose a method based on Genetic Programming that evolves suitable kernel functions from basic mathematical expressions. The main objectives of this method are:

1. **Expanding the Search Space of Kernel Functions**: Traditional methods for selecting kernel functions rely on predefined sets of kernels or build new kernels by combining known ones. This restricts the kinds of kernel that can be found and may lead to suboptimal solutions. The proposed method uses basic mathematical expressions as building blocks, which allows a wider range of kernel functions to be generated.
2. **Improving Model Accuracy**: With a larger search space, the method can find kernel functions that describe the characteristics of the data more accurately, thereby improving the performance of Gaussian Processes in regression and classification tasks.
3. **Reducing Kernel Function Complexity**: Although the larger search space admits more complex kernel functions, the method limits the depth of the expression trees so that the resulting kernels stay simple enough for practical use.
4. **Automating Kernel Function Selection**: An automated search replaces the manual design and selection of kernel functions, making Gaussian Processes more convenient and efficient to apply.

### Main Contributions

- **Proposed a New Genetic Programming Algorithm (EvoCov)**: The algorithm evolves kernel functions from basic mathematical expressions and copes with candidate expressions that are not positive definite and are therefore not valid kernel functions.
- **Extended the Representational Capacity of Kernel Functions**: By using basic mathematical expressions, the method can generate a wider range of kernel functions, enhancing the representational capacity of Gaussian Processes.
- **Validation on Multiple Real-World Problems**: The method was tested on multiple time-series extrapolation problems, where it outperformed existing methods and produced kernel functions of lower complexity.

### Method Overview

- **Kernel Function Representation**: Kernel functions are represented as trees of basic mathematical expressions, and random kernel functions are generated recursively.
- **Variation Operators**: New kernel functions are produced by crossover and mutation. Crossover combines subtrees of two kernel functions, while mutation modifies a single kernel function through insertion, shrinkage, uniform replacement, or node replacement.
- **Depth Control**: A maximum depth limit prevents the generation of overly complex kernel functions.
- **Evaluation Method**: The hyperparameters of each kernel function are optimized, and the Bayesian Information Criterion (BIC) of the resulting model serves as the fitness function.

### Conclusion

The proposed method performs well on multiple real-world problems, improving the accuracy of Gaussian Processes while also reducing the complexity of the kernel functions. It provides a new approach to automated kernel function selection.
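
To make the tree representation and the BIC fitness from the Method Overview more concrete, the following is a minimal Python sketch, not the authors' EvoCov implementation. The nested-tuple encoding, the operator names in `OPS`, the leaf name `"d2"` (squared distance between inputs), the fixed `noise` term, and the helper names `evaluate`, `gram`, and `bic` are all assumptions made for illustration. Hyperparameter optimization is omitted, and candidates whose Gram matrix fails a Cholesky factorization are simply given an infinite score, which is one possible way to handle expressions that are not positive definite.

```python
import numpy as np

# Hypothetical operator vocabulary; the paper's actual node set may differ.
OPS = {
    "add": np.add,
    "mul": np.multiply,
    "div": np.divide,
    "neg": np.negative,
    "exp": np.exp,
}

def evaluate(tree, d2, theta):
    """Recursively evaluate an expression tree on a matrix of squared distances."""
    if isinstance(tree, tuple):          # internal node: (operator, child, ...)
        op, *children = tree
        return OPS[op](*(evaluate(c, d2, theta) for c in children))
    if tree == "d2":                     # leaf: pairwise squared distances
        return d2
    if isinstance(tree, str):            # leaf: named hyperparameter
        return theta[tree]
    return float(tree)                   # leaf: numeric constant

def gram(tree, x, theta):
    """Gram matrix of the candidate kernel on one-dimensional inputs x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return evaluate(tree, d2, theta)

def bic(tree, x, y, theta, noise=1e-2):
    """BIC of a zero-mean GP; np.inf if the candidate is not a valid kernel."""
    n = len(x)
    K = gram(tree, x, theta) + noise * np.eye(n)
    if not np.all(np.isfinite(K)):
        return np.inf
    try:
        L = np.linalg.cholesky(K)        # fails if K is not positive definite
    except np.linalg.LinAlgError:
        return np.inf
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_lik = (-0.5 * y @ alpha
               - np.log(np.diag(L)).sum()
               - 0.5 * n * np.log(2.0 * np.pi))
    # BIC = k * ln(n) - 2 * ln(likelihood); here k counts only entries of
    # theta, because the noise level is held fixed (an assumption).
    return len(theta) * np.log(n) - 2.0 * log_lik

# Squared-exponential kernel exp(-d2 / (2 * l * l)) written as a tree.
SQ_EXP = ("exp", ("neg", ("div", "d2", ("mul", 2.0, ("mul", "l", "l")))))

x = np.linspace(0.0, 5.0, 20)
y = np.sin(x)
valid_score = bic(SQ_EXP, x, y, {"l": 1.0})    # finite: a valid kernel
invalid_score = bic(("neg", "d2"), x, y, {})   # np.inf: not positive definite
```

In an evolutionary loop, lower BIC would mean higher fitness, and the `np.inf` score lets selection discard invalid expressions without a separate validity proof. A fuller version would optimize `theta` before scoring, as the Evaluation Method bullet describes.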