An Efficient Federated Genetic Programming Framework for Symbolic Regression
Junlan Dong, Jinghui Zhong, Wei-Neng Chen, Jun Zhang
DOI: https://doi.org/10.1109/tetci.2022.3201299
2023-01-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: Symbolic regression is an important method of data-driven modeling that provides explicit mathematical expressions for data analysis. However, existing genetic programming algorithms for symbolic regression require centralized storage of all data, which is unrealistic in many practical applications that involve data privacy. If the data comes from different sources, such as hospitals and banks, it is prone to privacy breaches and security issues. To this end, we propose an efficient federated genetic programming framework that can train a global model without integrating the data. Each client processes its decentralized data locally and in parallel, without sending the original data to the server. This method not only protects the privacy of the data but also reduces the time required for data collection. Moreover, a mean shift aggregation mechanism is developed for aggregating local fitness. Considering the samples' relative importance, the mechanism mitigates the imbalance of real-life symbolic regression data by incorporating weights into the fitness function. Furthermore, based on this framework and self-learning gene expression programming (SL-GEP), a federated self-learning gene expression programming algorithm is developed. The experimental results show that, compared with standard SL-GEP, which is trained on decentralized data only, the proposed federated genetic programming method effectively protects data privacy and achieves consistently better generalization performance.
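The general idea of evaluating a candidate expression on each client and aggregating only the local fitness values can be illustrated with a minimal sketch. Everything named here (`local_fitness`, `aggregate_fitness`, `federated_rmse`, the toy data) is hypothetical and is not taken from the paper. In particular, the aggregation below is a plain sample-count-weighted RMSE used as a stand-in; it is not the mean shift aggregation mechanism the abstract refers to, whose details are not given here.

```python
import math
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]          # one (x, y) observation
Expr = Callable[[float], float]       # a candidate symbolic expression


def local_fitness(expr: Expr, data: List[Sample]) -> Tuple[float, int]:
    """Client side: evaluate the candidate on local data only.

    Returns the sum of squared errors and the local sample count;
    the raw samples never leave the client.
    """
    sse = sum((expr(x) - y) ** 2 for x, y in data)
    return sse, len(data)


def aggregate_fitness(reports: List[Tuple[float, int]]) -> float:
    """Server side: combine local reports into one global RMSE.

    Assumption: a simple sample-count-weighted average, standing in
    for the paper's aggregation mechanism.
    """
    total_sse = sum(sse for sse, _ in reports)
    total_n = sum(n for _, n in reports)
    return math.sqrt(total_sse / total_n)


def federated_rmse(expr: Expr, clients: List[List[Sample]]) -> float:
    """Run one evaluation round: each client reports, the server aggregates."""
    return aggregate_fitness([local_fitness(expr, data) for data in clients])


# Toy data: two clients holding disjoint samples of y = 2x + 1.
clients: List[List[Sample]] = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

# Two hypothetical candidates that a GP population might contain.
candidates: Dict[str, Expr] = {
    "2x+1": lambda x: 2 * x + 1,
    "x^2": lambda x: x * x,
}

# The server ranks candidates using aggregated fitness only.
best = min(candidates, key=lambda name: federated_rmse(candidates[name], clients))
print(best)
```

In this sketch the server sees only a pair of numbers per client for each candidate, which is the property the framework relies on: selection in the evolutionary loop can proceed from aggregated fitness without access to the raw samples.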