Rodrigo Rivera-Castro, Aleksandr Pletnev, Polina Pilyugina, Grecia Diaz, Ivan Nazarov, Wanyi Zhu, Evgeny Burnaev
Abstract: Topological Data Analysis (TDA) is a recent approach to analyze data sets from the perspective of their topological structure. Its use for time series data has been limited. In this work, a system developed for a leading provider of cloud computing combining both user segmentation and demand forecasting is presented. It consists of a TDA-based clustering method for time series inspired by a popular managerial framework for customer segmentation and extended to the case of clusterwise regression using matrix factorization methods to forecast demand. Increasing customer loyalty and producing accurate forecasts remain active topics of discussion both for researchers and managers. Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level with significantly higher accuracy than a state-of-the-art baseline. This work thus seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
What problem does this paper attempt to address?
The paper asks how a leading cloud computing provider can gain deeper customer insight and forecast demand more accurately by combining user segmentation with demand forecasting. Specifically, it addresses the following problems:
1. **Combining customer segmentation and demand forecasting**: Traditional customer segmentation methods (such as the RFM model) can classify customers effectively, but they do not readily yield accurate demand forecasts. The paper proposes a time-series clustering method based on Topological Data Analysis (TDA), combined with matrix factorization techniques for clusterwise regression, to achieve finer user segmentation and more accurate demand forecasting.
2. **Challenges in handling time-series data**: Historical data for cloud computing services is limited, seasonality is difficult to detect, and historical records are often not representative, which makes traditional forecasting techniques ineffective. The TDA method proposed in the paper is intended to better capture complex structure in time-series data and thereby improve forecasting performance.
3. **Better support for business decisions**: By introducing advanced machine-learning methods, the paper aims to give business analysts a tool that is both easy to understand and efficient, helping them make more informed decisions in marketing and demand planning.
### Specific problem statements
The specific problem statements in the paper are as follows:
- **Insufficient understanding of customer needs**: Cloud computing providers need to better understand their customer base in order to provide customized promotional activities and more accurately assess future demand for their services.
- **Limitations of traditional methods**: Due to the novelty and flexibility of cloud computing products, historical data is limited, seasonality is difficult to identify, and historical records are usually not representative. Therefore, traditional forecasting techniques perform poorly on these data.
- **Misleading customer segmentation**: The commonly used RFM framework can assign the same score to two customers whose behavior is actually very different, as in the situation shown in Figure 1.
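The RFM limitation in the last bullet can be illustrated with a small sketch. The two customers, their purchase dates, and the helper `rfm` below are hypothetical and chosen only for illustration; they are not taken from the paper or its Figure 1. The sketch assumes the usual reading of RFM as days since the last purchase (Recency), number of purchases (Frequency), and total spend (Monetary).

```python
from datetime import date

def rfm(purchases, as_of):
    """Return (recency_days, frequency, monetary) for a list of (date, amount) pairs."""
    last_purchase = max(d for d, _ in purchases)
    recency = (as_of - last_purchase).days      # days since the most recent purchase
    frequency = len(purchases)                  # number of purchases
    monetary = sum(amount for _, amount in purchases)  # total spend
    return recency, frequency, monetary

as_of = date(2020, 1, 1)

# Hypothetical customer A: steady spend, one purchase per month.
customer_a = [(date(2019, month, 15), 100.0) for month in range(9, 13)]

# Hypothetical customer B: one old purchase, then a burst in December.
customer_b = [
    (date(2019, 1, 15), 100.0),
    (date(2019, 12, 1), 100.0),
    (date(2019, 12, 8), 100.0),
    (date(2019, 12, 15), 100.0),
]

rfm_a = rfm(customer_a, as_of)
rfm_b = rfm(customer_b, as_of)
print(rfm_a, rfm_b)
```

Both customers collapse to the same three numbers even though their purchase histories differ, which is the kind of information loss that motivates treating Recency, Frequency, and Monetary as time series rather than single scores.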
### Solution overview
To address these challenges, the paper proposes a system that combines TDA with matrix factorization. Its main components are:
- **TDA-based clustering**: Use topological data analysis to cluster time-series data and capture the complex structures in the data.
- **Clusterwise regression**: Combine matrix factorization techniques with linear regression fitted within each cluster to achieve more accurate demand forecasting.
- **Clustering ensemble methods**: Introduce two new clustering ensemble methods (GMM Voting and GMM Pair) to improve the robustness and quality of clustering results.
- **Topological RFM**: Extend the traditional RFM framework to generate three time series (Recency, Frequency, Monetary) and use them as input for TDA.
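One way to picture the Topological RFM input is to turn a purchase log into three per-period series. The sketch below is only an assumption about how such series might be built (running counts and totals over integer period indices); the paper's exact construction is not given in this summary, and the function name `rfm_series` is hypothetical.

```python
def rfm_series(events, n_periods):
    """Build Recency, Frequency and Monetary series from (period_index, amount) events.

    Assumed definitions (illustrative only):
      recency[t]   = periods since the last purchase up to t (None before the first one)
      frequency[t] = cumulative number of purchases up to t
      monetary[t]  = cumulative spend up to t
    """
    by_period = {}
    for period, amount in events:
        by_period.setdefault(period, []).append(amount)

    recency, frequency, monetary = [], [], []
    last_seen, count, total = None, 0, 0.0
    for t in range(n_periods):
        amounts = by_period.get(t, [])
        if amounts:
            last_seen = t
        count += len(amounts)
        total += sum(amounts)
        recency.append(None if last_seen is None else t - last_seen)
        frequency.append(count)
        monetary.append(total)
    return recency, frequency, monetary

# Hypothetical purchase log: one purchase in period 0, two in period 2.
r, f, m = rfm_series([(0, 10.0), (2, 5.0), (2, 5.0)], n_periods=4)
print(r, f, m)
```

Each customer then contributes three aligned series instead of three scalars, and those series are what a time-series clustering step would consume.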
Through these methods, the paper shows how to achieve more effective user segmentation and more accurate demand forecasting in commercial datasets, thereby providing better business support for cloud computing providers.
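The clusterwise regression idea, fitting a separate linear model inside each cluster, can be sketched in a few lines. This is a simplified illustration under stated assumptions: cluster labels are taken as given (in the paper they would come from the TDA-based clustering), each model is a one-variable least-squares line, and the matrix factorization component is omitted entirely. All names and the toy data are hypothetical.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_clusterwise(xs, ys, labels):
    """Fit one line per cluster label; returns {label: (intercept, slope)}."""
    groups = defaultdict(lambda: ([], []))
    for x, y, label in zip(xs, ys, labels):
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {label: fit_line(gx, gy) for label, (gx, gy) in groups.items()}

def predict(models, x, label):
    """Forecast with the model that belongs to the given cluster."""
    intercept, slope = models[label]
    return intercept + slope * x

# Hypothetical toy data: one cluster with growing demand, one with shrinking demand.
xs = [0, 1, 2, 3, 0, 1, 2, 3]
ys = [1, 3, 5, 7, 10, 9, 8, 7]
labels = ["growing"] * 4 + ["shrinking"] * 4

models = fit_clusterwise(xs, ys, labels)
print(models)
print(predict(models, 4, "growing"))
```

A single regression over all eight points would average out the two opposite trends, whereas the per-cluster models recover each one, which is the intuition behind forecasting demand cluster by cluster.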