Learning with Changing Features

Amit Dhurandhar, Steve Hanneke, Liu Yang
DOI: https://doi.org/10.48550/arXiv.1705.00219
2017-04-30
Abstract: In this paper we study the setting where features are added or change interpretation over time, which has applications in multiple domains such as retail, manufacturing, and finance. In particular, we propose an approach to provably determine the time instant from which the new/changed features start becoming relevant with respect to an output variable in an agnostic (supervised) learning setting. We also suggest an efficient version of our approach which has the same asymptotic performance. Moreover, our theory also applies when we have more than one such change point. Independent post analysis of a change point identified by our method for a large retailer revealed that it corresponded in time with certain unflattering news stories about a brand that resulted in the change in customer behavior. We also applied our method to data from an advanced manufacturing plant identifying the time instant from which downstream features became relevant. To the best of our knowledge this is the first work that formally studies change point detection in a distribution independent agnostic setting, where the change point is based on the changing relationship between input and output.
Machine Learning, Computation
What problem does this paper attempt to address?
The paper addresses the following problem: when features in a data set change or are added over time, at what time point do the new or changed features start to have a significant impact on the output variable? Specifically, the authors propose a method that provably determines this time point in an agnostic (supervised) learning setting, without distributional assumptions. They also propose a more efficient version of the method with the same asymptotic performance, and their theory extends to cases with multiple change points.

### Background and Motivation

In many practical application areas, such as retail, manufacturing, and finance, features in a data set may change or be added over time. For example, in a manufacturing process, new measurement tools, or old tools that are re-introduced into the production line after maintenance, produce new or changed measurement values. These changes may affect product quality, so it is important to determine when they start to have a significant impact on it. Knowing this helps manufacturers take preventive or corrective measures and can improve overall production efficiency and profitability.

### Solution

The authors propose an algorithm called "Search-and-Split" (SaS), which determines the time point of the feature change by minimizing empirical risk. Specifically, they define two function classes \(H_1\) and \(H_2\) to describe the relationship between input and output before and after the change, respectively, and determine the optimal time point \(t^*\) by minimizing the following objective function:

\[R^*(h_1, h_2, t_0)=\frac{1}{m}\left(\sum_{t=1}^{t_0-1}\left(h_1(x_t)-\eta_t\right)^2+\sum_{t=t_0}^{m}\left(h_2(x_t)-\eta_t\right)^2\right)\]

where \(\eta_t=\mathbb{E}[Y_t]\) is the expected value of the output variable.
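To make the objective concrete, here is a minimal sketch of a brute-force search over candidate change points. It is not the authors' SaS implementation. It rests on several assumptions made only for illustration: the observed outputs \(y_t\) stand in for \(\eta_t\) (giving an empirical version of the risk), \(H_1\) is taken to be the constant functions (the new feature is ignored), \(H_2\) is taken to be lines in a single new feature, and \(t_0\) ranges over \(1,\dots,m+1\), with \(m+1\) meaning "no change". The names `fit_constant`, `fit_line`, and `search_change_point` are hypothetical.

```python
def fit_constant(xs, ys):
    """Least-squares fit over constant functions (hypothetical stand-in for H_1)."""
    if not ys:
        return lambda x: 0.0
    c = sum(ys) / len(ys)
    return lambda x: c


def fit_line(xs, ys):
    """Least-squares fit over lines a + b*x (hypothetical stand-in for H_2)."""
    n = len(ys)
    if n == 0:
        return lambda x: 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def squared_error(h, xs, ys):
    """Sum of squared errors of predictor h on the given points."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))


def search_change_point(xs, ys):
    """Try every 1-indexed t0 in 1..m+1 and return (risk, t0) with the smallest
    empirical risk: h1 is fit on points before t0, h2 on points from t0 onward."""
    m = len(ys)
    best_risk, best_t0 = None, None
    for t0 in range(1, m + 2):
        k = t0 - 1  # number of points assigned to the "before" segment
        h1 = fit_constant(xs[:k], ys[:k])
        h2 = fit_line(xs[k:], ys[k:])
        risk = (squared_error(h1, xs[:k], ys[:k])
                + squared_error(h2, xs[k:], ys[k:])) / m
        if best_risk is None or risk < best_risk:
            best_risk, best_t0 = risk, t0
    return best_risk, best_t0


# Noise-free toy data (an assumption for the demo): the feature x has no
# effect on y for t = 1..20, and y = 2x from t = 21 onward.
m, true_t0 = 40, 21
xs = [float((i % 7) - 3) for i in range(m)]
ys = [0.0 if i < true_t0 - 1 else 2.0 * xs[i] for i in range(m)]

risk, t_hat = search_change_point(xs, ys)
print("estimated change point:", t_hat)
```

This exhaustive search refits both predictors for every candidate \(t_0\), so it does quadratic work in \(m\) even with closed-form fits. It is meant only to show what the objective measures, not to be efficient.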
Minimizing \(R^*\) yields the time point \(t^*\) from which the new or changed features start to have a significant impact on the output variable.

### Theoretical Analysis

The authors provide distribution-independent excess-risk guarantees, showing that their method remains effective when features change. Specifically, they prove the following theorem:

**Theorem 1**: With probability at least \(1-\delta\), we have

\[R^*(\hat{h}_1,\hat{h}_2,\hat{t})\leq R^*(h_1^*,h_2^*,t^*)+\frac{22B\sqrt{2\ln\left(\frac{2(m+1)}{\delta}\right)+\sum_{j=1}^{2}3p_j\ln\left(\frac{emB}{p_j}\right)}}{m}\]

where \(\hat{h}_1\) and \(\hat{h}_2\) are the estimated functions obtained by minimizing the empirical risk \(\hat{R}\), \(\hat{t}\) is the estimated time point, \(h_1^*\) and \(h_2^*\) are the optimal functions, \(t^*\) is the optimal time point, \(p_1\) and \(p_2\) are the pseudo-dimensions of \(H_1\) and \(H_2\) respectively, and \(B\) is the range of the output variable.

### Experimental Verification

The authors conducted experiments on synthetic data and two real-world industrial data sets to verify the effectiveness of their method. The results show that SaS adapts best to feature changes. SaSF, the efficient version of SaS, is slightly less accurate but still adapts to changes quickly and is significantly more computationally efficient than the original SaS method.

### Application Scenarios

The method is applicable not only to manufacturing but also to other fields, such as retail, finance, document classification, and sensor networks, where features may change or be added over time. For example, in the retail industry, sales strategies can be adjusted by identifying the time point at which brand reputation changes; in the financial field, it can be used...