Virtual Control Group: Measuring Hidden Performance Metrics

Moshe Tocker
DOI: https://doi.org/10.48550/arXiv.2208.12941
2022-08-27
Abstract: Measuring performance metrics in Financial Integrity systems is crucial for maintaining an efficient and cost-effective operation. An important performance metric is the False Positive Rate. This metric cannot be monitored directly, since we do not know for sure whether a user is bad once that user has been blocked. We present a statistical method based on survey theory and causal inference methods to estimate the false positive rate of the system or of a single blocking policy. We also suggest a new approach, outcome matching, that in some cases, including on empirical data, outperformed other commonly used methods. The approaches described in this paper can be applied in other Integrity domains such as Cyber Security.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of measuring performance metrics in Financial Integrity systems, in particular estimating the False Positive Rate (FPR). Specifically:

1. **Problem Background**:
   - In financial integrity systems, accurate performance metrics are crucial for maintaining efficient and cost-effective operations.
   - The false positive rate is an important performance metric, but it cannot be monitored directly: once a user is blocked, there is no way to know for sure whether that user was actually malicious.
2. **Limitations of Existing Methods**:
   - **Control Group**: Introducing a control group is costly, and in many cases it cannot be implemented because of product or regulatory limitations.
   - **Manual Review**: It requires a large amount of manpower, and the results are very noisy, especially in high-dimensional problems such as fraud labels.
   - **Time Delay**: Fraud signals in the financial domain (such as chargebacks) usually take a long time to be confirmed, which delays the measurement of performance metrics.
3. **The Method Proposed in the Paper**:
   - The paper proposes a statistical method based on survey theory and causal inference methods to estimate the false positive rate of the system or of a single blocking policy.
   - Specific methods include:
     - **Propensity Score Matching**: \[ \pi(x) = P(W = 1 \mid X = x) \]
     - **Outcome Score Matching**: \[ \hat{\mu}_{OSM} = \frac{1}{|I_y|}\sum_{i\in I_y} y_i \]
     - **Inverse Probability Weighting (IPW)**: \[ \hat{\mu}_{IPW-NR} = \frac{\sum_i w_i \pi^{-1}(1 - \pi) y_i}{\sum_i w_i \pi^{-1}(1 - \pi)} \]
     - **Regression Estimators**: \[ \hat{\mu}_{MPO} = \frac{1}{|U_t|}\sum_{i\in U_t}\hat{m}(x_i) \]
     - **Doubly Robust Methods**: \[ \hat{\mu}_{WMPO} = \frac{1}{|U_t|}\sum_{i\in U_t}\hat{m}_w(x_i) \]
4. **Innovation Points**:
   - A new outcome matching method is proposed, which outperforms other commonly used methods in some cases (including on empirical data).
   - The method can be applied to other integrity domains, such as cyber security and content integrity.
5. **Experimental Verification**:
   - The effectiveness of the proposed method is verified through simulation studies and real data sets (such as the Kaggle credit card fraud detection data set).
   - The results show that the outcome score matching method performs best in terms of Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).

In summary, this paper uses statistical methods to address the difficulty of directly measuring the false positive rate in financial integrity systems, and provides an efficient and low-cost alternative to control groups and manual review.
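
To make the IPW-NR formula listed above concrete, here is a minimal Python sketch of the weighted average it describes. It is an illustration under stated assumptions, not the paper's implementation. The summary does not define `W`, so the sketch assumes `W = 1` marks users whose outcome is observed (users who were not blocked), which is what makes the weight `(1 - π) / π` shift the observed sample toward the blocked population. The function name `ipw_nr_estimate`, the reading of `y_i` as a binary outcome, and the sample values are all hypothetical. The propensity scores `pi` are assumed to come from a separately fitted model.

```python
from typing import Optional, Sequence


def ipw_nr_estimate(
    y: Sequence[float],
    pi: Sequence[float],
    w: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of y with weights w_i * (1 - pi_i) / pi_i.

    y  : outcomes of users whose outcome is observed (assumed: not blocked)
    pi : estimated propensity pi(x_i) = P(W = 1 | X = x_i) for each such user
    w  : optional base (survey) weights; defaults to 1 for every user
    """
    if w is None:
        w = [1.0] * len(y)
    if not (len(y) == len(pi) == len(w)):
        raise ValueError("y, pi and w must have the same length")

    numerator = 0.0
    denominator = 0.0
    for y_i, pi_i, w_i in zip(y, pi, w):
        # pi_i must be strictly positive, otherwise the weight is undefined.
        if not 0.0 < pi_i <= 1.0:
            raise ValueError("each propensity must lie in (0, 1]")
        weight = w_i * (1.0 - pi_i) / pi_i
        numerator += weight * y_i
        denominator += weight

    if denominator == 0.0:
        raise ValueError("all weights are zero; the estimate is undefined")
    return numerator / denominator


# Hypothetical toy data: four observed users with binary outcomes.
outcomes = [1, 0, 0, 1]
propensities = [0.5, 0.5, 0.8, 0.2]
print(ipw_nr_estimate(outcomes, propensities))  # approximately 0.8
```

In the toy data the user with propensity 0.2 receives weight 4, so that user's outcome dominates the estimate. This is the usual sensitivity of inverse probability weighting to small propensities, and in practice some form of weight clipping or trimming is often considered.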