CASPR: Customer Activity Sequence-based Prediction and Representation

Pin-Jung Chen, Sahil Bhatnagar, Sagar Goyal, Damian Konrad Kowalczyk, Mayank Shrivastava
DOI: https://doi.org/10.48550/arXiv.2211.09174
2022-11-29
Abstract: Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advances to tabular data, researchers must deal with data heterogeneity, variations in customer engagement history and the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies the Transformer architecture to encode activity sequences, improving model performance and avoiding bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the cost of using customer data for business-critical predictions such as customer churn prediction, fraudulent account detection, and customer lifetime value estimation. Conventional approaches train models on features hand-engineered from tabular data. This adds development, operationalization, and maintenance costs, and the resulting features are hard to reuse across applications. To simplify and generalize feature engineering, the authors propose CASPR (Customer Activity Sequence-based Prediction and Representation). The framework encodes customer transactions, purchase history, and other interactions into a generic representation of a customer's association with the business, and the authors then evaluate these embeddings as features for training multiple models across a variety of applications. Specifically, CASPR applies the Transformer architecture to encode activity sequences, which improves model performance and avoids bespoke feature engineering for each application. The authors report that experiments at scale validate CASPR for both small and large enterprise applications.
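The pipeline described above (tabular activity rows, then per-customer sequences, then one fixed-length embedding per customer) can be illustrated with a toy sketch. This is not the paper's architecture: the column layout, the `ACTIVITY_TYPES` vocabulary, the feature encoding, and every function name are assumptions made for illustration. The weights are random rather than trained, and a single attention head with mean pooling stands in for a full Transformer encoder.

```python
import math
import random
from collections import defaultdict

# Hypothetical activity rows: (customer_id, day, activity_type, amount).
ROWS = [
    ("c1", 3, "purchase", 20.0),
    ("c2", 1, "visit", 0.0),
    ("c1", 1, "visit", 0.0),
    ("c1", 2, "refund", 5.0),
    ("c2", 4, "purchase", 12.5),
]
ACTIVITY_TYPES = ["visit", "purchase", "refund"]  # assumed vocabulary


def build_sequences(rows):
    """Group tabular rows by customer and order each group by day."""
    by_customer = defaultdict(list)
    for customer_id, day, activity, amount in rows:
        by_customer[customer_id].append((day, activity, amount))
    return {cid: sorted(events) for cid, events in by_customer.items()}


def featurize(event):
    """Mixed-type encoding: one-hot activity type plus log-scaled amount."""
    _, activity, amount = event
    one_hot = [1.0 if activity == t else 0.0 for t in ACTIVITY_TYPES]
    return one_hot + [math.log1p(amount)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(events, w_q, w_k, w_v):
    """Single-head self-attention over the sequence, then mean pooling.

    No positional encoding is used, so event order does not change this
    toy output; a real Transformer would add positional information.
    """
    xs = [featurize(e) for e in events]
    qs = [matvec(w_q, x) for x in xs]
    ks = [matvec(w_k, x) for x in xs]
    vs = [matvec(w_v, x) for x in xs]
    dim = len(qs[0])
    attended = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in ks]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, vs)) for i in range(dim)])
    return [sum(row[i] for row in attended) / len(attended) for i in range(dim)]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # untrained, randomly initialized weights
IN_DIM, EMB_DIM = len(ACTIVITY_TYPES) + 1, 4
W_Q, W_K, W_V = (random_matrix(rng, EMB_DIM, IN_DIM) for _ in range(3))

sequences = build_sequences(ROWS)
embeddings = {cid: encode(ev, W_Q, W_K, W_V) for cid, ev in sequences.items()}
```

Each entry in `embeddings` has the same length regardless of how many events a customer has, which is what lets one representation feed several downstream models (churn, fraud, lifetime value) without per-application feature engineering.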