Transferable and Forecastable User Targeting Foundation Model

Bin Dou, Baokun Wang, Yun Zhu, Xiaotong Lin, Yike Xu, Xiaorui Huang, Yang Chen, Yun Liu, Shaoshuai Han, Yongchao Liu, Tianyi Zhang, Yu Cheng, Weiqiang Wang, Chuntao Hong
2024-12-17
Abstract: User targeting, the process of selecting targeted users from a pool of candidates for non-expert marketers, has garnered substantial attention with the advancements in digital marketing. However, existing user targeting methods encounter two significant challenges: (i) Poor cross-domain and cross-scenario transferability and generalization, and (ii) Insufficient forecastability in real-world applications. These limitations hinder their applicability across diverse industrial scenarios. In this work, we propose FIND, an industrial-grade, transferable, and forecastable user targeting foundation model. To enhance cross-domain transferability, our framework integrates heterogeneous multi-scenario user data, aligning them with one-sentence targeting demand inputs through contrastive pre-training. For improved forecastability, the text description of each user is derived based on anticipated future behaviors, while user representations are constructed from historical information. Experimental results demonstrate that our approach significantly outperforms existing baselines in cross-domain, real-world user targeting scenarios, showcasing the superior capabilities of FIND. Moreover, our method has been successfully deployed on the Alipay platform and is widely utilized across various scenarios.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two main problems in user targeting:

1. **Poor transferability and generalization across domains and scenarios**: Existing user-targeting methods transfer and generalize poorly between domains and scenarios, which limits their use in diverse industrial settings.
2. **Insufficient forecastability in practical applications**: Existing methods forecast poorly in real-world applications, especially in complex and changing environments.

To address these problems, the authors propose FIND, an industrial-grade, transferable, and forecastable user-targeting foundation model. FIND improves transferability and forecastability as follows:

- **Enhancing cross-domain transferability**: The FIND framework integrates heterogeneous multi-scenario user data and aligns it with one-sentence targeting demand inputs through contrastive pre-training.
- **Improving forecastability**: The text description of each user is derived from anticipated future behaviors, while the user representation is constructed from historical information.

The experimental results show that FIND significantly outperforms existing baselines in cross-domain, real-world user-targeting scenarios. FIND has also been deployed on the Alipay platform and is widely used across various scenarios, which further supports its effectiveness and practicality.

### Formula Summary

The key formulas involved in the paper are as follows:

1. **User behavior sequence encoding**:
   \[ e(V_i) = P(g_\theta(B_i, M_i, S_i)) \]
   where \(P\) is the average pooling function and \(g_\theta\) is the composite function \(g_\theta = g_\mu \circ g_\nu\). Here \(g_\nu\) is a text-encoding function (such as ALBERT) and \(g_\mu\) is a time-aware function (such as a GRU), so that \(z = g_\nu(B_i, M_i, S_i)\) and \(c = g_\mu(z)\).
2. **Contrastive learning loss**:
   \[ L_{CL} = -\frac{1}{K}\sum_{t = 1}^{K}\sum_{i = 1}^{k}\log\frac{\exp(s(c_t, z_{t + i}))}{\sum_{j = 1}^{K}\exp(s(c_t, z_{t + j}))} \]
   where \(z_t\) is the user behavior embedding at timestamp \(t\) from three data modalities, \(c_t\) is the embedding with time information aggregated by \(g_\mu\), \(s\) is the cosine similarity function, \(K\) is the total time length, and \(k \ll K\) controls the window size of positive samples.
3. **Self-supervised pre-training total loss**:
   \[ L_{UB} = L_{CL} + \lambda \cdot \mathrm{KL}(c_t \,\|\, c_{t + T}) \]
   where \(\lambda\) is a coefficient that controls the strength of the regularization term, \(\mathrm{KL}\) is the KL divergence between two input vectors, and \(T\) is the regularization time period.
4. **Table-format encoding**:
   \[ e(\mathrm{tab})_i = h_\phi(T_i) \]
5. **Text encoding**:
   \[ e(r)_i = f_\omega(R_i) \]
6. **Multi-modal user feature fusion**:
   \[ e_{f_i} = \mathrm{CA}(e(V_i), e(\mathrm{tab})_i, e(r)_i) \]
7. **Contrastive text-user pre-training loss**:
   \[ L_{CP}=-\frac{1}{B}\sum_{i = 1}^{B}\log\frac{\exp(s(e_{f_{i,t_1}}, e_{q_{i,t_2}}))}{\sum_{j = 1}^{B}\exp(s(e_