One for Dozens: Adaptive REcommendation for All Domains with Counterfactual Augmentation

Huishi Luo, Yiwen Chen, Yiqing Wu, Fuzhen Zhuang, Deqing Wang
2024-12-16
Abstract: Multi-domain recommendation (MDR) aims to enhance recommendation performance across various domains. However, real-world recommender systems in online platforms often need to handle dozens or even hundreds of domains, far exceeding the capabilities of traditional MDR algorithms, which typically focus on fewer than five domains. Key challenges include a substantial increase in parameter count, high maintenance costs, and intricate knowledge transfer patterns across domains. Furthermore, minor domains often suffer from data sparsity, leading to inadequate training in classical methods. To address these issues, we propose Adaptive REcommendation for All Domains with counterfactual augmentation (AREAD). AREAD employs a hierarchical structure with a limited number of expert networks at several layers, to effectively capture domain knowledge at different granularities. To adaptively capture the knowledge transfer pattern across domains, we generate and iteratively prune a hierarchical expert network selection mask for each domain during training. Additionally, counterfactual assumptions are used to augment data in minor domains, supporting their iterative mask pruning. Our experiments on two public datasets, each encompassing over twenty domains, demonstrate AREAD's effectiveness, especially in data-sparse domains. Source code is available at https://github.com/Chrissie-Law/AREAD-Multi-Domain-Recommendation.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses several problems that multi-domain recommendation (MDR) systems face when they must handle a large number of domains:

1. **System scalability issues**: Traditional MDR algorithms typically handle fewer than five domains, whereas real-world recommender systems must serve dozens or even hundreds. At that scale the parameter count grows substantially, maintenance costs rise, and cross-domain knowledge transfer patterns become intricate. Data sparsity in minor domains also leads to inadequate training.
2. **Complex cross-domain knowledge transfer patterns**: As the number of domains grows, the traditional dichotomy that classifies knowledge as either domain-shared or domain-specific no longer applies. Knowledge transfer between domains is directional, and knowledge from some domains may not apply to others, so deciding which domains should be learned together becomes a challenge.
3. **Large differences in sample sizes across domains**: As shown in Figure 1 of the paper, nearly half of the domains each account for less than 2% of the total samples. Because of this imbalance, data-rich domains dominate the optimization of existing MDR models, which hurts performance on data-sparse (minor) domains.

To address these problems, the authors propose a framework named Adaptive REcommendation for All Domains with counterfactual augmentation (AREAD). AREAD has three components:

- **Hierarchical Expert Integration (HEI)**: Uses a limited number of expert networks arranged in a hierarchy to capture domain knowledge at different granularities, which keeps the parameter count down and improves scalability.
- **Hierarchical Expert Mask Pruning (HEMP)**: Generates a hierarchical expert selection mask for each domain and iteratively prunes it during training, so that each domain keeps only the experts suited to it and the model adapts to complex cross-domain knowledge transfer patterns.
- **Popularity-based Counterfactual Augmentation**: Augments the data of minor domains under counterfactual assumptions to compensate for data sparsity and support their mask pruning.

Experiments on two public datasets, each containing more than 20 domains, demonstrate AREAD's effectiveness, especially in data-sparse domains.
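To make the idea of a per-domain expert selection mask more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the per-expert importance `scores`, the fixed `prune_ratio`, and the rule of keeping at least one expert per layer are all assumptions made for illustration. The summary above does not say how AREAD scores or prunes experts.

```python
from typing import List

# Hypothetical representation: mask[layer][expert] is 1 while the domain
# still selects that expert and 0 once it has been pruned.
Mask = List[List[int]]


def init_mask(experts_per_layer: List[int]) -> Mask:
    """Start with every expert in every layer selected."""
    return [[1] * n for n in experts_per_layer]


def prune_step(mask: Mask, scores: List[List[float]], prune_ratio: float) -> Mask:
    """Drop the lowest-scoring fraction of still-active experts in each layer.

    `scores` is an assumed per-expert importance for this domain (for
    example, a learned gate weight). At least one expert per layer is kept.
    """
    new_mask = []
    for layer_mask, layer_scores in zip(mask, scores):
        active = [i for i, m in enumerate(layer_mask) if m == 1]
        n_drop = min(int(len(active) * prune_ratio), len(active) - 1)
        # Sort active experts by ascending score; the first n_drop are removed.
        to_drop = set(sorted(active, key=lambda i: layer_scores[i])[:n_drop])
        new_mask.append([0 if i in to_drop else m for i, m in enumerate(layer_mask)])
    return new_mask


def masked_mix(outputs: List[float], layer_mask: List[int]) -> float:
    """Average the outputs of the experts the domain still selects in one layer."""
    kept = [o for o, m in zip(outputs, layer_mask) if m == 1]
    return sum(kept) / len(kept)


# Usage: one domain, two layers with 4 and 2 experts, pruned for two rounds.
mask = init_mask([4, 2])
scores = [[0.9, 0.1, 0.5, 0.3], [0.2, 0.8]]
for _ in range(2):
    mask = prune_step(mask, scores, prune_ratio=0.5)
```

In this toy setup each domain would hold its own mask, so two domains that end up with overlapping masks share experts while others stay separate. The averaging in `masked_mix` stands in for whatever gating a real model would use.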