Towards Automatic Discovering of Deep Hybrid Network Architecture for Sequential Recommendation

Mingyue Cheng,Zhiding Liu,Qi Liu,Shenyang Ge,Enhong Chen
DOI: https://doi.org/10.1145/3485447.3512066
2022-01-01
Abstract:ABSTRACT Recent years have witnessed great success in deep learning-based sequential recommendation (SR), which can provide more timely and accurate recommendations. One of the most effective deep SR architectures is to stack high-performance residual blocks, e.g., prevalent self-attentive and convolutional operations, for capturing long- and short-range dependence of sequential behaviors. By carefully revisiting previous models, we observe: 1) simple architecture modification of gating each residual connection can help us train deeper SR models and yield significant improvements; 2) compared with self-attention mechanism, stacking of convolution layers also can cover each item of the whole sequential behaviors and achieve competitive or even superior performance. Guided by these findings, it is meaningful to design a deeper hybrid SR model to ensemble the capacity of both self-attentive and convolutional architectures for SR tasks. In this work, we aim to achieve this goal in the automatic algorithm sense, and propose NASR, an efficient neural architecture search (NAS) method that can automatically select the architecture operation on each layer. Specifically, we firstly design a Table-like search space, involving both self-attentive and convolutional-based SR architectures in a flexible manner. In the search phase, we leverage weight-sharing supernets to encode the entire search space, and further propose to factorize the whole supernet into blocks to ensure the potential candidate SR architectures can be fully trained. Owning to lacking supervisions, we train each block-wise supernet with a self-supervised contrastive optimization scheme, in which the training signals are constructed by conducting data augmentation on original sequential behaviors. The empirical studies show that the discovered deep hybrid network architectures can exhibit substantial improvements over compared baselines, indicating the practicality of searching deep hybrid network architectures on SR tasks. Notably, we show the discovered architecture also enjoys good generalizability and transferability among different datasets.
What problem does this paper attempt to address?