Abstract: Relevance modules play a fundamental role in e-commerce search, as they are responsible for selecting relevant products from thousands of items based on user queries, thereby improving user experience and efficiency. The traditional approach models relevance based on product titles and queries, but the information in titles alone may be insufficient to describe products completely. A more general optimization approach is to further leverage product image information. In recent years, vision-language pre-training models, which leverage contrastive learning to map both textual and visual features into a joint embedding space, have achieved impressive results in many scenarios. In e-commerce, a common practice is to fine-tune the pre-trained model on e-commerce data. However, the performance is sub-optimal because vision-language pre-training models lack alignment specifically designed for queries. In this paper, we propose a method called Query-LIFE (Query-aware Language Image Fusion Embedding) to address these challenges. Query-LIFE utilizes query-based multimodal fusion to effectively incorporate the image and title based on the product types. Additionally, it employs query-aware modal alignment to enhance the accuracy of the comprehensive representation of products. Furthermore, we design GenFilt, which utilizes the generation capability of large models to filter out false negative samples and further improve the overall performance of the contrastive learning task in the model. Experiments demonstrate that Query-LIFE outperforms existing baselines. We also conduct ablation studies and human evaluations to validate the effectiveness of each module within Query-LIFE. Moreover, Query-LIFE has been deployed on Miravia Search, improving both relevance and conversion efficiency.
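
As a rough illustration of the contrastive objective with false-negative filtering described in the abstract, the following minimal Python sketch computes a query-to-product InfoNCE loss over a batch and optionally drops in-batch negatives flagged as false negatives. It is not the authors' implementation: the function name `info_nce`, the temperature value, and the boolean mask standing in for the output of a GenFilt-style filter are all assumptions made for illustration.

```python
import math


def info_nce(query_vecs, product_vecs, false_negative_mask=None, temperature=0.07):
    """Mean query-to-product InfoNCE loss over a batch.

    query_vecs[i] and product_vecs[i] are assumed to form the positive pair.
    false_negative_mask[i][j] == True removes product j from the negatives
    of query i (a hypothetical stand-in for a GenFilt-style filter).
    """
    def normalize(v):
        # L2-normalize so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in product_vecs]
    n = len(q)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            masked = false_negative_mask is not None and false_negative_mask[i][j]
            if j != i and masked:
                continue  # skip in-batch negatives flagged as false negatives
            sim = sum(a * b for a, b in zip(q[i], p[j]))
            logits.append((j, sim / temperature))
        # Numerically stable log-sum-exp over the remaining candidates.
        max_logit = max(value for _, value in logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for _, value in logits)
        )
        positive = next(value for j, value in logits if j == i)
        total += log_denominator - positive
    return total / n


# Toy batch: products 0 and 2 are identical, so each is a false negative
# for the query paired with the other.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
products = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
mask = [
    [False, False, True],
    [False, False, False],
    [True, False, False],
]
loss_unfiltered = info_nce(queries, products)
loss_filtered = info_nce(queries, products, false_negative_mask=mask)
```

In this toy batch the filtered loss is lower than the unfiltered one, because the duplicate product no longer competes with the true positive in the softmax denominator. How the mask is actually produced (here it is hand-written) is the part the abstract attributes to a generative model.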

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Boost CTR Prediction for New Advertisements via Modeling Visual Content

AN ONLINE ADVERTISEMENT PLATFORM BASED ON IMAGE CONTENT BIDDING

Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising

Videoader: a video advertising system based on intelligent analysis of visual content

An Online Advertisement Platform Based On Image Content Bidding 0.2

COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

The Contemporary Art of Image Search: Iterative User Intent Expansion via Vision-Language Model

On Detection of Advertising Images

An optimization framework of video advertising: using deep learning algorithm based on global image information

Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning

MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

Combo-Attention Network for Baidu Video Advertising

VISIADS: a vision-based advertising platform for camera phones

Query-LIFE: Query-aware Language Image Fusion Embedding for E-Commerce Relevance

Delivering online advertisements inside images

AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search

Advertisement design in dynamic interactive scenarios using DeepFM and long short-term memory (LSTM)

Visual Contextual Advertising: Bringing Textual Advertisements to Images

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models