TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li
2024-01-18
Abstract: Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation. Recently, pretrained large language models have further enhanced the effectiveness by leveraging richer textual information in user profiles and job descriptions apart from user behavior features and job metadata. However, the general domain-oriented design struggles to capture the unique structural information within user profiles and job descriptions, leading to a loss of latent semantic correlations. We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings. TAROT targets semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks to constrain the acquired semantic information at each level. Experiments on a real-world LinkedIn dataset show significant performance improvements, proving its effectiveness in person-job fit tasks.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses person-job fit on online recruitment platforms, focusing on how to exploit the textual information in user profiles and job descriptions. Traditional approaches rely mainly on user behavior features and job metadata, but on platforms such as LinkedIn the large volume of profile and job text has become increasingly important. Existing pretrained language models (such as BERT and GPT-3) perform well on general natural language processing tasks, yet they are designed for general-domain text and struggle to capture the structural information specific to semi-structured profiles and job descriptions, which leads to a loss of latent semantic correlations. To address this, the paper proposes TAROT, a hierarchical multitask co-pretraining framework that makes better use of the structural and semantic information in profiles and job descriptions to produce more informative text embeddings. TAROT is co-pretrained on semi-structured text with multi-grained pretraining tasks that constrain the semantic information acquired at each level. Experiments on a real-world LinkedIn dataset show that TAROT significantly improves performance on person-job fit tasks.
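To make the idea of level-wise embeddings over semi-structured text concrete, here is a minimal Python sketch. It is not TAROT's architecture or pretraining procedure, which the abstract does not specify. It only illustrates, under stated assumptions, how token vectors can be pooled into field vectors and field vectors into a document vector, and how two document vectors can be compared. The field names, the hash-based `embed_token` stand-in for a pretrained encoder, mean pooling, and cosine scoring are all hypothetical choices made for illustration.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption, not from the paper)


def embed_token(token):
    """Deterministic toy vector for a token.

    Hypothetical stand-in for a pretrained encoder: it hashes the
    lowercased token and maps the first DIM bytes into [-0.5, 0.5].
    """
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_field(text):
    """Token level -> field level: pool the token vectors of one field."""
    return mean_pool([embed_token(token) for token in text.split()])


def embed_document(doc):
    """Field level -> document level: pool the non-empty field vectors.

    `doc` maps field names (e.g. "title", "skills") to raw text.
    """
    return mean_pool([embed_field(text) for text in doc.values() if text.strip()])


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical semi-structured inputs; the field names are illustrative only.
profile = {
    "title": "Machine Learning Engineer",
    "skills": "Python NLP recommendation systems",
    "experience": "Built ranking models for job search",
}
job = {
    "title": "Senior NLP Engineer",
    "requirements": "Python NLP ranking models",
    "description": "Improve candidate recommendation quality",
}

score = cosine(embed_document(profile), embed_document(job))
print(f"toy person-job similarity: {score:.3f}")
```

In a real system the toy encoder would be replaced by a learned model, and each level (token, field, document) could carry its own training objective, which is the role the abstract assigns to the multi-grained pretraining tasks.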