SkillMatch: Evaluating Self-supervised Learning of Skill Relatedness

Jens-Joris Decorte, Jeroen Van Hautte, Thomas Demeester, Chris Develder
2024-10-07
Abstract: Accurately modeling the relationships between skills is a crucial part of human resources processes such as recruitment and employee development. Yet, no benchmarks exist to evaluate such methods directly. We construct and release SkillMatch, a benchmark for the task of skill relatedness, based on expert knowledge mining from millions of job ads. Additionally, we propose a scalable self-supervised learning technique to adapt a Sentence-BERT model based on skill co-occurrence in job ads. This new method greatly surpasses traditional models for skill relatedness as measured on SkillMatch. By releasing SkillMatch publicly, we aim to contribute a foundation for research towards increased accuracy and transparency of skill-based recommendation systems.
Computation and Language
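The abstract describes adapting Sentence-BERT with self-supervision from skill co-occurrence in job ads. The sketch below illustrates one plausible version of the two ingredients, under stated assumptions: mining skill pairs that co-occur in the same ad as positive examples, and an InfoNCE-style loss with in-batch negatives of the kind commonly used to fine-tune sentence encoders. The function names, the `min_count` threshold, the `scale` value, and the input format (each ad already reduced to a list of skill strings) are hypothetical; the paper's exact pair sampling and loss may differ, and a real setup would produce the embeddings with a Sentence-BERT encoder instead of passing in fixed vectors.

```python
import itertools
import math
from collections import Counter


def mine_cooccurrence_pairs(job_ads, min_count=2):
    """Return skill pairs that appear together in at least `min_count` ads.

    job_ads: list of skill lists, one per job ad (hypothetical input format).
    Each pair is a sorted tuple (skill_a, skill_b) with skill_a < skill_b.
    """
    counts = Counter()
    for skills in job_ads:
        # Deduplicate and sort so each unordered pair is counted once per ad.
        for a, b in itertools.combinations(sorted(set(skills)), 2):
            counts[(a, b)] += 1
    return sorted(pair for pair, c in counts.items() if c >= min_count)


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """InfoNCE-style loss with in-batch negatives.

    Row i of `anchors` should be most similar to row i of `positives`;
    every other row of `positives` in the batch acts as a negative.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Scaled cosine similarity of anchor i to every positive in the batch.
        logits = [scale * sum(x * y for x, y in zip(ai, pj)) for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_sum_exp - logits[i]
    return total / len(a)


ads = [["python", "sql"], ["python", "sql", "excel"], ["excel", "negotiation"]]
print(mine_cooccurrence_pairs(ads))  # → [('python', 'sql')]
```

Minimizing this loss pulls co-occurring skills together and pushes apart skills that merely share a batch. Because the negatives come for free from the batch, no explicit negative mining is needed, which is one reason objectives of this style tend to scale to large job-ad corpora.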
What problem does this paper attempt to address?
The paper addresses the modeling of skill relatedness in Human Resources (HR) processes such as recruitment and employee development. Specifically, the authors release SkillMatch, a new benchmark dataset for evaluating skill relatedness methods, and introduce a self-supervised learning technique that adapts a Sentence-BERT model using skill co-occurrence in job advertisements. The key contributions of the paper are:

1. **Construction of the SkillMatch dataset**: By mining expert knowledge from millions of job advertisements, the authors built a skill relatedness benchmark containing positive and negative skill pairs. Since no prior benchmark existed for this task, the dataset provides a standard for evaluating skill relatedness methods directly.
2. **A new self-supervised learning method**: A self-supervised technique adapts the Sentence-BERT model using the co-occurrence patterns of skills in job advertisements, so that the model better captures the relationships between skills.
3. **Experimental validation**: Comparing various models (such as Word2Vec, fastText, and Sentence-BERT) on SkillMatch shows that the proposed self-supervised method greatly surpasses traditional skill relatedness models.

By releasing SkillMatch publicly, the authors aim to provide a foundation for future research towards more accurate and transparent skill-based recommendation systems.
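To make the evaluation setup concrete, the sketch below scores each labeled skill pair by the cosine similarity of its two embeddings and measures how often related pairs are ranked above unrelated ones. ROC AUC (computed via the pairwise Mann-Whitney formulation) is used here purely as an illustrative ranking metric; it is an assumption and not necessarily the metric reported in the paper. The `embed` callable, the `(skill_a, skill_b, label)` pair format, and the toy vectors are hypothetical stand-ins for a real model and the released SkillMatch files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def roc_auc(pos_scores, neg_scores):
    """Probability that a random related pair outscores a random unrelated one.

    Ties count as half a win (Mann-Whitney U formulation of ROC AUC).
    """
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def evaluate_pairs(embed, pairs):
    """Score (skill_a, skill_b, label) pairs; label 1 = related, 0 = unrelated."""
    pos, neg = [], []
    for a, b, label in pairs:
        score = cosine(embed(a), embed(b))
        (pos if label == 1 else neg).append(score)
    return roc_auc(pos, neg)


# Toy 2-d "embeddings" standing in for a real encoder (hypothetical values).
vecs = {"python": [1.0, 0.1], "sql": [0.9, 0.2], "negotiation": [0.0, 1.0], "sales": [0.1, 0.9]}
pairs = [("python", "sql", 1), ("negotiation", "sales", 1),
         ("python", "negotiation", 0), ("sql", "sales", 0)]
print(evaluate_pairs(vecs.__getitem__, pairs))  # → 1.0
```

The pairwise loop is quadratic in the number of pairs, which is fine for a sketch; for a full benchmark a library implementation of ranking metrics would be the more practical choice.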