Who Will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond
Hao Liu, Qingyu Guo, Hengshu Zhu, Fuzhen Zhuang, Shenwen Yang, Dejing Dou, Hui Xiong
DOI: https://doi.org/10.1145/3511896
Impact factor: 4.157
2022-10-31
ACM Transactions on Knowledge Discovery from Data
Abstract: Data science competitions are becoming increasingly popular, both as a way for enterprises to collect advanced, innovative solutions and as a way for contestants to sharpen their data science skills. Most existing studies of data science competitions focus on improving task-specific data science techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on teams' competition performance and investigate how a team's performance evolves in the context of crowdsourced competitive innovation. Specifically, we first acquire and construct multi-sourced datasets of various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams' future performance. Using a team's rank as a proxy for its performance, we observe a “the stronger, the stronger” rule; that is, top-ranked teams tend to keep their advantages and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performance. After that, we formulate the problem of predicting a team's future rank and propose the Multi-Task Representation Learning (MTRL) framework to model both static and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate that a team's future performance can be well predicted by MTRL. Finally, we envision that our study will not only help competition organizers better understand the competition, but also provide strategic implications for contestants, such as guiding team formation and designing submission strategies.
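The abstract does not say exactly how rank persistence is measured. One simple way to quantify a "the stronger, the stronger" pattern is to compare a mid-competition leaderboard with the final one: a Spearman rank correlation close to 1 and a high share of early top-k teams staying in the top k would both be consistent with leaders keeping their advantage. The sketch below is a minimal illustration under that assumption. The helper names `spearman_rho` and `topk_retention` are hypothetical, the leaderboards are invented toy data rather than results from the study, and none of this is the MTRL framework.

```python
from typing import Dict, Iterable

# Hypothetical representation: a leaderboard maps team name -> rank (1 = best).
Leaderboard = Dict[str, int]


def _rerank(board: Leaderboard, teams: Iterable[str]) -> Dict[str, int]:
    """Re-rank the given teams 1..n by their order on `board` (ties broken by name)."""
    ordered = sorted(teams, key=lambda t: (board[t], t))
    return {team: i + 1 for i, team in enumerate(ordered)}


def spearman_rho(earlier: Leaderboard, later: Leaderboard) -> float:
    """Spearman rank correlation between two leaderboard snapshots.

    Only teams present in both snapshots are compared. They are re-ranked 1..n
    so the no-ties closed form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    shared = set(earlier) & set(later)
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two teams present in both snapshots")
    r1 = _rerank(earlier, shared)
    r2 = _rerank(later, shared)
    d_squared = sum((r1[t] - r2[t]) ** 2 for t in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def topk_retention(earlier: Leaderboard, later: Leaderboard, k: int) -> float:
    """Fraction of the top-k teams in `earlier` that are still top-k in `later`."""
    top_earlier = {t for t, r in earlier.items() if r <= k}
    top_later = {t for t, r in later.items() if r <= k}
    if not top_earlier:
        raise ValueError("no teams ranked within the top k in the earlier snapshot")
    return len(top_earlier & top_later) / len(top_earlier)


if __name__ == "__main__":
    # Hypothetical mid-competition and final leaderboards (toy data, not from the paper).
    mid = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
    final = {"A": 1, "B": 3, "C": 2, "D": 5, "E": 4}
    print(round(spearman_rho(mid, final), 3))  # 0.8
    print(topk_retention(mid, final, k=3))     # 1.0
```

Here the five toy teams only swap adjacent positions, so rho is 0.8 and all three early leaders remain in the top three. Rank correlation and top-k retention are just two possible summary measures. The study's own empirical analysis and its MTRL predictions may rely on different features and quantities.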
computer science, information systems, software engineering