Reducing Workload of Manual Annotation for User Requests Via a Novel Active Learning Framework

Yuan Zhang, Chuanyi Li, Wentao Zou, Bin Luo
DOI: https://doi.org/10.1109/iiotbdsc57192.2022.00047
2022-01-01
Abstract: Classifying user requests is the first step in transforming them into structured requirements specifications. Manual classification is time-consuming, labor-intensive, and error-prone. Many supervised learning methods have therefore been proposed for automatic classification. These methods require a large number of manually annotated data instances to train a model that automatically classifies the rest. To reduce the workload of manual annotation, Active Learning (AL) has been proposed. Few works have applied AL to user request classification, and those that do cannot avoid the missed class effect and sampling bias, two common problems in AL that are more severe in user request classification. In this paper, we propose a novel active learning framework, comprising a seed selection strategy and a query strategy, to solve these two problems respectively. The seed selection strategy extends the Gaussian Mixture Model through text similarity, and the query strategy combines uncertainty sampling with diversity measured by domain knowledge of user requests. An empirical evaluation on three datasets containing user requests from different open source projects demonstrates that our method reduces the manual annotation workload more than baseline approaches without decreasing model performance.
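The idea of a query strategy that combines uncertainty sampling with diversity can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the scoring formula, and the paper measures diversity through domain knowledge of user requests, whereas this sketch substitutes plain cosine similarity over feature vectors. The function names, the `weight` trade-off parameter, and the greedy batch selection are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_batch(probs, vectors, batch_size, weight=0.5):
    """Greedily pick unlabeled instances that are both uncertain and diverse.

    probs:   per-instance class probabilities from the current model
    vectors: per-instance feature vectors (assumed stand-in for domain knowledge)
    weight:  hypothetical trade-off between uncertainty and diversity
    """
    # Normalize entropy to [0, 1]; fall back to 1.0 for a single-class model.
    max_entropy = math.log(len(probs[0])) or 1.0
    selected = []
    candidates = set(range(len(probs)))

    while candidates and len(selected) < batch_size:

        def score(i):
            uncertainty = entropy(probs[i]) / max_entropy
            # Diversity: 1 minus similarity to the closest already-selected item.
            closest = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return weight * uncertainty + (1 - weight) * (1.0 - closest)

        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy example: instances 0 and 1 are equally uncertain but identical,
# so the second pick goes to the more distinct instance 2.
toy_probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
picked = select_batch(toy_probs, toy_vectors, batch_size=2)
```

In the toy example, pure uncertainty sampling would select the two near-duplicate instances; adding the diversity term steers the second query toward a different region of the data, which is the kind of sampling bias the abstract refers to.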