HITSZ-HLT at SemEval-2021 Task 5: Ensemble Sequence Labeling and Span Boundary Detection for Toxic Span Detection.

Qinglin Zhu, Zijie Lin, Yice Zhang, Jingyi Sun, Xiang Li, Qihui Lin, Yixue Dang, Ruifeng Xu
DOI: https://doi.org/10.18653/v1/2021.semeval-1.63
2021-01-01
Abstract: This paper presents the winning system for SemEval-2021 Task 5: Toxic Spans Detection. The task is to locate the spans within a text that contribute to its toxicity, which is crucial for semi-automated moderation of online discussions. We formalize the task separately as a Sequence Labeling (SL) problem and as a Span Boundary Detection (SBD) problem, and employ three state-of-the-art models. We then integrate the predictions of these models to produce a more credible and complementary result. Our system achieves a char-level score of 70.83%, ranking 1st out of 91. In addition, we explore a lexicon-based method, which is strongly interpretable and flexible in practice.
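To make the char-level score and the integration step concrete, here is a minimal Python sketch. It rests on two assumptions that the abstract does not confirm: that the score is an F1 computed per post over sets of toxic character offsets, and that the model predictions are combined by simple majority voting. The function names `char_f1` and `majority_vote` are hypothetical and are not taken from the authors' system.

```python
from collections import Counter
from typing import Iterable, List, Set


def char_f1(pred: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post.

    Assumption: an empty prediction against an empty gold set counts as 1.0.
    """
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set and not gold_set:
        return 1.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions: List[Set[int]]) -> Set[int]:
    """Keep offsets marked toxic by more than half of the models.

    Illustrative only: the abstract does not say how predictions are integrated.
    """
    counts = Counter(offset for p in predictions for offset in p)
    return {o for o, c in counts.items() if c > len(predictions) / 2}


# Three hypothetical model outputs for the same post.
model_outputs = [{0, 1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
ensemble = majority_vote(model_outputs)
score = char_f1(ensemble, gold={3, 4, 5, 6})
```

Voting at the character level lets sequence-labeling and boundary-detection models be combined without aligning their tokenizations, since every model's output reduces to a set of offsets.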