Biasly: a machine learning based platform for automatic racial discrimination detection in online texts
David Bamman, Chris Dyer, Noah A. Smith, Steven Bird, Ewan Klein, E. Loper, J. Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Samuel Gehman, Suchin Gururangan, Maarten Sap, Dan Hendrycks, Kevin Gimpel, Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Feng Liao, M. Ravanelli, Y. Bengio, Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Yann LeCun, B. Boser, J. Denker, Donnie Henderson, R. Howard, Wayne Hubbard, Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis
2022-01-01
Abstract: Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online content has become an increasingly important task in recent years. Indeed, the anonymity, transience, and size of messages, and the difficulty of managing them, facilitate the diffusion of racist or hateful messages across the Internet. The influence of this cyber-racism is no longer limited to social media; it also has a significant effect on society, including corporate business operations, users’ health, and crime. Traditional channels for reporting racist speech have proven inadequate given the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts containing racial discrimination. In this work, we propose a machine learning-based approach to enable automatic detection of racist text content on the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large-scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software tool that assists users in debiasing their texts when using the internet. All these resources are made publicly available.