Abstract: Studies have shown that toxic behavior can cause contributors to leave and can hinder newcomers' participation in Open Source Software (OSS) projects, especially newcomers from underrepresented communities. Thus, detection of toxic language plays a crucial role in OSS collaboration and inclusivity. Off-the-shelf toxicity detectors are ineffective when applied to OSS communications, due to the distinct nature of toxicity observed in these channels (e.g., entitlement and arrogance are more frequently observed on GitHub than on Reddit or Twitter). In this paper, we investigate a machine learning-based approach for the automatic detection of toxic communications in OSS. We leverage psycholinguistic lexicons and Moral Foundations Theory to analyze toxicity in two types of OSS communication channels: issue comments and code reviews. Our evaluation indicates that our approach can achieve a significant performance improvement (up to 7% increase in F1 score) over the existing domain-specific toxicity detector. We found that using moral values as features is more effective than linguistic cues, resulting in 67.50% F1-measure in identifying toxic instances in code review data and 64.83% in issue comments. While detection performance remains far from perfect, this improvement demonstrates the potential of integrating moral and psycholinguistic features in toxicity detection models. These findings highlight the importance of context-specific models that consider the unique communication styles within OSS, where interpersonal and value-driven language dynamics differ markedly from general social media platforms. Future work could focus on refining these models to further enhance detection accuracy, possibly by incorporating community-specific norms and conversational context to better capture the nuanced expressions of toxicity in OSS environments.
What problem does this paper attempt to address?
This paper addresses the problem of automatically detecting toxic communication in open-source software (OSS) projects. Specifically, the authors focus on the fact that existing general-purpose toxicity detection tools perform poorly on OSS communication, because toxicity is expressed differently there. For example, insults, arrogance, and entitlement arising from technical disagreements are more common in OSS than on general social media. As a result, general-purpose tools fail to identify the subtler, "covert" toxicity found in OSS.
### Core Problems of the Paper
1. **Limitations of Existing Tools**: Existing general-purpose toxicity detection tools (such as Google's Perspective API) perform poorly on OSS communication because they fail to capture the language styles and norms specific to OSS.
2. **Toxicity Characteristics in OSS**: Toxicity in OSS is not limited to overtly offensive language. It also includes subtler expressions within technical discussions, such as sarcasm, arrogance, and entitlement, which are less common on other platforms (such as Reddit or Twitter).
3. **Improving Detection Accuracy**: The authors aim to make toxicity detection for OSS communication more accurate and more widely applicable by drawing on psycholinguistics and moral psychology.
### Solutions
To address these problems, the authors propose a machine-learning-based method that uses psycholinguistic lexicons and Moral Foundations Theory (MFT) to analyze toxicity in two OSS communication channels: issue comments and code reviews. The features include:
- **Psycholinguistic Features**: Use the Linguistic Inquiry and Word Count (LIWC) dictionary to extract psycholinguistic features of the text, such as "Clout", "Authentic", and "Tone".
- **Moral Features**: Following MFT, use the moral values expressed in the text (such as care/harm, fairness/cheating, authority/subversion, loyalty/betrayal, sanctity/degradation) as features.
In this way, the authors aim to develop a toxicity detection model better suited to the OSS environment, thereby supporting the inclusiveness and sustainability of OSS projects.
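To make the lexicon-based feature idea concrete, the following is a minimal sketch of how per-category word frequencies could be turned into a feature vector for a classifier. It is not the authors' implementation: the `TOY_LEXICON` dictionary, its word lists, and the `lexicon_features` function are hypothetical stand-ins, since the real LIWC and Moral Foundations dictionaries are separate resources that are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; these word lists are
# assumptions and do NOT come from LIWC or a Moral Foundations dictionary.
TOY_LEXICON = {
    "care_harm": {"hurt", "harm", "protect", "help"},
    "fairness_cheating": {"fair", "unfair", "cheat", "deserve"},
    "authority_subversion": {"obey", "must", "demand", "respect"},
}


def lexicon_features(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens found in that category."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input

    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1

    # One normalized value per category, in a stable key order.
    return {category: counts[category] / total for category in lexicon}


features = lexicon_features("You must fix this now, I deserve a fix")
```

Each comment is mapped to one number per category, and the resulting vectors could then be passed to any standard classifier. Normalizing by token count keeps long and short comments comparable, which is one reasonable design choice among several.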
### Experimental Results
The evaluation shows that the approach outperforms the existing domain-specific toxicity detector, with an F1 improvement of up to 7%. Moral-value features proved more effective than linguistic cues, reaching an F1 of 67.50% on code review data and 64.83% on issue comments. Detection is still far from perfect, but the results indicate that moral and psycholinguistic features tailored to OSS communication are useful signals for toxicity detection.
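For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall on the toxic class. The sketch below computes it from confusion-matrix counts; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper's data.

```python
def f1_score(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    # Precision: share of predicted-toxic items that are actually toxic.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actually toxic items that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate the calculation.
example = f1_score(tp=60, fp=30, fn=30)
```

Because F1 ignores true negatives, it is a common choice when the toxic class is rare and plain accuracy would be dominated by non-toxic comments.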
### Future Work
Future research could refine these models further, for example by incorporating community-specific norms and conversational context to better capture the nuanced expressions of toxicity in OSS environments.