A Study of C/C++ Code Weaknesses on Stack Overflow

Haoxiang Zhang, Shaowei Wang, Heng Li, Tse-Hsun Chen, Ahmed E. Hassan
DOI: https://doi.org/10.1109/tse.2021.3058985
IF: 7.4
2022-01-01
IEEE Transactions on Software Engineering
Abstract: Stack Overflow hosts millions of solutions that aim to solve developers’ programming issues. Through this crowdsourced question-answering process, Stack Overflow has become a code hosting website where developers actively share their code. However, code snippets on Stack Overflow may contain security vulnerabilities, and if shared carelessly, such snippets can introduce security problems in software systems. In this paper, we empirically study the prevalence of Common Weakness Enumeration (CWE) instances in code snippets of C/C++-related answers. We explore the characteristics of $Code_w$, i.e., code snippets that have CWE instances, in terms of the types of weaknesses, the evolution of $Code_w$, and who contributed such code snippets. We find that: 1) 36 percent (i.e., 32 out of 89) of CWE types are detected in $Code_w$ on Stack Overflow. In particular, CWE-119, i.e., improper restriction of operations within the bounds of a memory buffer, is common in both answer code snippets and real-world software systems. Furthermore, the proportion of $Code_w$ doubled from 2008 to 2018 after normalizing by the total number of C/C++ snippets in each year. 2) In general, code revisions are associated with a reduction in the number of code weaknesses. However, the majority of $Code_w$ had weaknesses introduced in the first version of the code, and these $Code_w$ were never revised afterward. Only 7.5 percent of users who contributed C/C++ code snippets posted or edited code with weaknesses. Users contributed less code with CWE weaknesses when they were more active (i.e., they either revised more code snippets or had a higher reputation). We also find that some users tended to repeat the same CWE type across their various code snippets. Our empirical study provides insights to users who share code snippets on Stack Overflow so that they are aware of the potential security issues.
To understand community feedback on fixing code weaknesses through answer revisions, we also conduct a qualitative study and find that 62.5 percent of our suggested revisions are adopted by the community. Stack Overflow can perform CWE scanning for all the code that is hosted on its platform. Further research is needed to improve the quality of the crowdsourced knowledge on Stack Overflow.