Are We Ready to Embrace Generative AI for Software Q&A?

Bowen Xu, Thanh-Dat Nguyen, Thanh Le-Cong, Thong Hoang, Jiakun Liu, Kisub Kim, Chen Gong, Changan Niu, Chenyu Wang, Bach Le, David Lo
DOI: https://doi.org/10.1109/ase56229.2023.00023
2023-01-01
Abstract: Stack Overflow, the world's largest software Q&A (SQA) website, is facing a significant traffic drop due to the emergence of generative AI techniques. Stack Overflow banned ChatGPT only 6 days after its release. The main reason Stack Overflow officially gave is that the answers generated by ChatGPT are of low quality. To verify this, we conduct a comparative evaluation of human-written and ChatGPT-generated answers. Our methodology employs both automatic comparison and a manual study. Our results suggest that human-written and ChatGPT-generated answers are semantically similar; however, human-written answers consistently outperform ChatGPT-generated ones across multiple aspects, specifically by 10% on the overall score. We release the data, analysis scripts, and detailed results at https://github.com/maxxbw54/GAI4SQA.