Code2Image: Intelligent Code Analysis by Computer Vision Techniques and Application to Vulnerability Prediction

Zeki Bilgin
DOI: https://doi.org/10.48550/arXiv.2105.03131
2021-05-07
Abstract: Intelligent code analysis has received increasing attention in parallel with the remarkable advances in machine learning (ML) in recent years. A major challenge in leveraging ML for this purpose is representing source code in a form that ML algorithms can accept as input. In this study, we present a novel method to represent source code as an image while preserving its semantic and syntactic properties, which paves the way for applying computer vision techniques to code analysis. Indeed, the method makes it possible to feed the resulting image representation of source code directly into deep learning (DL) algorithms without any further data pre-processing or feature extraction step. We demonstrate the feasibility and effectiveness of our method by realizing a vulnerability prediction use case over a public dataset containing a large number of real-world source code samples, with a performance evaluation in comparison to state-of-the-art solutions. Our implementation is publicly available.
Software Engineering, Cryptography and Security, Machine Learning, Programming Languages