MIPAD: Mini Program Analysis for Clone Detection Using Static Analysis Techniques
Zhaohui Zhou, Ziqiang Yan, Yin Wang, Junfeng Liu, Jifei Shi, Ming Fan
DOI: https://doi.org/10.1109/frse58934.2023.00052
2023-01-01
Abstract: In recent years, applications hosted on third-party platforms, referred to as mini programs (for example health QR codes, transport codes, and utilities), have been gradually replacing traditional mobile applications because they require no installation or uninstallation and can be used and then simply closed. However, the massive growth of mini programs has raised concerns about protecting the copyright of their code. There is currently little research on clone detection for mini programs, and their language features make plagiarism hard to detect: behaviour can only be observed incompletely, and similarity is difficult to calculate. To address this gap, we propose MIPAD, a detection method based on static feature analysis. It uses statistical features (SF) for clustering analysis, and layout features (LF) and code features (CFF, FDF, TLDF) for similarity detection. To make the LF, CFF, FDF, and TLDF features more robust, we apply a fuzzy hash algorithm during the feature extraction phase. To speed up the dependency graph similarity computation, we propose a fast anchor-based similarity computation algorithm. To address the lack of large, publicly available sample datasets in this domain, we designed a mini program crawling method that performs fuzzy crawling from a seed list and expands that list in real time, and we used it to collect on the order of 100,000 mini program samples. Using these samples, we evaluated MIPAD with a Random Forest as the classifier and X-means as the clustering algorithm. MIPAD achieved an accuracy of 90.5% with an average time overhead of 15.83 s per sample, demonstrating that it can detect cloned mini programs quickly and effectively.
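The abstract does not describe how the crawler works internally, so the following is only a minimal sketch of the general idea of crawling from a seed list that grows as samples are found. The function names (`expand_crawl`, `toy_search`), the in-memory `CATALOG`, and the rule of turning words from discovered names into new keywords are all hypothetical assumptions for illustration, not MIPAD's actual method or any real platform API.

```python
from collections import deque
from typing import Callable, Iterable, List


def expand_crawl(
    seeds: Iterable[str],
    search: Callable[[str], List[str]],
    max_samples: int,
) -> List[str]:
    """Breadth-first crawl that expands its keyword list as it goes.

    Each keyword is sent to `search` (a stand-in for a fuzzy search
    endpoint). New sample names are collected, and the words in each
    new name are queued as additional keywords.
    """
    seeds = list(seeds)
    queue = deque(seeds)
    seen_keywords = set(seeds)
    seen_samples = set()
    samples: List[str] = []

    while queue and len(samples) < max_samples:
        keyword = queue.popleft()
        for name in search(keyword):
            if name in seen_samples:
                continue
            seen_samples.add(name)
            samples.append(name)
            if len(samples) >= max_samples:
                break
            # Assumed expansion rule: every unseen word becomes a keyword.
            for word in name.lower().split():
                if word not in seen_keywords:
                    seen_keywords.add(word)
                    queue.append(word)
    return samples


# Hypothetical in-memory catalog standing in for a mini program platform.
CATALOG = ["Health Code", "Transit Code", "Transit Pay", "Pay Bills", "Weather Now"]


def toy_search(keyword: str) -> List[str]:
    """Substring match as a crude stand-in for fuzzy search."""
    return [name for name in CATALOG if keyword in name.lower()]


if __name__ == "__main__":
    print(expand_crawl(["health"], toy_search, max_samples=10))
```

In this toy setup a single seed reaches every catalog entry that shares a word with something already found, while unrelated entries stay undiscovered, which is why a real seed list would need to be broad.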