Multi-modal Learning for WebAssembly Reverse Engineering

Hanxian Huang, Jishen Zhao
DOI: https://doi.org/10.1145/3650212.3652141
2024-04-04
Abstract: The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet task-specific ML solutions remain hard to generalize, because their effectiveness hinges on an ample supply of high-quality, task-specific labeled data. Moreover, previous works overlook the high-level semantics present in source code and its documentation. Given the abundance of documented source code that can be compiled into WebAssembly, we propose to learn representations of the source code, its documentation, and the compiled WebAssembly concurrently, and to harness their mutual relationships for effective WebAssembly reverse engineering.
Software Engineering, Machine Learning, Programming Languages