Invoke-Deobfuscation: AST-Based and Semantics-Preserving Deobfuscation for PowerShell Scripts

Huajun Chai, Lingyun Ying, Haixin Duan, Daren Zha
DOI: https://doi.org/10.1109/dsn53405.2022.00039
2022-01-01
Abstract: In recent years, PowerShell has been widely used in cyber attacks, and malicious PowerShell scripts can easily evade detection by anti-virus software through obfuscation. Existing deobfuscation tools often fail to recover obfuscated scripts correctly because of imprecise obfuscation identification, improper recovery, and wrong replacement. In this paper, we propose Invoke-Deobfuscation, an AST-based and semantics-preserving deobfuscation approach. It uses the recoverable nodes of the Abstract Syntax Tree (AST) to identify obfuscated pieces precisely, simulates the recovery process through the Invoke function and variable tracing, and replaces obfuscated pieces in place to preserve the original semantics. We build a large evaluation dataset containing 39,713 wild PowerShell scripts. Compared with state-of-the-art tools, the experimental results show that Invoke-Deobfuscation is the most efficient. It recovers substantially more key information than the other tools and reduces the samples’ obfuscation score by 46% on average. Moreover, 100% of Invoke-Deobfuscation’s results have the same network behavior as the original scripts.
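The three steps named in the abstract (identify recoverable AST nodes, recover their value, replace them in place) can be illustrated with a small analogy. The sketch below is not the authors' tool and does not handle PowerShell: it applies the same idea to Python source using Python's standard `ast` module. The names `FoldRecoverable`, `SAFE_METHODS`, and `deobfuscate` are hypothetical, and the variable tracing is a deliberate simplification that is only sound for straight-line code.

```python
import ast


class FoldRecoverable(ast.NodeTransformer):
    """Toy analogy: fold sub-expressions whose value can be recovered statically."""

    # Assumption: a small whitelist of side-effect-free string methods.
    SAFE_METHODS = {"format", "replace", "upper", "lower"}

    def __init__(self):
        self.known = {}  # traced variables: name -> literal value

    def visit_Assign(self, node):
        self.generic_visit(node)  # recover the right-hand side first
        if len(node.targets) == 1 and isinstance(node.targets[0], ast.Name):
            name = node.targets[0].id
            if isinstance(node.value, ast.Constant):
                self.known[name] = node.value.value
            else:
                self.known.pop(name, None)  # value no longer known
        return node

    def visit_Name(self, node):
        # Substitute only reads of a traced variable, never assignment targets.
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(value=self.known[node.id]), node)
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)
        left, right = node.left, node.right
        if (isinstance(node.op, ast.Add)
                and isinstance(left, ast.Constant) and isinstance(left.value, str)
                and isinstance(right, ast.Constant) and isinstance(right.value, str)):
            # Replace the concatenation in place with the recovered string.
            return ast.copy_location(ast.Constant(value=left.value + right.value), node)
        return node

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Constant)
                and isinstance(func.value.value, str)
                and func.attr in self.SAFE_METHODS
                and not node.keywords
                and all(isinstance(arg, ast.Constant) for arg in node.args)):
            try:
                # "Invoke" the whitelisted method on literal operands only.
                method = getattr(func.value.value, func.attr)
                result = method(*(arg.value for arg in node.args))
            except Exception:
                return node  # leave the piece untouched if recovery fails
            return ast.copy_location(ast.Constant(value=result), node)
        return node


def deobfuscate(source: str) -> str:
    tree = FoldRecoverable().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

For example, `deobfuscate("a = 'down' + 'load'\nrun(a + '{1}{0}'.format('ing', 'str'))")` rewrites the second statement to `run('downloadstring')` while leaving the call to `run` itself untouched. Only the recoverable pieces are replaced, and the surrounding statement keeps its structure.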