Hardware Trojan Detection for Gate-level Netlists Based on Graph Neural Network

SHI Jiangyi, WEN Cong, LIU Hongjin, WANG Zekun, ZHANG Shaolin, MA Peijun, LI Kang
DOI: https://doi.org/10.11999/JEIT221201
2023-01-01
Abstract: The globalization of the Integrated Circuit (IC) supply chain has shifted most design, manufacturing, and testing processes from a single trusted entity to a variety of untrusted third-party entities around the world. The use of untrusted Third-Party Intellectual Property (3PIP) can expose a design to a significant risk of Hardware Trojans (HTs) being implanted by adversaries. These hardware Trojans may cause degradation of the original design, information leakage, or even irreversible damage at the physical level, seriously jeopardizing consumer privacy, security, and company reputation. The hardware Trojan detection approaches proposed in the existing literature have the following drawbacks: reliance on a golden reference model, a requirement for test vector coverage, and even a need for manual code review. At the same time, as the scale of integrated circuits increases, hardware Trojans with low trigger rates become more difficult to detect. To address these problems, a graph neural network-based HT detection method is proposed that detects gate-level hardware Trojans without requiring a golden reference model or logic tests. Graph Sample and AGgrEgate (GraphSAGE) is used to learn the high-dimensional graph features of the gate-level netlist together with the attributed node features. Supervised learning is then employed to train the detection model. The detection capability of models with different aggregation methods and data balancing methods is explored. Evaluated on the Trust-Hub benchmarks based on the Synopsys 90 nm generic library (SAED), the model achieves an average recall of 92.9% and an average F1 score of 86.2%, an 8.4% improvement in F1 score over the state of the art. When applied to a larger dataset based on the 250 nm generic library (LEDA), the average recall and F1 score are 83.6% and 70.8%, respectively, for combinational logic, and 95.0% and 92.8%, respectively, for sequential logic.
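To illustrate the kind of neighborhood aggregation that GraphSAGE performs on a netlist graph, the following is a minimal sketch, not the authors' implementation. It assumes a mean aggregator, a toy three-gate netlist, one-hot gate-type node features, and identity weight matrices; all of these names and values are hypothetical and chosen only to make the arithmetic easy to follow. The usual concatenate-then-multiply step is written in the equivalent form of two separate weight matrices, one for the node itself and one for the aggregated neighbors.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]


def sage_mean_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One GraphSAGE-style layer: mean of neighbor features, linear map, ReLU."""
    out: Dict[str, Vector] = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Element-wise mean over the neighbor feature vectors.
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(h))]
        else:
            agg = [0.0] * len(h)
        # W_self * h + W_neigh * agg is equivalent to W * concat(h, agg).
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        out[node] = [max(0.0, v) for v in z]  # ReLU
    return out


# Hypothetical toy netlist: and1 -- xor1 -- dff1 (treated as undirected).
# Node features are one-hot gate types in the order [AND, XOR, DFF].
features = {
    "and1": [1.0, 0.0, 0.0],
    "xor1": [0.0, 1.0, 0.0],
    "dff1": [0.0, 0.0, 1.0],
}
neighbors = {
    "and1": ["xor1"],
    "xor1": ["and1", "dff1"],
    "dff1": ["xor1"],
}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

out = sage_mean_layer(features, neighbors, identity, identity)
print(out["xor1"])  # [0.5, 1.0, 0.5]
```

After one layer, each gate's embedding mixes its own type with the types of its direct neighbors. In a real detector the weights would be learned under supervision, several layers would be stacked, and a classifier on the final embeddings would label each node as Trojan or normal.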