Promoting or Hindering: Stealthy Black-Box Attacks Against DRL-Based Traffic Signal Control

Yan Ren, Heng Zhang, Xianghui Cao, Chaoqun Yang, Jian Zhang, Hongran Li
DOI: https://doi.org/10.1109/jiot.2023.3308260
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Numerous studies have demonstrated in depth the vulnerability of elements of the deep reinforcement learning (DRL) model (e.g., the reward), a factor that limits the widespread deployment of DRL in crucial domains such as intelligent traffic signal control (ITSC). Although partial poisoning attacks with insidious rewards can be made undetectable by directly applying regularization or cumulative reward restrictions, these constraints are somewhat one-dimensional and fail to consider the time dependence of DRL. Moreover, the adversary should avoid injecting undesirable perturbations when agents' policies are unstable, so as to maximize the benefit of the attack strategy. It is thus a challenge to perturb the DRL model stealthily, with as few disruption steps or modifications to the original samples as possible, while ensuring the attack's efficiency. In this work, two black-box reward-space attack strategies are introduced, in which the adversary is encouraged to actively learn a malicious adversarial policy. The first, the Multi Constraint Stealthy-time Attack, is updated with the penalties earned by attacking at crucial moments and is restricted by action confidence and the total number of perturbations to keep the attack timing stealthy. The second, the Multi Objective Stealthy-modification Attack, is modeled as a multi-objective optimization problem in which the adversary balances attack performance against stealthy modification through a weighting factor ω. Extensive simulations in SUMO, including comparative assessment and analysis of the attack distribution, show a dramatic increase in average travel time, implying that the attacks impose pressure on the traffic flow and confirming the efficacy of the proposed attack strategies.
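The abstract does not state the exact form of the multi-objective problem. As a rough illustration only, one common way to balance two objectives with a single weighting factor ω is weighted-sum scalarization. The sketch below assumes that form; the function names, the toy gain function, and the candidate perturbations are all hypothetical and are not taken from the paper.

```python
def scalarized_objective(attack_gain, modification_cost, omega):
    """Weighted-sum trade-off (assumed form, not the paper's formulation).

    omega = 1 cares only about attack effect; omega = 0 only about stealth.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * attack_gain - (1.0 - omega) * modification_cost


def pick_perturbation(candidates, gain_fn, omega):
    """Return the candidate reward perturbation with the highest score.

    The modification cost is taken to be the perturbation magnitude,
    which is an illustrative assumption.
    """
    def score(delta):
        return scalarized_objective(gain_fn(delta), abs(delta), omega)

    return max(candidates, key=score)


# Hypothetical toy setup: a concave gain, so larger perturbations
# yield diminishing returns for the adversary.
def toy_gain(delta):
    return 2.0 * delta - 0.5 * delta * delta


candidates = [0.0, 0.5, 1.0, 2.0]

for omega in (0.0, 0.5, 1.0):
    print(omega, pick_perturbation(candidates, toy_gain, omega))
```

With this toy gain, a larger ω selects a larger perturbation and a smaller ω favors leaving the reward nearly untouched, which is the trade-off the weighting factor is meant to express.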