ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents

Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov
2024-10-10
Abstract: Recent advancements in LLM-based web agents have introduced novel architectures and benchmarks showcasing progress in autonomous web navigation and interaction. However, most existing benchmarks prioritize effectiveness and accuracy, overlooking crucial factors like safety and trustworthiness which are essential for deploying web agents in enterprise settings. The risks of unsafe web agent behavior, such as accidentally deleting user accounts or performing unintended actions in critical business operations, pose significant barriers to widespread adoption. In this paper, we present ST-WebAgentBench, a new online benchmark specifically designed to evaluate the safety and trustworthiness of web agents in enterprise contexts. This benchmark is grounded in a detailed framework that defines safe and trustworthy (ST) agent behavior, outlines how ST policies should be structured and introduces the Completion under Policies metric to assess agent performance. Our evaluation reveals that current SOTA agents struggle with policy adherence and cannot yet be relied upon for critical business applications. Additionally, we propose architectural principles aimed at improving policy awareness and compliance in web agents. We open-source this benchmark and invite the community to contribute, with the goal of fostering a new generation of safer, more trustworthy AI agents. All code, data, environment reproduction resources, and video demonstrations are available at https://sites.google.com/view/st-webagentbench/home.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the safety and trustworthiness problems that web agents face when deployed in enterprise environments. Existing benchmarks focus mainly on the effectiveness and accuracy of task completion and overlook these two factors. The issues include:

1. **Safety issues**:
   - Web agents may perform dangerous operations, such as accidentally deleting user accounts or performing unauthorized actions during critical business operations.
   - Agents may leak sensitive data while handling it.
   - Agents need to resist adversarial inputs and sensor manipulation to prevent "jailbreaking".
2. **Trustworthiness issues**:
   - Agents need to follow organizational policies, user preferences, and task instructions, and apply them in order of priority.
   - Agents need to obtain explicit permission from users when performing tasks, especially before irreversible operations.
   - Agents need to stop when a task exceeds their authorized scope, to avoid invading privacy and violating regulations.
   - Agents need a clear fallback mechanism for handling errors, so that a failed task does not trigger cascading problems.
   - Agents need to abide by legal and ethical standards to prevent bias and discrimination.
   - Agents need to provide transparency and explainability so that users can understand their decision-making processes.

### Solutions

To address these issues, the authors propose **ST-WebAgentBench**, a new online benchmark specifically designed to evaluate the safety and trustworthiness of web agents in enterprise environments. Its main features are:

1. **Detailed framework**:
   - Defines the standards for safe and trustworthy (ST) agent behavior.
   - Describes how ST policies should be structured.
   - Introduces the **Completion under Policies (CuP)** metric to evaluate the performance of agents in multiple dimensions.
2. **Evaluation results**:
   - Evaluates current state-of-the-art agents on the new benchmark and finds that they struggle with policy compliance and cannot yet be relied upon for critical business applications.
   - Proposes architectural principles aimed at improving agent policy awareness and compliance.
3. **Open-source contribution**:
   - Open-sources the benchmark and invites community contributions, with the goal of fostering a new generation of safer and more trustworthy AI agents.

Through these measures, the authors hope to bridge the gap between current agent capabilities and the requirements of enterprise environments, and to promote the safe and reliable deployment of web agents in practice.
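The difference between plain task completion and the Completion under Policies metric mentioned above can be illustrated with a small sketch. The summary does not give the formula, so this assumes that CuP credits a task only when the agent completes it with zero policy violations. The `Episode` record, its field names, and the policy name `ask_user_consent` are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Episode:
    """One agent run on one task (hypothetical record layout)."""
    task_id: str
    completed: bool  # did the agent reach the task goal?
    # Names of policies the agent violated during the run (hypothetical names).
    violations: List[str] = field(default_factory=list)


def completion_rate(episodes: List[Episode]) -> float:
    """Fraction of tasks completed, ignoring policy violations."""
    if not episodes:
        return 0.0
    return sum(e.completed for e in episodes) / len(episodes)


def completion_under_policies(episodes: List[Episode]) -> float:
    """Fraction of tasks completed with zero policy violations.

    Assumed definition for illustration; the benchmark's exact
    formula is not stated in this summary.
    """
    if not episodes:
        return 0.0
    compliant = sum(e.completed and not e.violations for e in episodes)
    return compliant / len(episodes)


# Made-up example data: four runs, one of which completes but breaks a policy.
episodes = [
    Episode("t1", True),
    Episode("t2", True, ["ask_user_consent"]),
    Episode("t3", False),
    Episode("t4", True),
]

print(completion_rate(episodes))            # 0.75
print(completion_under_policies(episodes))  # 0.5
```

In this made-up data the agent completes three of four tasks, but one completion violates a policy, so the policy-aware score is lower than the raw completion rate. That gap is the kind of shortfall in policy adherence the evaluation reports for current agents.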