Making 'syscall' a Privilege not a Right

Fangfei Yang,Anjo Vahldiek-Oberwagner,Chia-Che Tsai,Kelly Kaoudis,Nathan Dautenhahn
2024-06-12
Abstract:Browsers, Library OSes, and system emulators rely on sandboxes and in-process isolation to emulate system resources and securely isolate untrusted components. All access to system resources like system calls (syscall) need to be securely mediated by the application. Otherwise system calls may allow untrusted components to evade the emulator or sandbox monitor, and hence, escape and attack the entire application or system. Existing approaches, such as ptrace, require additional context switches between kernel and userspace, which introduce high performance overhead. And, seccomp-bpf supports only limited policies, which restricts its functionality, or it still requires ptrace to provide assistance. In this paper, we present nexpoline, a secure syscall interception mechanism combining Memory Protection Keys (MPK) and Seccomp or Syscall User Dispatch (SUD). Our approach transforms an application's syscall instruction into a privilege reserved for the trusted monitor within the address space, allowing flexible user defined policy. To execute a syscall, the application must switch contexts via nexpoline. It offers better efficiency than secure interception techniques like ptrace, as nexpoline can intercept syscalls through binary rewriting securely. Consequently, nexpoline ensures the safety, flexibility and efficiency for syscall interception. Notably, it operates without kernel modifications, making it viable on current Linux systems without needing root privileges. Our benchmarks demonstrate improved performance over ptrace in interception overhead while achieving the same security guarantees. When compared to similarly performing firejail, nexpoline supports more complex policies and enables the possibility to emulate system resources.
Cryptography and Security
What problem does this paper attempt to address?