A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health
Nikhil Behari,Edwin Zhang,Yunfan Zhao,Aparna Taneja,Dheeraj Nagaraj,Milind Tambe
2024-02-23
Abstract:Efforts to reduce maternal mortality rate, a key UN Sustainable Development
target (SDG Target 3.1), rely largely on preventative care programs to spread
critical health information to high-risk populations. These programs face two
important challenges: efficiently allocating limited health resources to large
beneficiary populations, and adapting to evolving policy priorities. While
prior works in restless multi-armed bandit (RMAB) demonstrated success in
public health allocation tasks, they lack flexibility to adapt to evolving
policy priorities. Concurrently, Large Language Models (LLMs) have emerged as
adept, automated planners in various domains, including robotic control and
navigation. In this paper, we propose DLM: a Decision Language Model for RMABs.
To enable dynamic fine-tuning of RMAB policies for challenging public health
settings using human-language commands, we propose using LLMs as automated
planners to (1) interpret human policy preference prompts, (2) propose code
reward functions for a multi-agent RL environment for RMABs, and (3) iterate on
the generated reward using feedback from RMAB simulations to effectively adapt
policy outcomes. In collaboration with ARMMAN, an India-based public health
organization promoting preventative care for pregnant mothers, we conduct a
simulation study, showing DLM can dynamically shape policy outcomes using only
human language commands as input.
Machine Learning,Artificial Intelligence,Multiagent Systems