Shared Control with Black Box Agents using Oracle Queries

Inbal Avraham, Reuth Mirsky
2024-10-25
Abstract: Shared control problems involve a robot learning to collaborate with a human. When learning a shared control policy, short communication between the agents can often significantly reduce running times and improve the system's accuracy. We extend the shared control problem to include the ability to directly query a cooperating agent. We consider two types of potential responses to a query, namely oracles: one that can provide the learner with the best action they should take, even when that action might be myopically wrong, and one with a bounded knowledge limited to its part of the system. Given this additional information channel, this work further presents three heuristics for choosing when to query: reinforcement learning-based, utility-based, and entropy-based. These heuristics aim to reduce a system's overall learning cost. Empirical results on two environments show the benefits of querying to learn a better control policy and the tradeoffs between the proposed heuristics.
Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The problem this paper addresses is how a query mechanism can improve the efficiency and accuracy of robot-human collaborative learning in a shared control system. Specifically, the research focuses on the collaboration between the robot (the controller) and the human (a black-box agent): the robot directly queries the cooperating agent for best-action suggestions in order to reduce learning time and improve the overall performance of the system.

### Research Background

In many modern systems, such as surgical robots, semi-autonomous vehicles, and brain-computer interfaces, humans and robots need to work together to complete tasks. One of the core challenges in shared control is that the robot's policy has no direct access to the human's behavior patterns, which are treated as a "black box". Existing solutions usually assume that the learner can interact with the black box only by performing actions, a restriction that can make learning inefficient.

### Main Problems

1. **Introducing the Query Mechanism**: The paper proposes a new framework that allows the controller (robot) to obtain best-action suggestions by querying the cooperating agent (human). This additional information channel can significantly speed up learning and improve the system's accuracy.
2. **When to Query**: Once a query mechanism exists, the key question is when to use it. Each query has a cost, so the learner needs a strategy that keeps the number of queries low while maintaining high performance.

### Solutions

To address these problems, the paper makes the following contributions:

1. **Formally Defining the Shared Control Problem with a Query Mechanism**: Based on the multi-agent Markov decision process (MA-MDP), the state space is divided into visible and invisible parts, thus extending the hidden-mode MDP to the multi-agent setting.
2. **Introducing Two Types of Oracles**:
   - **Teacher Oracle**: Has global information about the entire system and can provide optimal action suggestions.
   - **Expert Oracle**: Is familiar only with the behavior of the black-box agent and predicts that agent's next best action.
3. **Proposing Three Query Heuristics**:
   - **Entropy Heuristic**: Decides whether to query based on information gain (IG), and queries when the information gain exceeds a threshold.
   - **Utility Heuristic**: Decides whether to query according to the prediction probability of an RNN. If the probability of an action is higher than a set threshold, no query is made.
   - **Reinforcement Learning Heuristic**: Based on the Q-learning algorithm, decides whether to query by estimating the value of querying and of not querying.

### Experimental Verification

The research is evaluated experimentally in two domains:

1. **Automaton Domain**: Three main use cases (Cases, Strategy, Combination Lock) are designed to show how the different heuristics perform under sparse or delayed rewards.
2. **Lunar Lander Simulator**: The effectiveness of the heuristics is tested in a more complex environment.

The results show that the query mechanism significantly improves learning efficiency and reduces the failure rate.

### Conclusion

By introducing the query mechanism and designing suitable query heuristics, the paper improves the performance of the shared control system, reduces learning time, and shows potential for application in complex tasks.
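
To make the three query heuristics listed under Solutions more concrete, the following is a minimal Python sketch of one way each decision rule could look. It is not the authors' implementation. The function and class names, the thresholds, and the learning parameters are all hypothetical. The sketch also assumes that the information gain of a query can be approximated by the Shannon entropy of the learner's current action prediction, and that the reward passed to the Q-learning update already has the query cost subtracted.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def query_by_entropy(probs: Sequence[float], ig_threshold: float) -> bool:
    """Entropy heuristic: query when the information gain exceeds a threshold.

    Assumption: information gain is approximated by the entropy of the
    current prediction, i.e. the uncertainty an oracle answer would remove.
    """
    return entropy(probs) > ig_threshold


def query_by_utility(probs: Sequence[float], prob_threshold: float) -> bool:
    """Utility heuristic: skip the query when some action is predicted
    with probability above the threshold; otherwise query."""
    return max(probs) <= prob_threshold


class QueryQTable:
    """Reinforcement learning heuristic: tabular Q-learning over two
    meta-actions per state, index 0 = do not query, index 1 = query."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.95) -> None:
        self.alpha = alpha  # learning rate (hypothetical value)
        self.gamma = gamma  # discount factor (hypothetical value)
        self.q = defaultdict(lambda: [0.0, 0.0])

    def should_query(self, state: Hashable) -> bool:
        """Query only when its estimated value is strictly higher."""
        no_query_value, query_value = self.q[state]
        return query_value > no_query_value

    def update(self, state: Hashable, queried: bool,
               reward: float, next_state: Hashable) -> None:
        """Standard Q-learning update; `reward` is assumed to already
        include the (negative) cost of querying when `queried` is True."""
        action = int(queried)
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In this sketch the entropy and utility rules are stateless threshold tests on a predicted action distribution (for example, the output of the RNN mentioned above), while the Q-table has to be trained over episodes before `should_query` becomes informative.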