Abstract: In recent years, collaborative robots have become major market drivers in Industry 5.0, which aims to incorporate them alongside humans in a wide array of settings ranging from welding to rehabilitation. Improving human-machine collaboration entails using computational algorithms that reduce both processing and communication costs. In this study, we constructed an agent that can choose when to cooperate using an optimal strategy. The agent was designed to operate in divergent interest tacit coordination games, in which communication between the players is not possible and the payoff is not symmetric. The agent's model was based on a behavioral model that predicts the probability of a player converging on prominent solutions with salient features (i.e., focal points) based on the player's Social Value Orientation (SVO) and the specific game features. SVO theory pertains to the preferences of decision makers when allocating joint resources between themselves and another player in the context of behavioral game theory. The agent selected stochastically between two possible policies, a greedy policy and a cooperative policy, based on the probability that the player converges on a focal point. The distribution of the number of points obtained by the autonomous agent incorporating the SVO in the model was better than that obtained by human players who played against each other (i.e., the distribution associated with the agent had a higher mean value). Moreover, the distribution of points gained by the agent was better than that of either of the separate strategies the agent could choose from, namely, always choosing the greedy solution or always choosing the focal point solution. To the best of our knowledge, this is the first attempt to construct an intelligent agent that maximizes its utility by incorporating the belief system of the player in the context of tacit bargaining. This reward-maximizing strategy selection process based on the SVO could also be applied in other human-machine contexts, including multiagent systems.
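The stochastic policy selection described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the names `Option` and `choose_option` are hypothetical, and the focal-point probability `p_focal` is taken as an input here, whereas in the study it is predicted by a behavioral model from the player's SVO and the game features (that model is not reproduced).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One possible solution of a tacit coordination game (hypothetical type)."""
    name: str
    agent_payoff: float  # points the agent receives if both players pick this option
    is_focal: bool       # True if this option is the salient (focal point) solution


def choose_option(options, p_focal, rng):
    """Select the cooperative (focal) option with probability p_focal,
    otherwise the greedy option that maximizes the agent's own payoff.

    p_focal is assumed to come from an external behavioral model.
    """
    if not 0.0 <= p_focal <= 1.0:
        raise ValueError("p_focal must lie in [0, 1]")
    focal = next(o for o in options if o.is_focal)
    greedy = max(options, key=lambda o: o.agent_payoff)
    # rng.random() is uniform on [0, 1), so the focal branch is taken
    # with probability p_focal.
    return focal if rng.random() < p_focal else greedy


# Illustrative game with made-up payoffs: the focal option pays the agent less.
game = [
    Option("focal split", agent_payoff=4.0, is_focal=True),
    Option("greedy split", agent_payoff=6.0, is_focal=False),
]
choice = choose_option(game, p_focal=0.7, rng=random.Random(0))
```

Under this reading, setting `p_focal` to 1 or 0 recovers the two fixed baselines mentioned in the abstract (always choosing the focal point solution, or always choosing the greedy one).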

Option-Critic in Cooperative Multi-agent Systems

Multi-agent Deep Covering Option Discovery

Learning Multiagent Options for Tabular Reinforcement Learning Using Factor Graphs

Option-based Multi-agent Exploration

Learning Multi-agent Skills for Tabular Reinforcement Learning using Factor Graphs

Multi-Agent Optimization and Learning: A Non-Expansive Operators Perspective

Algorithm for Automatic Constructing Option Based on Multi-Agent

Multi-agent Covering Option Discovery through Kronecker Product of Factor Graphs.

Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States

MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Enhancing Multi-Agent Coordination through Common Operating Picture Integration

Consensus of Cooperative–antagonistic Multi-Agent Networks with Asynchronous Three-Option Decision Mechanism

Multi-agent Black-box Optimization using a Bayesian Approach to Alternating Direction Method of Multipliers

Using a Stochastic Agent Model to Optimize Performance in Divergent Interest Tacit Coordination Games

A Role-Based POMDPs Approach for Decentralized Implicit Cooperation of Multiple Agents.

Cooperative Bayesian Optimization for Imperfect Agents

Efficient Multiagent Planning via Shared Action Suggestions

Safe Option-Critic: Learning Safety in the Option-Critic Architecture

A Framework for Sequential Planning in Multi-Agent Settings

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Decentralized Opportunistic Spectrum Resources Access Model and Algorithm Toward Cooperative Ad-Hoc Networks