Multi-agent action strategy learning method and device, medium and computing equipment

Huang Minlie, Takanobu Ryuichi
2020-01-01
Abstract: An embodiment of the invention provides a multi-agent action strategy learning method. The method comprises the following steps: each of multiple agents samples a corresponding action according to its initial action strategy; the advantage obtained after each agent executes its action is estimated; and the action strategy of each agent is updated based on these advantages, so that each updated action strategy enables its agent to obtain a higher return. The method applies to task-oriented machine learning scenarios and trains multiple cooperating agents simultaneously (that is, multiple action strategies are trained at the same time). The agents do not need to interact with a pre-built simulator and no manual supervision is required, which greatly saves time and resources. In addition, so that every agent can learn a good action strategy, different rewards are assigned to the different agents, which allows the agents to learn better action strategies.
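The three steps in the abstract (sample actions, estimate advantages, update each strategy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the two-action toy task, the `rewards` function, the running-mean baseline used as the advantage estimate, and the plain policy-gradient update on softmax logits.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rewards(a0, a1):
    """Hypothetical toy task: a shared bonus when both agents pick action 1,
    plus a different agent-specific term, so each agent gets its own reward."""
    shared = 1.0 if (a0 == 1 and a1 == 1) else 0.0
    return shared + 0.2 * a0, shared + 0.1 * a1


def train(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0, 0.0], [0.0, 0.0]]  # one action strategy per agent
    baselines = [0.0, 0.0]             # running mean reward per agent
    for _ in range(steps):
        # Step 1: every agent samples an action from its current strategy.
        probs = [softmax(l) for l in logits]
        actions = [rng.choices([0, 1], weights=p, k=1)[0] for p in probs]
        rs = rewards(*actions)
        for i in range(2):
            # Step 2: estimate the advantage (assumed: reward minus baseline).
            advantage = rs[i] - baselines[i]
            baselines[i] += 0.05 * (rs[i] - baselines[i])
            # Step 3: policy-gradient update of this agent's strategy.
            for a in range(2):
                grad_log_prob = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                logits[i][a] += lr * advantage * grad_log_prob
    return [softmax(l) for l in logits]


if __name__ == "__main__":
    for i, policy in enumerate(train()):
        print(f"agent {i}: P(action 1) = {policy[1]:.3f}")
```

In this sketch both strategies are trained at the same time and each agent is driven by its own reward signal. The hand-written `rewards` function only stands in for whatever feedback the agents receive from one another in the actual method.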