Abstract: In this article, a decentralized optimal tracking control problem is studied for a large-scale autonomous vehicle system with heterogeneous dynamics. Because the number of agents is extremely large, the notorious "curse of dimensionality" and the unrealistic assumption that reliable, very large-scale communication links exist in uncertain environments have challenged traditional multiagent system (MAS) algorithms for decades. The emerging mean-field game (MFG) theory has recently been widely adopted to generate decentralized control methods that address these challenges by encoding the large-scale MAS's information into a novel time-varying probability density function (PDF) that can be obtained locally. However, traditional MFG methods assume that all agents are homogeneous, which is unrealistic in practical industrial applications such as the Internet of Things (IoT). Therefore, a novel mean-field Stackelberg game (MFSG) is formulated based on the Stackelberg game, in which the agents are classified into two categories: one major leader, whose decision dominates, and the remaining minor agents. Moreover, a hierarchical structure that treats all minor agents as a mean-field group is developed to relax the assumption of homogeneous agents.
Then, the actor-actor-critic-critic-mass ($A^2C^2M$) algorithm with five neural networks is designed to learn the optimal policies by solving the MFSG. The Lyapunov theory is utilized to prove the convergence of the $A^2C^2M$ neural networks and the closed-loop system's stability.
Finally, a series of numerical simulations are conducted to demonstrate the effectiveness of the developed method.
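
The mean-field idea in the abstract, summarizing a very large population of minor agents by a time-varying PDF rather than by pairwise communication, can be illustrated with a toy simulation. The sketch below is not the $A^2C^2M$ algorithm: it contains no neural networks and solves no game. It only assumes a hypothetical scalar state, one leader that tracks a fixed reference, and minor agents with heterogeneous (randomly drawn) tracking gains that follow the leader, and it records a histogram estimate of the minor-agent density at each step. All function names, dynamics, and parameter values are assumptions made for illustration.

```python
import math
import random


def empirical_pdf(states, lo, hi, bins):
    """Histogram estimate of the population density on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in states:
        idx = math.floor((x - lo) / width)
        if 0 <= idx < bins:  # samples outside the grid are ignored
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    # Normalize so that the density integrates to 1 over the grid.
    return [c / (total * width) for c in counts]


def simulate(num_minor=500, steps=100, dt=0.1, reference=2.0,
             noise=0.1, seed=0):
    """Toy leader/minor-agent tracking with a time-varying density.

    Assumed dynamics (illustrative only, not taken from the paper):
      leader:  l <- l + dt * (reference - l)
      minor i: x <- x + dt * g_i * (l - x) + noise * sqrt(dt) * w
    where g_i is a per-agent gain and w is standard Gaussian noise.
    """
    rng = random.Random(seed)
    leader = 0.0
    minors = [rng.gauss(-3.0, 1.0) for _ in range(num_minor)]
    gains = [rng.uniform(0.5, 1.5) for _ in range(num_minor)]
    pdfs = []
    for _ in range(steps):
        # Snapshot of the population density before this update.
        pdfs.append(empirical_pdf(minors, -8.0, 8.0, 80))
        leader += dt * (reference - leader)
        minors = [
            x + dt * g * (leader - x)
            + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for x, g in zip(minors, gains)
        ]
    return leader, minors, pdfs
```

Calling `simulate()` returns the final leader state, the final minor-agent states, and one density snapshot per step. In this toy setting the density mass drifts from its initial location toward the leader, which is the kind of population-level signal that a mean-field formulation uses in place of individual agent states.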

Decentralized Optimal Tracking Control for Large-scale Multi-Agent Systems under Complex Environment: A Constrained Mean Field Game with Reinforcement Learning Approach

Large-Scale Multiagent System Tracking Control Using Mean Field Games

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Mean Field Game and Decentralized Intelligent Adaptive Pursuit Evasion Strategy for Massive Multi-Agent System under Uncertain Environment

Discrete-Time Mean Field Control with Environment States

Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning

Hierarchical game theoretical distributed adaptive control for large scale multi‐group multi‐agent system

Distributed Adaptive Flocking Control for Large-Scale Multiagent Systems

Adaptive Distributed Tracking Control for Markov Jump Multiagent Systems with a Non-Strict Leader

Optimal couple-group tracking control for the heterogeneous multi-agent systems with cooperative-competitive interactions via reinforcement learning method

Decentralized Adaptive Optimal Tracking Control for Massive Autonomous Vehicle Systems With Heterogeneous Dynamics: A Stackelberg Game

Differential graphical game‐based multi‐agent tracking control using integral reinforcement learning

A Multi-Population Mean-Field Game Approach for Large-Scale Agents Cooperative Attack-Defense Evolution in High-Dimensional Environments

Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning

MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation

Practical Optimal Formation-Containment Tracking Control of Nonlinear Multiagent Systems With Unknown Dynamics

Graphon Mean-Field Control for Cooperative Multi-Agent Reinforcement Learning

Group Formation Tracking of Heterogeneous Multi-Agent Systems Using Reinforcement Learning

Leader-Following Formation Tracking of Multiagent Systems Using Adaptive Scaling Mechanism under Spatial Constraints

Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games