Abstract: In the domain of combat simulations, the training and deployment of deep reinforcement learning (RL) agents still face substantial challenges due to the dynamic and intricate nature of such environments. Unfortunately, as the complexity of the scenarios and available information increases, the training time required to achieve a certain threshold of performance does not just increase, but often does so exponentially. This relationship underscores the profound impact of complexity in training RL agents. This paper introduces a novel approach that addresses this limitation in training artificial intelligence (AI) agents using RL. Traditional RL methods have been shown to struggle in these high-dimensional, dynamic environments due to real-world computational constraints and the known sample inefficiency challenges of RL. To overcome these limitations, we propose a method of localized observation abstraction using piecewise linear spatial decay. This technique simplifies the state space, reducing computational demands while still preserving essential information, thereby enhancing AI training efficiency in dynamic environments where spatial relationships are often critical. Our analysis reveals that this localized observation approach consistently outperforms the more traditional global observation approach across increasing scenario complexity levels. This paper advances the research on observation abstractions for RL, illustrating how localized observation with piecewise linear spatial decay can provide an effective solution to large state representation challenges in dynamic environments.

What problem does this paper attempt to address?

This paper addresses the challenges of training and deploying deep reinforcement learning (RL) agents in complex, dynamic combat simulation environments. As scenario complexity and the amount of available information increase, the training time needed to reach a given performance threshold often grows exponentially, making training impractically expensive and time-consuming. Traditional RL methods perform poorly in these high-dimensional, dynamic environments, mainly because of real-world computational resource limits and the known sample inefficiency of RL. To address this, the authors propose a new method: local observation abstraction using piecewise-linear spatial decay. The method simplifies the state space to reduce computational requirements while retaining crucial spatial information, thereby improving training efficiency.

The core contributions of the paper are as follows:

1. **Simplifying the state space**: Through local observation abstraction, the global observation space is compressed into a 7×7 matrix, regardless of the actual game board size. This compression significantly reduces the computational burden.
2. **Retaining key information**: Despite the compression, the observation still describes the critical parts of the current environment in detail (such as the state of adjacent cells), so the agent can make sound decisions within its local range.
3. **Improving training efficiency**: With less unnecessary information to process, the agent reaches a higher performance level in less time, especially in high-complexity scenarios.
4. **Experimental verification**: Experiments were conducted in the Atlatl combat simulation environment, and the results show that the local observation method consistently outperforms the traditional global observation method across scenarios of differing complexity.
In summary, this paper aims to reduce the low training efficiency and high computational cost of deep reinforcement learning in complex combat simulation environments by introducing a new local observation abstraction method. The method improves training efficiency and suggests a path toward applying reinforcement learning in larger and more dynamic environments.

### Formula Representation

The paper's central formula is the piecewise-linear spatial decay function \( w(d) \), which assigns a weight to each observation point as a function of distance \( d \):

\[
w(d) =
\begin{cases}
1 & \text{for } d \leq 3 \\
1 - 0.9 \times \frac{d - 3}{7 - 3} & \text{for } 3 < d < 7 \\
0.1 - 0.09 \times \frac{d - 7}{100 - 7} & \text{for } 7 \leq d < 100 \\
0.01 & \text{for } d \geq 100
\end{cases}
\]

Each linear segment interpolates between the weights at its endpoints, so the function is continuous: \( w(3) = 1 \), \( w(7) = 0.1 \), and \( w(100) = 0.01 \). This weighting gives a smooth transition from local to global information, so nearby cells are retained in full while distant cells are compressed.
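The decay function can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `KNOTS` and `decay_weight` are hypothetical, and the sketch assumes that each segment interpolates linearly between the weights at its endpoints, so the far segment falls from 0.1 at \( d = 7 \) to 0.01 at \( d = 100 \) (a slope coefficient of 0.09). How \( d \) is measured on the game board is not specified here, so the function simply takes a non-negative number.

```python
from bisect import bisect_right

# (distance, weight) knots read off the piecewise definition of w(d).
# Assumption: segments interpolate linearly between adjacent knots, so the
# weight falls from 1 to 0.1 over (3, 7) and from 0.1 to 0.01 over [7, 100).
KNOTS = [(3.0, 1.0), (7.0, 0.1), (100.0, 0.01)]


def decay_weight(d: float) -> float:
    """Return the spatial decay weight w(d) for a non-negative distance d."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    # Flat regions: full weight near the agent, floor weight far away.
    if d <= KNOTS[0][0]:
        return KNOTS[0][1]
    if d >= KNOTS[-1][0]:
        return KNOTS[-1][1]
    # Find the segment containing d and interpolate linearly inside it.
    xs = [x for x, _ in KNOTS]
    i = bisect_right(xs, d)  # index of the first knot strictly beyond d
    (x0, w0), (x1, w1) = KNOTS[i - 1], KNOTS[i]
    return w0 + (w1 - w0) * (d - x0) / (x1 - x0)


if __name__ == "__main__":
    for d in (0, 3, 5, 7, 50, 100, 250):
        print(f"w({d}) = {decay_weight(d):.4f}")
```

Storing the breakpoints as knots rather than hard-coding each branch keeps the function continuous by construction. For example, `decay_weight(5)` is approximately 0.55, halfway between the weights at \( d = 3 \) and \( d = 7 \).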

Localized Observation Abstraction Using Piecewise Linear Spatial Decay for Reinforcement Learning in Combat Simulations

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Mastering the Digital Art of War: Developing Intelligent Combat Simulation Agents for Wargaming Using Hierarchical Reinforcement Learning

Scaling Intelligent Agents in Combat Simulations for Wargaming

Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning

State-Wise Safe Reinforcement Learning With Pixel Observations

Scaling intelligent agent combat behaviors through hierarchical reinforcement learning

Learning Complex Spatial Behaviours in ABM: An Experimental Observational Study

Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems

Enhanced method for reinforcement learning based dynamic obstacle avoidance by assessment of collision risk

Learning Markov State Abstractions for Deep Reinforcement Learning

Active search and coverage using point-cloud reinforcement learning

Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation

Hybrid Information-driven Multi-agent Reinforcement Learning

Differentially Encoded Observation Spaces for Perceptive Reinforcement Learning

Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback