Abstract: In real-world autonomous driving, navigation is a sequential decision-making problem. Decisions are made from partial observations of the environment, and the underlying model of the environment is unknown. Reinforcement learning is a common approach to such problems: the agent learns from a sequence of rewards together with fragmentary and noisy observations. This study introduces an algorithm named deep reinforcement learning navigation via decision transformer (DRLNDT) to improve the decision-making of autonomous vehicles operating in partially observable urban environments. The DRLNDT framework is built around the Soft Actor-Critic (SAC) algorithm. DRLNDT uses Transformer neural networks to model the temporal dependencies in observations and actions, which helps mitigate judgment errors caused by sensor noise or occlusion in any single state. A variational autoencoder (VAE) extracts latent vectors from high-quality images, reducing the dimensionality of the state space and improving training efficiency. The multimodal state space consists of vector states, such as velocity and position, which the vehicle's onboard sensors can readily provide, together with the latent vectors derived from high-quality images, which help the agent assess its current trajectory. Experiments show that DRLNDT can learn a better policy without prior knowledge of the environment, detailed maps, or routing assistance, outperforming the baseline technique and other policy methods that do not use historical data.
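The multimodal state and the history of observations and actions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_state` and `HistoryBuffer`, the zero left-padding, and the choice to concatenate state and action into one token per step are all assumptions made for illustration. The sketch only shows how a VAE latent vector and vector states could be joined and kept in a fixed-length window that a sequence model such as a Transformer could consume.

```python
from collections import deque
from typing import Deque, List, Sequence


def build_state(latent: Sequence[float], velocity: float,
                position: Sequence[float]) -> List[float]:
    """Join an image latent vector with vector states (hypothetical layout)."""
    return [*latent, velocity, *position]


class HistoryBuffer:
    """Keeps the most recent `context_len` (state, action) steps.

    Assumption: each step is stored as one token, state followed by action,
    and short histories are left-padded with zero tokens.
    """

    def __init__(self, context_len: int, state_dim: int, action_dim: int) -> None:
        self.context_len = context_len
        self.state_dim = state_dim
        self.action_dim = action_dim
        # deque with maxlen drops the oldest step automatically
        self.steps: Deque[List[float]] = deque(maxlen=context_len)

    def push(self, state: Sequence[float], action: Sequence[float]) -> None:
        if len(state) != self.state_dim or len(action) != self.action_dim:
            raise ValueError("state or action has an unexpected dimension")
        self.steps.append([*state, *action])

    def sequence(self) -> List[List[float]]:
        """Return exactly `context_len` tokens, oldest first."""
        token_dim = self.state_dim + self.action_dim
        missing = self.context_len - len(self.steps)
        padding = [[0.0] * token_dim for _ in range(missing)]
        return padding + [list(step) for step in self.steps]
```

In use, one state would be built per time step from the current latent vector and sensor readings, pushed together with the chosen action, and the result of `sequence()` passed to the sequence model that feeds the actor and critic.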

Racing with Vision Transformer Architecture

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

Vision-based control in the open racing car simulator with deep and reinforcement learning

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Deep reinforcement learning navigation via decision transformer in autonomous driving

Vision Transformer: Vit and its Derivatives

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Transformer Based Reinforcement Learning For Games

ViTAR: Vision Transformer with Any Resolution

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

ViTCN: Vision Transformer Contrastive Network For Reasoning

ViR: the Vision Reservoir

ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Pretrained Visual Representations in Reinforcement Learning

Improving Vision Transformers by Revisiting High-Frequency Components

RegionViT: Regional-to-Local Attention for Vision Transformers

SimViT: Exploring a Simple Vision Transformer with sliding windows

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures