Abstract: Using joint depth estimation and target detection networks to locate targets in a UAV's field of view is a novel application in depth estimation research. Ultra-low altitude oblique photographic images contain more depth variation and more low-texture regions than autonomous driving scenes, which makes it harder to train a high-quality depth estimation network on them. This study investigates unsupervised monocular depth estimation for ultra-low altitude oblique photography images, so that downstream high-level vision tasks can benefit from accurate depth estimates in complex scenes. We attribute the lack of effective back-projection directionality when training with adjacent frames to the extensive low-texture areas in the training data for complex ultra-low altitude oblique photography. To address this, we propose a self-supervised scene-aware refinement learning architecture that enhances feature perception. The architecture consists of a multi-resolution feature fusion depth network, a perceptual refinement network (PRNet), and a pose network; it enhances regional differences in complex environments from a refined feature context perspective to obtain higher-quality depth maps. We also rethink the problem of depth information recovery and design an edge information aggregation (EIA) module, placed in the decoder, to refine the depth detail of local regions. Several loss terms constrain network training to improve the quality of depth estimation. We compare our method with six state-of-the-art self-supervised monocular depth estimation methods on three datasets (UAVid 2020, WildUAV, UAV ula). The experimental results show that our model achieves the best performance in most scenarios.
The code and the private dataset (UAV ula) are publicly available at https://github.com/takisu0916/MRFEDepth.
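The abstract's point about low-texture regions can be illustrated with a toy example. The sketch below is not the authors' method or code: it is a one-dimensional stand-in in which an integer pixel shift plays the role of the depth- and pose-induced warp between adjacent frames, and all names (`photometric_l1`, `loss_curve`, the sample rows) are hypothetical. Under these assumptions, it shows that a photometric reprojection error has a clear minimum on a textured row but is constant on a flat row, so the flat row gives the network no direction in which to correct its depth estimate.

```python
def photometric_l1(target, source, shift):
    """Mean absolute difference between target and source sampled at x + shift.

    The integer shift is an assumed stand-in for the warp that depth and
    camera pose would induce between two adjacent frames.
    """
    n = len(target)
    errs = [abs(target[x] - source[x + shift])
            for x in range(n) if 0 <= x + shift < n]
    return sum(errs) / len(errs)


def loss_curve(target, source, shifts):
    """Photometric error for each candidate shift."""
    return [photometric_l1(target, source, s) for s in shifts]


shifts = [0, 1, 2, 3, 4]
true_shift = 2

# Textured row: the target is the source displaced by the true shift
# (the last two target pixels are padding and are not compared at shift 2).
textured_source = [0, 9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 9]
textured_target = textured_source[true_shift:] + textured_source[-true_shift:]

# Low-texture row: constant intensity, as on a road surface or a blank wall.
flat_source = [5] * 12
flat_target = [5] * 12

textured_curve = loss_curve(textured_target, textured_source, shifts)
flat_curve = loss_curve(flat_target, flat_source, shifts)

# Index of the shift with the lowest error on the textured row.
best_textured = min(range(len(shifts)), key=textured_curve.__getitem__)

print("textured:", textured_curve, "best shift:", shifts[best_textured])
print("flat:    ", flat_curve)
```

In this toy setting the textured row recovers the true shift, while every candidate shift is equally good on the flat row. This is one way to read the motivation for adding feature-level and edge-aware refinement on top of a purely photometric signal.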

Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Low in Resolution, High in Precision: UAV Detection with Super-Resolution and Motion Information Extraction

Flow-Guided Single Object Tracking Framework in UAV Aerial Video

Adaptive Switching Spatial-Temporal Fusion Detection for Remote Flying Drones

MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition

Adaptive Feature Fusion and Improved Attention Mechanism-Based Small Object Detection for UAV Target Tracking

Object Tracking in Unmanned Aerial Vehicle Videos via Multifeature Discrimination and Instance-Aware Attention Network

Learnable Cross-Scale Sparse Attention Guided Feature Fusion for UAV Object Detection

UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

Self-Attention Guidance and Multiscale Feature Fusion-Based UAV Image Object Detection

A Real-Time Incremental Video Mosaic Framework for UAV Remote Sensing

FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification

Motion Matters: Difference-based Multi-scale Learning for Infrared UAV Detection

AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

Enhancing UAV Detection in Surveillance Camera Videos through Spatiotemporal Information and Optical Flow

Modality Meets Long-Term Tracker: A Siamese Dual Fusion Framework for Tracking UAV

Full-Scale Feature Aggregation and Grouping Feature Reconstruction-Based UAV Image Target Detection

Scene-aware refinement network for unsupervised monocular depth estimation in ultra-low altitude oblique photography of UAV

A Study of Efficient Maritime Rescue Identification Algorithms

A Vehicle-Mounted Radar-Vision System for Precisely Positioning Clustering UAVs