Fast Best Viewpoint Selection with Geometry-Enhanced Multiple Views and Cross-Modal Distillation

Zidi Cao, Jiayi Han, Sipeng Yang, Xiaogang Jin
DOI: https://doi.org/10.1007/s00371-024-03708-5
IF: 2.835
2024-01-01
The Visual Computer
Abstract: Best viewpoint selection (BVS) aims to automatically identify the most informative, human-preferred viewpoints of 3D shapes, those that clearly convey their complexity and structure. Despite advances in mesh-based BVS using multiple views, the current state-of-the-art BVS method requires 20-30 rendered views and is limited to predefined viewpoints, so it may miss optimal viewpoints and is impractical in time-sensitive scenarios. To address these limitations, we present a new dual-branch fast BVS regression model that significantly reduces reliance on extensive input views, enables continuous perspective prediction, and improves interactive response speed. Our method incorporates a geometry-enhanced multi-view feature extractor combined with a learnable token and employs a cross-modal distillation approach to deepen its understanding of 3D structures. By integrating alignment constraints between 3D geometry descriptors and multi-view expressions, our approach minimizes the need for many rendered views, significantly reducing computational demands. Experimental results on public benchmarks show that our method is about 35 times faster than the state-of-the-art method when only six views are used, while also achieving the best quantitative metrics.
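The abstract does not give the form of the alignment constraint, so the following is only a minimal sketch of one common way to align two modalities: mean-pool the per-view features and penalize their cosine distance to a 3D geometry descriptor. The function names (`mean_pool`, `cosine_similarity`, `alignment_loss`), the pooling choice, and the cosine-distance loss are all assumptions made for illustration and are not taken from the paper.

```python
import math

def mean_pool(view_feats):
    """Average a list of equal-length per-view feature vectors (assumed pooling)."""
    n = len(view_feats)
    dim = len(view_feats[0])
    return [sum(v[i] for v in view_feats) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def alignment_loss(view_feats, geom_feat):
    """Hypothetical alignment term: 1 - cos(pooled multi-view feature, geometry descriptor).

    The loss is near 0 when the two embeddings point the same way
    and near 2 when they point in opposite directions.
    """
    pooled = mean_pool(view_feats)
    return 1.0 - cosine_similarity(pooled, geom_feat)

# Toy example: six views with 3-dimensional features and one geometry descriptor.
six_views = [[1.0, 0.0, 0.0]] * 6
geometry = [2.0, 0.0, 0.0]
loss = alignment_loss(six_views, geometry)
```

In a setup like this, the geometry branch would typically be needed only during training, which is one plausible reading of how distillation could reduce the number of rendered views required at inference time.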