
Efficient Multi-View Stereo Network with Cross-Scale Transformer
Abstract  Existing deep Multi-View Stereo (MVS) methods introduce Transformers into cascade networks to achieve high-resolution depth estimation and thereby obtain highly accurate and complete 3D reconstruction results. However, Transformer-based methods are limited by their computational cost and cannot be extended to the finer stages. To address this problem, this paper proposes a novel cross-scale Transformer-based MVS network that handles feature representations at different stages without incurring additional computation. An Adaptive Matching-aware Transformer (AMT) is introduced that applies different combinations of interactive attention at multiple scales; this combination strategy enables the network to capture contextual information within images and to enhance feature relationships between images. In addition, Dual Feature Guided Aggregation (DFGA) is designed to embed coarse global semantic information into the construction of the finer cost volumes, further strengthening the perception of both global and local features. Meanwhile, a feature metric loss is designed to evaluate the feature deviation before and after the transformation, reducing the impact of feature mismatching on depth estimation. Experimental results show that the proposed network achieves completeness and overall metrics of 0.264 and 0.302 on the DTU dataset, and mean reconstruction scores of 64.28 and 38.03 on the two large-scale scene sets of the Tanks and Temples benchmark, respectively.
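A minimal PyTorch sketch of the kind of intra-/inter-image attention combination that the AMT module applies at each scale is given below. The module layout, names, and the ordering of self-attention followed by cross-attention are illustrative assumptions and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn

class InteractiveAttentionBlock(nn.Module):
    """Sketch of one intra-/inter-image attention combination.

    Self-attention captures context within the reference image;
    cross-attention enhances feature relationships between the
    reference and a source image. A cross-scale Transformer is
    assumed to stack such blocks with different combinations at
    different feature scales.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, ref_tokens: torch.Tensor, src_tokens: torch.Tensor) -> torch.Tensor:
        # ref_tokens, src_tokens: [B, H*W, C] flattened feature maps of one scale
        # intra-image attention: context within the reference view
        ref = self.norm1(ref_tokens + self.self_attn(ref_tokens, ref_tokens, ref_tokens)[0])
        # inter-image attention: reference queries attend to source keys/values
        ref = self.norm2(ref + self.cross_attn(ref, src_tokens, src_tokens)[0])
        return ref
```

The feature metric loss described above can likewise be sketched as a masked L1 deviation between reference features and source features warped into the reference view with the predicted depth; the warping step and the validity mask are assumed to be computed elsewhere, and the exact form used in the paper may differ.

```python
import torch

def feature_metric_loss(ref_feat: torch.Tensor,
                        warped_src_feat: torch.Tensor,
                        valid_mask: torch.Tensor) -> torch.Tensor:
    """Illustrative feature-deviation penalty (not the paper's exact loss).

    ref_feat, warped_src_feat: [B, C, H, W] feature maps at one scale.
    valid_mask: [B, 1, H, W], 1 where the warp lands inside the source image.
    """
    deviation = (ref_feat - warped_src_feat).abs().mean(dim=1, keepdim=True)  # per-pixel L1
    return (deviation * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```

In a cascade network, such a loss would typically be accumulated over scales and source views and added to the depth-supervision loss with a small weight.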
Authors  WANG Sicheng (王思成), JIANG Hao (江浩), CHEN Xiao (陈晓) (School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, Jiangsu, China)
Source  Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the PKU Core Journal list, 2024, Issue 11, pp. 266-275 (10 pages)
Funding  National Natural Science Foundation of China (62101273); Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2022D10).
Keywords  Multi-View Stereo (MVS); feature matching; Transformer network; attention mechanism; 3D reconstruction