
Efficient Multi-View Stereo Network with Cross-Scale Transformer
Abstract  Existing deep Multi-View Stereo (MVS) methods introduce Transformers into cascade networks to achieve high-resolution depth estimation and thereby obtain highly accurate and complete 3D reconstruction results. However, Transformer-based methods are limited by their computational cost and cannot be extended to the finer stages. To address this problem, this paper proposes a novel cross-scale Transformer-based MVS network that handles feature representations at different stages without incurring additional computation. An Adaptive Matching-aware Transformer (AMT) is introduced that applies different combinations of interactive attention at multiple scales; this combination strategy enables the network to capture contextual information within images and to enhance feature relationships between images. In addition, Dual Feature Guided Aggregation (DFGA) is designed to embed coarse global semantic information into the construction of the finer cost volumes, further strengthening the perception of both global and local features. Meanwhile, a feature metric loss is designed to evaluate the feature deviation before and after the transformation, reducing the impact of feature mismatching on depth estimation. Experimental results show that the proposed network achieves completeness and overall metrics of 0.264 and 0.302 on the DTU dataset, and mean reconstruction scores of 64.28 and 38.03 on the two large-scale scene sets of the Tanks and Temples benchmark, respectively.
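A minimal PyTorch sketch of the kind of intra-/inter-image attention combination that the AMT module applies at each scale is given below. The module layout, names, and the ordering of self-attention followed by cross-attention are illustrative assumptions and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn

class InteractiveAttentionBlock(nn.Module):
    """Sketch of one intra-/inter-image attention combination.

    Self-attention captures context within the reference image;
    cross-attention enhances feature relationships between the
    reference and a source image. A cross-scale Transformer is
    assumed to stack such blocks with different combinations at
    different feature scales.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, ref_tokens: torch.Tensor, src_tokens: torch.Tensor) -> torch.Tensor:
        # ref_tokens, src_tokens: [B, H*W, C] flattened feature maps of one scale
        # intra-image attention: context within the reference view
        ref = self.norm1(ref_tokens + self.self_attn(ref_tokens, ref_tokens, ref_tokens)[0])
        # inter-image attention: reference queries attend to source keys/values
        ref = self.norm2(ref + self.cross_attn(ref, src_tokens, src_tokens)[0])
        return ref
```

The feature metric loss described above can likewise be sketched as a masked L1 deviation between reference features and source features warped into the reference view with the predicted depth; the warping step and the validity mask are assumed to be computed elsewhere, and the exact form used in the paper may differ.

```python
import torch

def feature_metric_loss(ref_feat: torch.Tensor,
                        warped_src_feat: torch.Tensor,
                        valid_mask: torch.Tensor) -> torch.Tensor:
    """Illustrative feature-deviation penalty (not the paper's exact loss).

    ref_feat, warped_src_feat: [B, C, H, W] feature maps at one scale.
    valid_mask: [B, 1, H, W], 1 where the warp lands inside the source image.
    """
    deviation = (ref_feat - warped_src_feat).abs().mean(dim=1, keepdim=True)  # per-pixel L1
    return (deviation * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```

In a cascade network, such a loss would typically be accumulated over scales and source views and added to the depth-supervision loss with a small weight.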
Authors  WANG Sicheng (王思成), JIANG Hao (江浩), CHEN Xiao (陈晓) (School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, Jiangsu, China)
Source  Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the PKU Core Journal list, 2024, Issue 11, pp. 266-275 (10 pages)
Funding  National Natural Science Foundation of China (62101273); Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2022D10).
Keywords  Multi-View Stereo (MVS); feature matching; Transformer network; attention mechanism; 3D reconstruction