Indoor scene parsing method based on self-distillation and dual-mode
Abstract [Objective] To enable indoor robots to accurately identify different categories of indoor objects and thereby choose safer and more feasible routes, a self-distillation multi-stage cascaded network (SMCNet) based on self-distillation and dual-mode input is proposed for indoor scene parsing. [Method] First, a segmentation transformer (SegFormer) is used as the backbone network to extract features from the red green blue (RGB) image and the depth map in a two-stream manner, yielding four groups of feature outputs. Second, a feature enhancement module (FEM) is designed to enhance these four groups of features and then fuse them in groups, so that the useful information in the dual-mode features is fully extracted and blended. Finally, a self-distillation supervision module (SSM) is designed to transfer valuable information from high-level features to low-level features through self-distillation, and a multi-stage cascaded supervision module (MCSM) is designed for cross-layer supervision, producing the final prediction map. [Result] On the indoor dual-mode datasets New York University Depth version 2 (NYUDv2) and scene understanding red green blue-depth (SUN RGB-D), the proposed model outperforms existing methods under the same conditions, with a mean intersection over union (MIoU) of 57.3% on NYUDv2 and 53.1% on SUN RGB-D. [Conclusion] SMCNet parses different categories of objects in indoor scenes fairly accurately and can provide some technical support for indoor robots acquiring indoor visual information.
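The abstract reports results as mean intersection over union (MIoU). As a reminder of what that figure measures, the following is a minimal sketch of the standard MIoU computation from a confusion matrix. It is not the authors' evaluation code: the function names, the `ignore_index` value of 255, and the toy label lists are assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (target, pred) label pairs over flattened label maps.

    ignore_index marks unlabeled pixels; 255 is an assumed convention here.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip pixels without a ground-truth label
            continue
        matrix[t][p] += 1      # rows: ground truth, columns: prediction
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                 # class c pixels predicted as another class
        fp = sum(row[c] for row in matrix) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                            # class appears in prediction or ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels, not dataset values).
pred = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 1, 1, 1, 2, 2, 0, 255]
m = confusion_matrix(pred, target, num_classes=3)
print(f"{mean_iou(m):.4f}")  # prints 0.5556
```

In the toy example the per-class IoU values are 1/3, 2/3, and 2/3, so their mean is 5/9. Accumulating one confusion matrix over a whole test set and calling `mean_iou` once at the end is the usual way a dataset-level figure is obtained, though the exact protocol behind the reported 57.3% and 53.1% is not stated in the abstract.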
Authors ZHANG Yuming (张喻铭); ZHOU Wujie (周武杰); YE Lü (叶绿) (School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China)
Source Journal of Zhejiang University of Science and Technology (《浙江科技学院学报》), CAS, 2024, No. 3, pp. 218-227, 270 (11 pages)
Funding National Key Research and Development Program of China (2022YFEO196000); National Natural Science Foundation of China (62371422).
Keywords indoor scene parsing; self-distillation; multi-stage cascaded; dual-mode