Abstract
For 3D object recognition, methods based on multi-view convolutional neural networks (MVCNN) outperform methods based on 3D shape representations in both accuracy and training speed. However, MVCNN depends on 3D models and takes as input views rendered from cameras at fixed positions, which does not match most practical applications. Furthermore, MVCNN fuses multi-view features with a max-pooling operation, which may discard part of the original feature information. To address these two problems, this paper proposes a 3D object recognition method based on multi-view recurrent neural networks (MVRNN), which improves on MVCNN in three respects. First, a feature discrimination measure is introduced into the cross-entropy loss function to increase the discrimination between features of different objects. Second, a recurrent neural network (RNN) replaces the max-pooling operation of MVCNN to fuse multiple free-viewpoint view features into a single feature that is more compact and retains complete information about the object's appearance. Finally, a binary classification network matches a single-view feature taken from a free viewpoint against the fused features to achieve fine-grained recognition of 3D objects. To validate MVRNN, comparative experiments are conducted on the public dataset ModelNet and the self-built dataset MV3D. The results show that, compared with MVCNN, MVRNN extracts multi-view features with higher discrimination and achieves clearly higher recognition accuracy on both datasets.
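The difference between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the RNN cell, the feature dimensions, or the weights, so the plain Elman-style recurrence, the sizes, and the random weights below are all assumptions chosen for illustration, and the helper names `max_pool_fusion`, `rnn_fusion`, and `random_matrix` are hypothetical.

```python
import math
import random


def max_pool_fusion(views):
    """Element-wise maximum over view features (max-pooling fusion)."""
    return [max(column) for column in zip(*views)]


def rnn_fusion(views, w_in, w_rec, bias):
    """Fuse view features with a plain Elman-style recurrence (an assumption).

    Each view updates the hidden state in turn; the final hidden state
    is returned as the fused feature.
    """
    hidden = [0.0] * len(bias)
    for view in views:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], view))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feature_dim, hidden_dim = 4, 3  # illustrative sizes, not taken from the paper

# Five per-view feature vectors standing in for CNN outputs.
views = [[rng.random() for _ in range(feature_dim)] for _ in range(5)]
w_in = random_matrix(hidden_dim, feature_dim, rng)
w_rec = random_matrix(hidden_dim, hidden_dim, rng)
bias = [0.0] * hidden_dim

pooled = max_pool_fusion(views)              # keeps only the per-dimension maximum
fused = rnn_fusion(views, w_in, w_rec, bias)  # every view contributes to the state
```

Max pooling keeps a single value per feature dimension and ignores the order of the views, whereas the recurrence passes every view through the hidden state and accepts any number of views. In practice the recurrent weights would be learned jointly with the feature extractor rather than drawn at random.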
Authors
董帅
李文生
张文强
邹昆
DONG Shuai; LI Wen-sheng; ZHANG Wen-qiang; ZOU Kun (Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan, Guangdong 528406)
Source
《电子科技大学学报》
EI
CAS
CSCD
PKU Core Journal
2020, No. 2, pp. 269-275 (7 pages)
Journal of University of Electronic Science and Technology of China
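Funding
Young Scientists Fund of the National Natural Science Foundation of China (61502088)
Natural Science Foundation of Guangdong Province (2016A030313018)
Guangdong Province Training Program for Outstanding Young Teachers in Higher Education Institutions (Yq2013206)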
Keywords
3D object
feature extraction
feature fusion
image retrieval
multi-view