Abstract: In some important object recognition applications, such as intelligent robots and autonomous driving, images are captured consecutively and are correlated with one another, and the scenes have stable prior features. Yet existing techniques do not take full advantage of this information. To push object recognition beyond existing algorithms in these applications, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first uses YOLOv3 as the base algorithm to recognize objects in single-frame images. It then uses the DeepSort algorithm to associate potential objects recognized in frames captured at different moments. Finally, a confidence fusion method and a temporal boundary processing method designed herein fuse temporal sequence information with scene prior information at the decision level. Experiments on public datasets and self-built industrial scene datasets show that, because the sources of information are expanded, the quality of any single frame has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing information across multiple classes. Any object recognition algorithm that simultaneously outputs object class, location information, and recognition confidence can be integrated into this information fusion framework to improve its performance.
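The abstract does not give the confidence fusion formula. As a minimal sketch of what decision-level fusion over one DeepSort track could look like, the Python below assumes a log-odds average of the per-frame confidences reported for each class, shifted by an assumed per-scene class prior and weighted by the fraction of recent frames that report that class. `fuse_track`, `scene_prior`, and `window` are hypothetical names; this is not the authors' method.

```python
import math
from collections import defaultdict


def logit(p: float) -> float:
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_track(detections, scene_prior, window=5):
    """Fuse per-frame detections of one tracked object (hypothetical rule).

    detections: list of (class_name, confidence), oldest first, i.e. the
        per-frame detector outputs a tracker has associated into one track.
    scene_prior: dict class_name -> assumed probability of that class in this
        scene; classes not listed get an uninformative 0.5.
    Returns (best_class, fused_score).
    """
    if not detections:
        raise ValueError("track has no detections")
    recent = detections[-window:]  # near a track's start, fewer frames are used
    votes = defaultdict(list)
    for cls, conf in recent:
        votes[cls].append(conf)
    best_cls, best_score = None, -1.0
    for cls, confs in votes.items():
        mean_logit = sum(logit(c) for c in confs) / len(confs)
        prior_shift = logit(scene_prior.get(cls, 0.5))  # 0 when the prior is 0.5
        support = len(confs) / len(recent)  # fraction of frames voting for cls
        score = sigmoid(mean_logit + prior_shift) * support
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls, best_score
```

For example, a track whose frames alternate between a weak "forklift" detection and one stray "person" detection resolves to "forklift" when the scene prior favors forklifts. Because the window simply uses however many frames exist, the sketch degrades gracefully at the start of a track, though it does not reproduce the paper's own temporal boundary processing.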
Funding: Supported by the National Twelfth Five-Year Project (40405050303)
Abstract: Infrared scene simulation has extensive applications in military and civil fields. Based on a specific experimental environment, the Object-oriented Graphics Rendering Engine (OGRE) is used to simulate a realistic three-dimensional complex infrared scene. First, the target radiation of each part is calculated from our experimental data. Then, by analyzing the radiation characteristics of the targets and related materials, an infrared texture library is established, and 3ds Max is used to build an infrared radiation model. Finally, a realistic complex infrared scene is created using OGRE image rendering and graphics processing unit (GPU) programmable pipeline technology. The results show that the simulated images closely resemble real images and are a useful supplement to real data.
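The abstract does not say how per-part radiation was computed from the experimental data. A common starting point is Planck's law for a gray body, so the sketch below assumes each part has a measured surface temperature and a constant emissivity, and integrates blackbody spectral radiance over an assumed 8 to 14 µm long-wave infrared band with the trapezoidal rule. The band limits, step count, and function names are illustrative assumptions.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance in W / (sr * m^2 * m)."""
    x = H * C / (wavelength_m * K_B * temp_k)
    if x > 700.0:  # exp(x) would overflow; radiance is effectively zero
        return 0.0
    return 2.0 * H * C**2 / wavelength_m**5 / math.expm1(x)


def band_radiance(temp_k, emissivity=1.0, lo_um=8.0, hi_um=14.0, steps=2000):
    """Gray-body radiance in W / (sr * m^2) integrated over [lo_um, hi_um]
    micrometers using the trapezoidal rule."""
    lo, hi = lo_um * 1e-6, hi_um * 1e-6
    dx = (hi - lo) / steps
    total = 0.5 * (planck_radiance(lo, temp_k) + planck_radiance(hi, temp_k))
    for i in range(1, steps):
        total += planck_radiance(lo + i * dx, temp_k)
    return emissivity * total * dx
```

Integrating over a very wide band recovers the Stefan-Boltzmann result σT⁴/π, which is a quick sanity check before the per-part values are mapped into texture intensities.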
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2015AA016306), the National Natural Science Foundation of China (No. 61231015), and the National Natural Science Foundation of China (No. 61671335)
Abstract: Object-based audio coding is the main technique for audio scene coding. It can effectively reconstruct the trajectory of each object and also provides sufficient flexibility for personalized audio scene reconstruction. As a result, object-based audio coding has attracted increasing attention. However, existing object-based techniques suffer from poor sound quality because their parameters have low frequency-domain resolution. To achieve high-quality audio object coding, we propose a new coding framework that introduces non-negative matrix factorization (NMF). We extract object parameters at high resolution to improve sound quality, and apply NMF to parameter coding to reduce the high bitrate that high resolution would otherwise require. Experimental results show that the proposed framework improves coding quality by 25%, providing a more flexible, higher-quality way to encode audio scenes.
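The abstract does not specify which NMF variant is used or how the factors are quantized. As a minimal sketch, the pure-Python code below factors a non-negative parameter matrix V (for example, assumed frequency bins by frames) into W and H with the classic Lee-Seung multiplicative updates for the Euclidean cost. Function names and defaults are illustrative.

```python
import random


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m) with
    Lee-Seung multiplicative updates minimizing ||V - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)                # W^T V
        den = matmul(matmul(wt, w), h)     # W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num = matmul(v, ht)                # V H^T
        den = matmul(w, matmul(h, ht))     # W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frob_error(v, w, h):
    """Frobenius norm of the reconstruction error V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

The bitrate saving comes from transmitting the factors instead of V: they hold rank·(n + m) values rather than n·m. For instance, a 1024 × 100 parameter matrix at rank 10 needs 11,240 values instead of 102,400.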
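Abstract: Scene graph generation (SGG) studies the entities in an image and the relationships among them, and is widely used in fields such as visual understanding and image retrieval. Existing scene graph generation methods are limited to visual features or a single visual concept, which leads to low relationship recognition accuracy and requires extensive manual annotation. To address these problems, this paper fuses image and text features and proposes MCL-SG (Multimodal Contrastive Learning for Scene Graph), a scene graph generation method based on multimodal contrastive learning. First, features are extracted from the image and text inputs to obtain image features and text features. Then, a Transformer Encoder encodes and fuses the feature vectors. Finally, a self-supervised contrastive learning strategy computes the similarity between image and text features, and training is completed by minimizing the similarity difference between positive and negative samples, without manual annotation. Experiments on three subtasks at different levels (SGDet, SGCls, and PredCls) of the large public scene graph generation dataset VG (Visual Genome) show that, on the mean Recall@100 metric, MCL-SG improves scene graph detection accuracy by 9.8%, scene graph classification accuracy by 14.0%, and relationship classification accuracy by 8.9%, demonstrating the effectiveness of MCL-SG.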
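The abstract describes contrasting matched (positive) and mismatched (negative) image-text pairs but does not give the loss. As a stand-in, the sketch below uses the widely used symmetric InfoNCE objective over cosine similarities, where pair i in a batch is the positive and every other pairing is a negative. The temperature of 0.07 and all names are assumptions, not MCL-SG's actual formulation.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: image_emb[i] and text_emb[i] form a positive
    pair; all other image-text pairings in the batch act as negatives."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(mat):
        """Mean cross-entropy with the diagonal entry as the target class."""
        loss = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_sum - row[i]
        return loss / len(mat)

    cols = [list(c) for c in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))
```

Minimizing this loss pulls each positive pair's similarity above the similarities of its negatives, which matches the positive-versus-negative contrast the abstract describes without requiring manual labels beyond the pairing itself.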
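Abstract: Improving the autonomous landing capability of unmanned aerial vehicles (UAVs) is important for increasing their operational efficiency and their survivability in the field. This paper proposes an automatic landing-area detection method for UAVs based on onboard video, aiming to improve autonomous obstacle-avoiding landing when prior knowledge of the scene is unavailable. A deep learning network based on multi-view geometric constraints is integrated into a visual simultaneous localization and mapping (SLAM) algorithm to build a three-dimensional map of the scene while actively identifying potential obstacles. A landing-area detection algorithm that accounts for factors such as the area and flatness of the landing zone is then proposed; it identifies UAV landing areas through spatial analysis of a voxel grid map. Experiments in several categories of scenes demonstrate the accuracy of the proposed method.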
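The abstract names area and flatness as the criteria but not the exact test. A minimal sketch, assuming the SLAM map is available as a 3D point list, is to voxelize the points, keep the top occupied voxel of each (x, y) column, and slide a square window sized to the required landing pad: a window qualifies if every column is observed and the top heights differ by at most a small number of voxels. All names and thresholds below are hypothetical.

```python
import math


def voxelize(points, res):
    """Map 3D points (x, y, z) in meters to the set of occupied voxel indices."""
    return {(math.floor(x / res), math.floor(y / res), math.floor(z / res))
            for x, y, z in points}


def height_map(voxels):
    """For each (ix, iy) column keep the index of its highest occupied voxel."""
    top = {}
    for ix, iy, iz in voxels:
        if (ix, iy) not in top or iz > top[(ix, iy)]:
            top[(ix, iy)] = iz
    return top


def landing_candidates(points, res=0.1, pad_size=1.0, max_step=1):
    """Hypothetical area + flatness test on a voxel map.

    Slides a square window of side pad_size meters over the height map and
    keeps windows whose columns are all observed and whose top heights differ
    by at most max_step voxels. Returns (ix, iy, spread) tuples, flattest
    first, where (ix, iy) is the window's lower-left column index.
    """
    top = height_map(voxelize(points, res))
    if not top:
        return []
    # Window side in cells; the epsilon absorbs floating-point noise in the ratio.
    k = max(1, math.ceil(pad_size / res - 1e-9))
    xs = [ix for ix, _ in top]
    ys = [iy for _, iy in top]
    results = []
    for ix in range(min(xs), max(xs) - k + 2):
        for iy in range(min(ys), max(ys) - k + 2):
            cells = [(ix + dx, iy + dy) for dx in range(k) for dy in range(k)]
            if any(c not in top for c in cells):
                continue  # unobserved column: cannot confirm it is obstacle-free
            heights = [top[c] for c in cells]
            spread = max(heights) - min(heights)
            if spread <= max_step:
                results.append((ix, iy, spread))
    results.sort(key=lambda r: r[2])
    return results
```

Treating unobserved columns as unsafe is a conservative design choice: with no scene priors, a gap in the map is more likely to hide an obstacle than to certify flat ground.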