Although prior work has contributed greatly to the semantic segmentation of 3D indoor scenes, debris recognition in terrain data remains challenging. Indoor point clouds contain hundreds of thousands of points, whereas terrain point clouds contain millions. In addition, terrain point clouds obtained by remote sensing are measured in meters, while indoor scenes are measured in centimeters. As a result, a piece of terrain debris captured by remote sensing mapping consists of only dozens of points, so point convolution alone cannot extract sufficient training information. In this paper, we build multi-attribute descriptors that combine geometric and color information to better describe low-precision terrain debris. Our pipeline therefore operates on the multi-attribute descriptor of each point rather than on the raw point. On this basis, we propose an unsupervised classification algorithm that divides the point cloud into several terrain areas and treats each area as a graph vertex, called a super point, to form a graph structure, reducing the terrain point cloud from millions of points to hundreds of vertices. We then propose a graph convolutional network that employs PointNet for graph embedding and a recurrent gated graph convolutional network for classification. Our experiments show that the super point graph based on multi-attribute descriptors reduces the terrain point cloud from millions to hundreds of elements, and that our method reaches an accuracy of 91.74% and an IoU of 94.08%, both significantly better than current methods such as SEGCloud (Acc: 88.63%, IoU: 89.29%) and PointCNN (Acc: 86.35%, IoU: 87.26%).
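To illustrate how grouping points into super points shrinks the data, here is a minimal sketch. It assumes each point already has a multi-attribute descriptor and a region label from some unsupervised clustering step; the function name `build_super_points` and the use of a plain mean to summarize each region are illustrative assumptions, not the method described in the abstract.

```python
from collections import defaultdict


def build_super_points(descriptors, region_ids):
    """Collapse per-point descriptors into one mean descriptor per region.

    descriptors: list of equal-length tuples of floats (hypothetical
                 geometry + color attributes for each point).
    region_ids:  list of region labels, one per point, assumed to come
                 from an unsupervised clustering step not shown here.
    Returns a dict mapping region label -> mean descriptor tuple.
    """
    if len(descriptors) != len(region_ids):
        raise ValueError("need exactly one region label per point")

    grouped = defaultdict(list)
    for desc, region in zip(descriptors, region_ids):
        grouped[region].append(desc)

    super_points = {}
    for region, members in grouped.items():
        count = len(members)
        # Average each attribute over all points in the region.
        super_points[region] = tuple(
            sum(values) / count for values in zip(*members)
        )
    return super_points


if __name__ == "__main__":
    points = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (4.0, 4.0, 0.0)]
    regions = [0, 0, 1]
    print(build_super_points(points, regions))
```

Each resulting entry would act as one graph vertex, so the number of vertices equals the number of regions rather than the number of points.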
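Segment-level video semantic recognition aims to identify the semantic concepts of short segments within a video and is an important task in video analysis. Because video segments are extremely numerous and lack web labels that could serve as references, annotating them is very difficult, and usually only a portion of the segments can be labeled. A key challenge is how to use these limited semantic labels to improve the accuracy of segment-level recognition. This paper therefore proposes a video semantic recognition algorithm based on long-short-term prediction consistency. The algorithm introduces a constraint that the semantics of the full video and of its segments be consistent, and uses this constraint to filter the segment-level recognition results, thereby improving segment-level accuracy. On the segment-level semantic recognition task of the large-scale video dataset YouTube-8M, the proposed algorithm achieves a mean average precision (MAP) of 82.62% and ranked second in the 3rd YouTube-8M challenge.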
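The abstract does not give the exact filtering rule, so the following is only a sketch of one way a video-to-segment consistency constraint could be applied: a segment-level label is kept only when the whole-video prediction also supports it. The function name `filter_segment_predictions`, the score dictionaries, and the `video_threshold` parameter are all hypothetical.

```python
def filter_segment_predictions(video_scores, segment_scores, video_threshold=0.5):
    """Drop segment-level labels that the whole-video prediction does not support.

    video_scores:    dict label -> score predicted for the full video.
    segment_scores:  list of dicts (one per segment), label -> score.
    video_threshold: assumed cutoff; a label survives in a segment only
                     if its full-video score is at least this value.
    Returns a new list of dicts with inconsistent labels removed.
    """
    filtered = []
    for scores in segment_scores:
        kept = {
            label: score
            for label, score in scores.items()
            if video_scores.get(label, 0.0) >= video_threshold
        }
        filtered.append(kept)
    return filtered


if __name__ == "__main__":
    video = {"dog": 0.9, "car": 0.1}
    segments = [{"dog": 0.8, "car": 0.7}, {"car": 0.6}]
    print(filter_segment_predictions(video, segments))
```

A hard threshold is the simplest choice; a soft variant could instead rescale each segment score by the video-level score.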
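Objective: Human action recognition in video actively promotes intelligent applications in fields such as smart security, human-robot collaboration, and assistance for the elderly and disabled, and has broad application prospects. However, existing methods still fall short in making effective use of the spatiotemporal features of human actions, and recognition accuracy still needs improvement. This paper therefore proposes a method that uses a deep learning network in the spatial domain to extract key semantic information about human actions and links that information along the temporal domain for analysis, in order to recognize human actions in video accurately. Method: Based on the video image content, repeated and redundant action information is removed, and the key frames that best express changes in human action are extracted. A deep learning network is designed and built to analyze image semantic information and extract the key semantic regions of each image, which effectively describe the spatial information of the action. A Siamese neural network computes the correlation between key semantic regions across video frames; regions with similar semantic information are linked into key semantic region chains; the deep learning features of these chains are computed and fused into a feature representing the human action in the video; and a classifier is trained to perform action recognition. Results: The method is validated on the challenging human action recognition dataset UCF (University of Central Florida) 50 and achieves a recognition accuracy of 94.3%, a significant improvement over existing methods. Validation experiments show that the proposed computation of key semantic regions and of the correlation between key semantic regions across frames effectively improves recognition accuracy. Conclusion: The experimental results show that the proposed method makes effective use of the spatiotemporal information of human actions in video and significantly improves recognition accuracy.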
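The linking of similar regions across frames into chains can be sketched as follows. This is an illustration under stated assumptions: cosine similarity between feature vectors stands in for the Siamese network's correlation score, linking is greedy, and `link_region_chains` and `threshold` are hypothetical names rather than anything specified in the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_region_chains(frames, threshold=0.8):
    """Greedily link region features across consecutive key frames.

    frames: list of frames, each a list of region feature tuples.
    A region joins the chain whose latest member it is most similar to,
    provided the similarity is at least `threshold`; regions that match
    no chain start a new one.
    """
    chains = [[feat] for feat in frames[0]] if frames else []
    for regions in frames[1:]:
        used = set()
        for chain in chains:
            best_idx, best_sim = None, threshold
            for idx, feat in enumerate(regions):
                if idx in used:
                    continue
                sim = cosine(chain[-1], feat)
                if sim >= best_sim:
                    best_idx, best_sim = idx, sim
            if best_idx is not None:
                chain.append(regions[best_idx])
                used.add(best_idx)
        # Unmatched regions seed new chains.
        chains.extend([feat] for idx, feat in enumerate(regions) if idx not in used)
    return chains


if __name__ == "__main__":
    frames = [[(1.0, 0.0), (0.0, 1.0)], [(0.0, 2.0), (3.0, 0.0)]]
    print(link_region_chains(frames))
```

In a full pipeline, the features along each chain would then be fused into a single video-level descriptor before classification; that step is omitted here.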
Funding: This research was funded by grants from the Key Research and Development Program of Shaanxi Province (2018NY-127, 2019ZDLNY07-02-01, 2020NY-205) and the National Undergraduate Training Program for Innovation and Entrepreneurship (S201910712240, X201910712080).