With more multi-modal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities with a designed frame difference aggregation spatial-temporal module (FDA-STM), while the network bridges information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmarks.
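The frame-difference idea underlying a module like FDA-STM can be illustrated with a minimal sketch: compute per-pixel differences between consecutive frames of a clip and aggregate them into a single motion map. This is only an illustration of the general technique; the function names and the simple summation are hypothetical and not taken from the paper.

```python
# Minimal sketch: per-pixel frame differences over a clip, aggregated
# into one motion map. Frames are lists of rows of grayscale values.

def frame_difference(prev, curr):
    """Absolute per-pixel difference between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def aggregate_differences(frames):
    """Sum frame differences across a clip into a single motion map."""
    h, w = len(frames[0]), len(frames[0][0])
    motion = [[0] * w for _ in range(h)]
    for prev, curr in zip(frames, frames[1:]):
        diff = frame_difference(prev, curr)
        for i in range(h):
            for j in range(w):
                motion[i][j] += diff[i][j]
    return motion

clip = [
    [[0, 0], [0, 0]],
    [[0, 5], [0, 0]],
    [[0, 5], [3, 0]],
]
print(aggregate_differences(clip))  # [[0, 5], [3, 0]]
```

In the actual framework such motion information would feed a learned ConvNet module rather than a plain sum; the sketch only shows where the temporal signal comes from.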
In order to remove the nonlinearity of the linear variable differential transformer (LVDT) displacement sensor over its full range and to extend its working range, a novel line-element based adaptive segmenting method for piecewise compensating correction was proposed. From the mechanical structure of the LVDT, the output equation was derived, and the theoretical source of the output nonlinearity was analyzed. Using the proposed line-element adaptive segmentation method, the nonlinear output of the LVDT was divided into linear and nonlinear regions with a given threshold. A compensating correction function was then designed for the nonlinear parts using polynomial regression. Simulation of the LVDT validates the feasibility of the proposed scheme, and calibration and testing experiments confirm that the proposed method achieves higher accuracy than state-of-the-art correction algorithms.
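The segmentation step described above can be sketched as follows: compare each sample's local slope against the end-to-end slope and mark samples whose deviation exceeds the threshold as nonlinear, to be routed to a polynomial correction (not shown). This is a minimal illustration under assumed data; the function name and the slope-deviation criterion are hypothetical stand-ins for the paper's line-element rule.

```python
# Hypothetical sketch of line-element segmentation: split sensor samples
# into linear/nonlinear index sets by comparing each point's local slope
# to the overall end-to-end slope of the response curve.

def segment_linear_nonlinear(xs, ys, threshold):
    """Return (linear, nonlinear) index lists for samples 1..n-1."""
    overall = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    linear, nonlinear = [], []
    for i in range(1, len(xs)):
        slope = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        (linear if abs(slope - overall) <= threshold else nonlinear).append(i)
    return linear, nonlinear

xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 6]   # nonlinearity appears near the end of the range
print(segment_linear_nonlinear(xs, ys, 0.5))  # ([1, 2, 3], [4])
```

Only the indices in the nonlinear set would then be fitted with a polynomial regression correction; the linear region is left untouched.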
The new MPEG-4 video coding standard enables content-based functions. To support the standard, frames should be decomposed into Video Object Planes (VOPs), each VOP representing one moving object. This paper proposes an image segmentation method that separates moving objects from image sequences using spatial-temporal information. Spatial segmentation divides each image into connected areas and finds precise boundaries of moving objects. To locate moving objects in image sequences, two consecutive frames in the temporal direction are examined and a hypothesis test is performed under the Neyman-Pearson criterion. Spatial segmentation produces a spatial segmentation mask, and temporal segmentation yields a change detection mask that separates moving objects from the background. Spatial-temporal merging then produces the final results. The method has been tested on several image sequences, and experimental results show that it is efficient.
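The temporal hypothesis test can be sketched concretely: under the no-motion hypothesis, the difference between corresponding pixels of consecutive frames is modeled as zero-mean Gaussian noise, and the Neyman-Pearson criterion fixes the threshold so the false-alarm probability stays at a chosen level. The sketch below is a minimal illustration with assumed noise statistics; the function name and parameters are illustrative, not from the paper.

```python
from statistics import NormalDist

# Sketch of Neyman-Pearson change detection between two frames: under H0
# (no motion) the pixel difference is zero-mean Gaussian noise with known
# sigma; the threshold is the (1 - alpha) quantile of that distribution.

def change_mask(prev, curr, noise_sigma, alpha=0.01):
    """Binary mask marking pixels whose difference exceeds the
    Neyman-Pearson threshold for false-alarm rate alpha."""
    thresh = NormalDist(0.0, noise_sigma).inv_cdf(1.0 - alpha)
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[10, 10], [10, 10]]
curr = [[10, 40], [11, 10]]
print(change_mask(prev, curr, noise_sigma=2.0))  # [[0, 1], [0, 0]]
```

The resulting change detection mask is what the spatial-temporal merging stage would combine with the spatial segmentation mask.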
Video instance segmentation in street scenes is one of the key problems in autonomous driving research, providing a decision basis for a vehicle's environment perception and path planning in street scenes. Existing methods suffer from insufficient edge-feature extraction, because anchor boxes of multiple aspect ratios are sampled with a single receptive field, and from a lack of fine spatial position detail in the upper levels of the feature pyramid. To address these problems, this paper proposes the Anchor frame calibration and Spatial position information compensation for Video Instance Segmentation (AS-VIS) network. First, an anchor calibration module is added to the three branches of the prediction head to perform multi-type receptive-field sampling matched to the anchor aspect ratios, addressing insufficient extraction of object edges. Second, a multi-receptive-field downsampling module is designed to fuse the features sampled under the various receptive fields, addressing the information loss of downsampling. Finally, the multi-receptive-field downsampling module is applied to embed activation features of object regions from the lower pyramid levels into the higher levels, compensating spatial position information and addressing the lack of fine spatial detail in high-level features. A street-scene video dataset was extracted from the Youtube-VIS benchmark, comprising 329 training videos and 53 validation videos. Quantitative comparison with YolactEdge on detection and segmentation accuracy shows that anchor calibration improves average precision by 8.63% and 5.09% respectively, the spatial-position-compensated feature pyramid improves it by 7.76% and 4.75%, and AS-VIS overall improves it by 9.26% and 6.46%. The method achieves instance-level simultaneous detection, tracking, and segmentation on street-scene video sequences, providing an effective theoretical basis for environment perception in autonomous vehicles.
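The anchor-calibration idea, matching the sampling receptive field to the anchor's aspect ratio so wide anchors sample wide fields and tall anchors sample tall ones, can be sketched as a simple kernel-shape selection rule. The function name, the 1.5 ratio cut-off, and the kernel shapes below are all hypothetical illustrations, not the paper's actual module.

```python
# Hypothetical sketch: choose a sampling-kernel shape (h, w) matched to
# the anchor box aspect ratio, instead of one fixed receptive field.

def calibrated_kernel(anchor_w, anchor_h,
                      wide=(1, 3), tall=(3, 1), square=(3, 3)):
    """Pick a receptive-field shape matching the anchor aspect ratio."""
    ratio = anchor_w / anchor_h
    if ratio > 1.5:
        return wide        # wide anchor -> horizontally elongated field
    if ratio < 1 / 1.5:
        return tall        # tall anchor -> vertically elongated field
    return square          # near-square anchor -> square field

print([calibrated_kernel(w, h) for w, h in [(32, 16), (16, 32), (24, 24)]])
# [(1, 3), (3, 1), (3, 3)]
```

In a real detector these shapes would parameterize the convolution (for example, its kernel size or dilation) in each prediction-head branch.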
Funding: This work was supported by the Natural Science Foundation of Guangdong Province (Grant Nos. 2022A1515140119 and 2023A1515011307), the National Key Laboratory of Air-based Information Perception and Fusion and the Aeronautic Science Foundation of China (Grant No. 20220001068001), the Dongguan Science and Technology Special Commissioner Project (Grant No. 20221800500362), and the National Natural Science Foundation of China (Grant Nos. 62376261, 61972090, and U21A20487).
Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (Grant No. 2011AA041002).