The second-generation Audio Video Coding Standard (AVS2) is the most recent video coding standard. By introducing several new coding techniques, AVS2 provides more efficient compression for scene videos such as surveillance and conference videos. Because their scenes are limited, scene videos contain substantial redundancy, especially in background regions. The new scene video coding techniques in AVS2 mainly focus on reducing this redundancy to achieve higher compression. This paper introduces several important AVS2 scene video coding techniques. Experimental results show that, with the scene video coding tools enabled, AVS2 saves nearly 40% in BD-rate (Bjøntegaard-Delta bit-rate) on scene videos.
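The background redundancy mentioned in this abstract can be illustrated with a small sketch. The abstract does not describe how AVS2 models the background, so the per-pixel temporal median used here is an assumption chosen for illustration, and the function names `build_background`, `residual`, and `nonzero_count` are hypothetical. The sketch only shows why subtracting a background model leaves far fewer non-zero samples to code.

```python
from statistics import median


def build_background(frames):
    """Estimate a background frame as the per-pixel temporal median.

    `frames` is a list of equally sized grayscale frames (lists of rows).
    The median is an illustrative choice, not the AVS2 method.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [int(median(f[y][x] for f in frames)) for x in range(width)]
        for y in range(height)
    ]


def residual(frame, background):
    """Subtract the background model from a frame, pixel by pixel."""
    return [
        [p - b for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def nonzero_count(block):
    """Count the samples that would still carry information."""
    return sum(1 for row in block for value in row if value != 0)


# Toy scene: a static 4x4 background (value 100) with one bright
# pixel (value 200) that moves along row 1 from frame to frame.
demo_frames = []
for t in range(5):
    frame = [[100] * 4 for _ in range(4)]
    frame[1][t % 4] = 200
    demo_frames.append(frame)

demo_background = build_background(demo_frames)
demo_residual = residual(demo_frames[0], demo_background)
```

In this toy scene every sample of the raw frame is non-zero, while the residual against the background model is non-zero only at the moving pixel.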
Digit recognition from natural scene text in video surveillance and broadcasting applications is a challenging research task, because a digit in a natural scene may be blurred or twisted, may vary in font, and may have a non-uniform color distribution. To solve the digit recognition problem, this paper proposes a principal-axis-based topology contour descriptor with support vector machine (SVM) classification. The contributions of this paper include: a) a local descriptor with SVM classification for digit recognition, b) higher accuracy than state-of-the-art methods, and c) low computational cost (0.03 second per digit recognition), which makes the method suitable for real-time applications.
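The abstract does not specify how the principal axis is obtained or how the descriptor is built on top of it. As a minimal sketch, assuming the principal axis is taken from second-order central moments of the contour points (a standard technique, not necessarily the authors' method), the orientation can be estimated and the contour rotated into a canonical pose as follows. The function names `principal_axis_angle` and `align_to_principal_axis` are hypothetical.

```python
import math


def principal_axis_angle(points):
    """Return the principal-axis orientation (radians) of 2D points.

    Uses second-order central moments:
    theta = 0.5 * atan2(2 * mu11, mu20 - mu02).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def align_to_principal_axis(points):
    """Translate points to their centroid and rotate by -theta,

    so that the principal axis lies along the x axis.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    theta = principal_axis_angle(points)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    aligned = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        aligned.append((dx * cos_t + dy * sin_t, -dx * sin_t + dy * cos_t))
    return aligned


# Toy contour: points on the diagonal y = x, whose principal axis
# is at 45 degrees.
diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]
diagonal_angle = principal_axis_angle(diagonal)
diagonal_aligned = align_to_principal_axis(diagonal)
```

Normalizing orientation in this way is one plausible reason a principal-axis-based descriptor could tolerate twisted digits; features computed on the aligned contour would then be passed to a classifier such as an SVM.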
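Video instance segmentation of street scenes is one of the key problems in autonomous driving research, as it provides a decision basis for environment perception and path planning of vehicles in street scenes. Existing methods apply single-receptive-field sampling to anchor boxes of multiple aspect ratios, which leads to insufficient edge feature extraction, and the high levels of the feature pyramid lack detailed spatial position information. To address these problems, this paper proposes the Anchor frame calibration and Spatial position information compensation for Video Instance Segmentation (AS-VIS) network. First, an anchor-box calibration module is added to the three branches of the prediction head to perform multi-type receptive-field sampling matched to the anchor aspect ratios, which addresses the insufficient extraction of object edges. Second, a multi-receptive-field downsampling module is designed to fuse the features sampled with the various receptive fields, which addresses the loss of information during downsampling. Finally, the multi-receptive-field downsampling module is used to embed the activation feature maps of object regions from the low levels of the feature pyramid into the high levels, achieving spatial position information compensation and addressing the lack of detailed spatial position information in high-level features. A street-scene video dataset is extracted from the Youtube-VIS benchmark, comprising 329 training videos and 53 validation videos. Quantitative comparison with YolactEdge on detection and segmentation accuracy shows that anchor-box calibration improves average precision by 8.63% and 5.09%, respectively, the feature pyramid with spatial position information compensation improves average precision by 7.76% and 4.75%, respectively, and AS-VIS overall improves average precision by 9.26% and 6.46%, respectively. The proposed method achieves simultaneous instance-level detection, tracking, and segmentation of street-scene video sequences, providing an effective theoretical basis for the environment perception of autonomous vehicles.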
Funding: Supported by the National Basic Research Program of China under grant 2015CB351806, the National Natural Science Foundation of China under contracts No. 61425025, No. 61390515, and No. 61421062, and the Shenzhen Peacock Plan.
Funding: This work is supported by the Fundamental Research Funds for the Central Universities of China under grant No. WK2100100006 and the Natural Science Foundation of Anhui Province of China under grant No. KJ2008A106.
Funding: Supported by MOST under grant No. 105-2221-E-119-001.