Abstract
To address the low accuracy and poor robustness of scene recognition in surveillance video when only single-modal features are used, a semi-supervised learning scene recognition system based on feature fusion is proposed. First, scene description features of the video frames and the audio are extracted separately with pre-trained convolutional neural network (CNN) models. Next, video-level feature fusion is performed according to the characteristics of scene recognition. A deep belief network is then trained without supervision and fine-tuned with supervision using a cost function augmented with a relative entropy regularization term. Finally, the classification performance of the model is analyzed by simulation. The simulation results show that the model effectively improves the classification accuracy of surveillance scenes and can meet public security needs such as automated structured analysis of massive volumes of surveillance video.
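The fusion and regularization steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion rule or the exact form of the relative entropy term, so everything below is an assumption made for illustration: mean pooling over time followed by concatenation stands in for video-level fusion, and the familiar KL sparsity penalty on mean hidden activations stands in for the regularizer. The function names, feature dimensions, and the target value rho are hypothetical.

```python
import numpy as np


def fuse_video_level(frame_feats, audio_feats):
    """Fuse per-frame and per-segment features into one video-level vector.

    Assumption: mean pooling over time, L2 normalization per modality,
    then concatenation. The paper's actual fusion rule may differ.
    """
    v = frame_feats.mean(axis=0)   # (d_video,)
    a = audio_feats.mean(axis=0)   # (d_audio,)
    v = v / (np.linalg.norm(v) + 1e-12)
    a = a / (np.linalg.norm(a) + 1e-12)
    return np.concatenate([v, a])  # (d_video + d_audio,)


def kl_sparsity_penalty(hidden_activations, rho=0.05):
    """Relative entropy (KL divergence) between a target mean activation rho
    and the observed mean activation of each hidden unit, summed over units.

    Assumption: activations lie in (0, 1), e.g. sigmoid outputs.
    """
    rho_hat = np.clip(hidden_activations.mean(axis=0), 1e-8, 1 - 1e-8)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return float(kl.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((30, 512))  # 30 frames, hypothetical 512-d CNN features
    audio = rng.random((10, 128))   # 10 audio segments, hypothetical 128-d features
    fused = fuse_video_level(frames, audio)
    print(fused.shape)
    print(kl_sparsity_penalty(rng.random((16, 64))))
```

Under these assumptions, a supervised fine-tuning objective would take the form of a classification loss plus a weight times `kl_sparsity_penalty(...)`, with the weight chosen on validation data.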
Authors
SHEN Xiao-hu (申小虎); AN Ju-bai (安居白)
College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China; Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing, Jiangsu 210031, China
Source
Computer Simulation (《计算机仿真》)
PKU Core Journal (北大核心)
2021, Issue 1, pp. 394-399 (6 pages)
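Funding
National Natural Science Foundation of China (61501082)
Natural Science Research General Program of Jiangsu Higher Education Institutions (17KJB520006)
Qing Lan Project of Jiangsu Higher Education Institutions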
Keywords
Semi-supervised learning
Surveillance video
Scene recognition system (SRS)
Feature fusion