
MBO: Surveillance Video Synopsis Based on Multi-Objective Balance Optimization
Abstract  Video synopsis, which can greatly compress video length while completely preserving object motion information, has received widespread attention in both academia and industry. However, existing synopsis methods cannot accurately preserve the interactive behaviors between objects and have difficulty balancing compression against collisions, which seriously hinders the performance improvement and practical application of video synopsis. To address this, this paper proposes a surveillance video synopsis method based on Multi-Objective Balance Optimization (MBO). First, a method for judging interactive behaviors based on the number of interactive frames and dynamic threshold comparison is proposed to form multi-objective units; it combines the per-frame movement direction of each object with dynamic thresholds to improve the accuracy of interaction judgment. Second, a collision matrix and an insertion position ratio are defined to record object collisions and the depth of insertion positions, respectively. Then, a dynamic balancing method between compression and collisions is proposed to optimize the rearrangement of objects, which can greatly shorten video length while reducing object collisions. Finally, the video background and the rearranged objects are fused to generate the synopsis video. Experimental results on multiple datasets, including VISOR, CAVIAR, and KTH, show that compared with current mainstream methods, the proposed method improves the F-score of preserving interactive behaviors by up to 0.472 and can effectively balance compression and object collisions.
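The collision matrix mentioned in the abstract records how severely two object tubes would overlap for each candidate temporal placement in the synopsis timeline. A minimal sketch of that idea, assuming each object tube is a list of per-frame bounding boxes and using total box-intersection area as the collision measure (the function names and the exact overlap criterion are illustrative, not the paper's formulation):

```python
def box_overlap(a, b):
    """Intersection area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def collision(tube_a, tube_b, shift):
    """Total overlap area between two tubes when tube_b starts
    `shift` frames after tube_a in the synopsis timeline."""
    total = 0
    for t, box_a in enumerate(tube_a):
        t_b = t - shift  # frame of tube_b that is active at time t
        if 0 <= t_b < len(tube_b):
            total += box_overlap(box_a, tube_b[t_b])
    return total

def collision_matrix(tubes, max_shift):
    """M[i][j][s]: collision between tube i and tube j when
    tube j is delayed by s frames relative to tube i."""
    n = len(tubes)
    return [[[collision(tubes[i], tubes[j], s)
              for s in range(max_shift + 1)]
             for j in range(n)]
            for i in range(n)]
```

An optimizer can then scan each row of the matrix for a shift whose collision cost is low while keeping the total synopsis length short, which is the compression-versus-collision trade-off the paper's dynamic balancing method addresses.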
Authors  张云佐 (ZHANG Yun-Zuo); 朱鹏飞 (ZHU Peng-Fei), School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043
Source  Chinese Journal of Computers (《计算机学报》; indexed in EI, CAS, CSCD, Peking University Core), 2024, No. 9, pp. 2104-2115 (12 pages)
Funding  National Natural Science Foundation of China (Nos. 61702347, 62027801); Natural Science Foundation of Hebei Province (F2022210007, F2017210161); Science and Technology Research Project of Hebei Higher Education Institutions (ZD2022100); Central Government Guided Local Science and Technology Development Fund (226Z0501G); Postgraduate Innovation Project (YC2023081)
Keywords  video synopsis; multi-objective; balance optimization; interactive behavior

