Abstract
The explosive growth of multimedia video on the Internet has made it difficult to grasp the content of a video quickly. To address this problem, this paper proposes TFL-WS, a keyword extraction algorithm based on multiple features. Exploiting the fact that videos are accompanied by abundant related text, the authors build a candidate-word weighting formula on an improved TF and multiple features. The formula dynamically combines the statistical features of candidate words with their location weights, and also takes into account attributes such as part of speech and word span. Keywords are then extracted with the help of an extended synonym thesaurus (同义词词林, Tongyici Cilin), and the extracted keywords are used to describe the content of the video. Experimental results show that the improved algorithm extracts better keywords, with some improvement in both precision and recall, and that the keywords represent the video content well.
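The abstract names the ingredients of the weighting formula (an improved TF, location weight, part of speech, word span, and synonym merging) but does not give the formula itself. The following is only a minimal sketch of how such features could be combined, not the TFL-WS formula from the paper. The field names, the weight tables `LOCATION_WEIGHT` and `POS_WEIGHT`, the multiplicative combination, and the functions `score_candidates` and `top_keywords` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical weights: the abstract does not state the actual values.
LOCATION_WEIGHT = {"title": 3.0, "tags": 2.0, "description": 1.0}
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}


def score_candidates(fields, pos_tags, synonyms=None):
    """Score candidate words from the text attached to a video.

    fields:   dict mapping a field name to a list of already-segmented tokens
    pos_tags: dict mapping a (canonical) token to a coarse part-of-speech tag
    synonyms: optional dict mapping a token to its canonical synonym
    """
    synonyms = synonyms or {}

    # Flatten the fields in a fixed order, merging synonyms into one form.
    tokens = []
    for name in LOCATION_WEIGHT:
        for tok in fields.get(name, []):
            tokens.append((synonyms.get(tok, tok), name))

    total = len(tokens)
    if total == 0:
        return {}

    freq = Counter(tok for tok, _ in tokens)
    first, last, location = {}, {}, {}
    for i, (tok, name) in enumerate(tokens):
        first.setdefault(tok, i)
        last[tok] = i
        # A word keeps the weight of the most important field it appears in.
        location[tok] = max(location.get(tok, 0.0), LOCATION_WEIGHT[name])

    scores = {}
    for tok, count in freq.items():
        tf = count / total                              # plain term frequency
        span = (last[tok] - first[tok] + 1) / total     # share of text covered
        pos = POS_WEIGHT.get(pos_tags.get(tok, ""), 0.0)  # unknown POS scores 0
        # Assumed combination: a simple product of the individual features.
        scores[tok] = tf * location[tok] * pos * (1.0 + span)
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:k]]
```

In this sketch a word that appears in the title, recurs across the description, and is a noun ranks highest; a real system would also need word segmentation and part-of-speech tagging, which are left outside the example.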
Authors
WANG Wanliang (王万良)
PAN Meng (潘蒙)
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Source
Journal of Zhejiang University of Technology (《浙江工业大学学报》)
CAS
Peking University Core Journal (北大核心)
2017, Issue 1, pp. 14-18 (5 pages)
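Funding
National Science and Technology Support Program of the Twelfth Five-Year Plan (2012BAD10B01)
Zhejiang Provincial Major Science and Technology Special Project (2013C01113)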
Keywords
keyword extraction
video content
TF
term weight