
Chinese Topic Key Phrase Extraction Algorithm Based on Kert (基于Kert的中文主题关键短语提取算法) Cited by: 5
Abstract: The Kert algorithm, when applied to Chinese topic key phrase extraction, suffers from low precision, highly ambiguous phrases, and weak topic characterization. To address these problems, an improved Chinese topic key phrase extraction algorithm based on Kert is proposed. First, the L statistic is introduced to rebuild the original word segmentation algorithm, giving it some ability to recognize new words and thereby reducing the ambiguity of the segmented words. Second, sequential merging replaces the Frequent Pattern Growth (FP-Growth) step of Kert, which resolves the inverted word order of phrases in the candidate key phrase set. Finally, an improved constrained ranking algorithm is added on this basis, yielding a more effective Chinese topic key phrase extraction algorithm. Comparative experiments against several classical topic key phrase extraction algorithms of recent years show that the improved algorithm raises precision, recall, and F-measure by 5 to 20 percentage points. The method offers a more rigorous theoretical approach to Chinese phrase extraction and has better practical value for extracting topic key phrases from text collections.
Authors: LIU Chenhui (刘晨晖); ZHANG Desheng (张德生); HU Gang (胡钢) (Faculty of Science, Xi'an University of Technology, Xi'an, Shaanxi 710054, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2019, Issue A01, pp. 245-249 (5 pages)
Funding: National Natural Science Foundation of China (51875454); Natural Science Basic Research Plan of Shaanxi Province (2017JM5048)
Keywords: data mining; text mining; Chinese word segmentation; phrase ranking; topic key phrase
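
The abstract's second step, replacing FP-Growth with sequential merging so that candidate phrases keep their word order, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the merging criterion, so a plain frequency threshold is assumed, and the function names `ordered_candidates` and `unordered_pairs`, the parameters `min_support` and `max_len`, and the toy corpus are all hypothetical. The sketch only contrasts order-preserving contiguous sequences with the unordered word sets that an itemset miner works on.

```python
from collections import Counter
from typing import Iterable, Sequence


def ordered_candidates(
    docs: Iterable[Sequence[str]], min_support: int = 2, max_len: int = 3
) -> Counter:
    """Count contiguous word sequences, keeping the original word order.

    Assumption: a simple frequency threshold stands in for the paper's
    merging criterion, which the abstract does not specify.
    """
    counts: Counter = Counter()
    for tokens in docs:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    # Keep only sequences that occur at least min_support times.
    return Counter({p: c for p, c in counts.items() if c >= min_support})


def unordered_pairs(docs: Iterable[Sequence[str]], min_support: int = 2) -> Counter:
    """Count unordered word pairs per document, as an itemset view would."""
    counts: Counter = Counter()
    for tokens in docs:
        words = sorted(set(tokens))
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                counts[frozenset((words[i], words[j]))] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_support})


# Toy corpus of already segmented documents (hypothetical data).
docs = [
    ["数据", "挖掘", "算法"],  # data / mining / algorithm
    ["文本", "数据", "挖掘"],  # text / data / mining
    ["挖掘", "数据"],          # mining / data (reversed order)
]

print(ordered_candidates(docs))
print(unordered_pairs(docs))
```

In the ordered view, only the sequence 数据 followed by 挖掘 reaches the threshold, and the reversed sequence in the third document is counted separately. In the unordered view, all three documents support the same pair, so the word order needed to form the phrase is lost. That loss is the issue the abstract attributes to the FP-Growth step.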