Research on Technologies of Chinese Key-Phrase Automatic Extraction
Abstract  SegPhrase is a state-of-the-art algorithm for key-phrase extraction and achieves higher precision and recall than traditional methods. However, it still has shortcomings in both phrase extraction and phrase quality evaluation. To improve extraction quality and to extract Chinese key phrases effectively, this paper improves the SegPhrase algorithm. In the phrase generation stage, the mutual information between word strings is used to retain some low-frequency but important phrases. In the phrase quality evaluation stage, different features are assigned different weights so that phrases are assessed comprehensively and those better suited to the actual application context are selected. Finally, to verify the quality of the extracted key phrases, they are applied to document topic analysis. Experiments show that the improved SegPhrase algorithm achieves higher recall and precision than the original method, and that topic analysis based on the extracted key phrases expresses the topic information of a document more clearly and accurately than topic analysis based on keywords.
Authors  RONG Chuitian; LI Yinyin; WANG Yan (School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China; School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian 361024, China)
Source  Journal of Frontiers of Computer Science and Technology (indexed in CSCD and the Peking University Core Journals list), 2019, No. 9, pp. 1481-1492 (12 pages)
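Funding  National Natural Science Foundation of China; Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province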
Keywords  key phrase extraction; text feature; mutual information; topic analysis
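
The two modifications described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact mutual-information formula, the feature set, or the weights, so the sketch assumes pointwise mutual information over adjacent word pairs of pre-segmented text, and a simple weighted average of feature values. The function names, the thresholds `min_count` and `min_pmi`, and the feature names are all hypothetical.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Return {(w1, w2): (count, PMI)} for every adjacent word pair.

    Assumption: PMI = log2(p(w1, w2) / (p(w1) * p(w2))), estimated from
    raw unigram and bigram counts of already-segmented sentences.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = (count, math.log2(p_ab / (p_a * p_b)))
    return table


def candidate_phrases(sentences, min_count=3, min_pmi=3.0):
    """Keep frequent pairs, plus rare pairs whose PMI is high.

    The second condition is what lets a low-frequency but strongly
    associated pair survive a pure frequency cut-off.
    """
    return {
        pair
        for pair, (count, pmi) in pmi_table(sentences).items()
        if count >= min_count or pmi >= min_pmi
    }


def weighted_quality(features, weights):
    """Weighted average of feature values (hypothetical scoring rule)."""
    total = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total


# Toy corpus of pre-segmented sentences (illustrative data only).
corpus = [
    ["我们", "研究", "文本"],
    ["我们", "研究", "短语"],
    ["我们", "研究", "互信息", "特征"],
]
candidates = candidate_phrases(corpus, min_count=3, min_pmi=3.0)
score = weighted_quality(
    {"frequency": 0.2, "pmi": 0.9},   # hypothetical feature values
    {"frequency": 1.0, "pmi": 3.0},   # hypothetical weights
)
```

In this toy corpus the pair ("互信息", "特征") occurs only once, yet it is retained because its PMI exceeds the threshold, while ("研究", "文本") also occurs once and is dropped. Weighting the features unequally, rather than treating them alike, is the idea behind the evaluation stage described in the abstract; how the weights are chosen in the paper is not stated there.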