A Fast Algorithm for Automatic Keyword Extraction from Domain Documents (cited by: 12)
Abstract: Existing keyword extraction algorithms require large amounts of training data and time, have difficulty segmenting common words, and are affected by noise in Internet documents. To address these problems, a fast keyword extraction algorithm for domain documents based on TF-IWF is proposed. The algorithm computes term weights from simple statistics combined with heuristic knowledge such as word length, position, and part of speech, and it improves the speed and accuracy of keyword extraction through document purification and domain-dictionary word segmentation. Experiments on 523 documents in the student mental health domain show that the keywords extracted by the algorithm are of higher quality than those obtained with the TF-IDF method, and that the algorithm runs in O(n) time.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2011, No. 6, pp. 2142-2145 (4 pages)
Keywords: keyword extraction; Chinese word segmentation; domain dictionary; heuristic knowledge; time complexity
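
The abstract does not give the weighting formula, so the following is only a minimal sketch of how a TF-IWF score with heuristic adjustments might look. It assumes the commonly used definition of inverse word frequency, IWF(t) = log(total word occurrences in the corpus / occurrences of t in the corpus). The function names, the multipliers for word length, position, and part of speech, and the `"n"` noun tag are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_iwf_scores(tokens, corpus_counts, corpus_total):
    """Score each distinct term in one document by TF * IWF.

    IWF(t) = log(corpus_total / corpus_counts[t]) is an assumed definition.
    """
    counts = Counter(tokens)
    n = len(tokens)
    scores = {}
    for term, c in counts.items():
        # Assumption: a term unseen in the corpus is treated as occurring once.
        corpus_freq = max(corpus_counts.get(term, 0), 1)
        scores[term] = (c / n) * math.log(corpus_total / corpus_freq)
    return scores


def extract_keywords(tagged_tokens, corpus_counts, corpus_total, k=3):
    """Return the k highest-weighted terms from (word, pos_tag) pairs."""
    tokens = [word for word, _ in tagged_tokens]
    base = tf_iwf_scores(tokens, corpus_counts, corpus_total)

    # Record the first position and the first tag seen for every word.
    first_seen, tags = {}, {}
    for i, (word, tag) in enumerate(tagged_tokens):
        first_seen.setdefault(word, i)
        tags.setdefault(word, tag)

    n = len(tokens)
    weighted = {}
    for word, score in base.items():
        # Hypothetical heuristic multipliers, chosen only for illustration.
        length_w = 1.0 + 0.1 * min(len(word), 5)                 # longer words
        position_w = 1.5 if first_seen[word] < 0.2 * n else 1.0  # early words
        pos_w = 1.2 if tags[word] == "n" else 1.0                # nouns
        weighted[word] = score * length_w * position_w * pos_w
    return sorted(weighted, key=weighted.get, reverse=True)[:k]
```

In this sketch the scoring itself is a single pass over the tokens, in keeping with the O(n) claim in the abstract. The final sort over distinct terms is a simplification and adds its own cost. The segmentation and part-of-speech tagging that produce `tagged_tokens` are assumed to have been done beforehand, for example with a domain dictionary.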