
Research Progress on Automatic Keyword Extraction Methods for Chinese Patent Documents (cited by: 1)

Progress of Keyword Extraction of Chinese Patent Documents
Abstract: Patents are scientific and technical documents that carry academic, commercial, and legal information. They record a large number of novel and practical research results and have attracted growing attention in recent years. When patent documents are analyzed with methods such as co-word analysis and text clustering, keyword extraction is often needed to reduce data complexity and filter out noise. Most keyword extraction techniques are based on statistical regularities. This paper summarizes research progress on keyword extraction methods based on term frequency, association information, and multiple features, and introduces commonly used methods built around TF-IDF, entropy, lexical chains, TextRank, genetic algorithms, decision tree learning, the naive Bayes classifier, and support vector machines. It also summarizes the term features that may be used in keyword extraction from patent documents, covering frequency, position, semantics, association, and properties of the term itself. In practice, automatic keyword extraction can serve as an effective auxiliary tool that reduces the labor and time costs of data processing.
Source: Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology (《世界科学技术-中医药现代化》), 2015, No. 1, pp. 29-34 (6 pages)
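Funding: Beijing Municipal Science and Technology Commission, "Capital Citizens' Health Project Cultivation" program (Z131100006813045): Research and Development of a Biopharmaceutical Patent Information Service System. Principal investigator: Sun Ruiyang (孙瑞阳)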
Keywords: Chinese patent documents; keyword extraction; TF-IDF; association information; machine learning

