
A Review of Research on Patent Document Classification Algorithms Based on Machine Learning (cited by: 18)
Abstract: This article first summarizes the state of patent document classification in China and abroad, then briefly describes the general framework of machine-learning-based patent document classification and introduces its stages: text preprocessing, feature extraction, text representation, classifier construction, and performance evaluation. The machine learning algorithms applied to patent document classification are divided into single classification algorithms and combined classification algorithms, which are the focus of the discussion. Single algorithms mainly include the NB, ANN, Rocchio, KNN, and SVM algorithms. Combined algorithms comprise combinations of two algorithms (e.g., NB-KNN, Rocchio-KNN, KNN-SVM, and SVM combined with other algorithms) and combinations of multiple algorithms. The article also points out the advantages and disadvantages of each machine learning algorithm when applied to patent document classification, and outlines prospects for automatic patent document classification from five aspects: text preprocessing, feature extraction, text representation, classifier construction, and the exploration of new methods.
Source: Library and Information Studies (《图书情报研究》), 2016, No. 3, pp. 79-86 (8 pages)
Keywords: patent document; automatic classification; machine learning; Naive Bayes; Support Vector Machine
