Abstract
To automate the classification of patent documents, a classification model for mechanical patent texts is proposed that combines a convolutional neural network with a random forest and is designed around the characteristics of patent text in the mechanical domain. The convolutional neural network serves as a supervised text feature extractor, and the random forest serves as the classifier. The model was evaluated on a dataset of 107,302 English mechanical patent documents distributed across 96 categories at the subclass level. The experimental results show that the model significantly outperforms classical machine learning algorithms such as k-nearest neighbor, Naïve Bayes, and random forest in precision, recall, and F1.
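The abstract reports precision, recall, and F1 over 96 classes but does not say how the per-class scores are aggregated. As a minimal sketch, assuming macro averaging (an unweighted mean of the per-class scores, which is an assumption and not stated in the source), the three metrics could be computed as follows. The function name `macro_prf` and the example labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1).

    Assumption: macro averaging over every class that appears in
    y_true or y_pred; the source does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        # A class with an empty denominator scores 0.0 by convention here.
        p = tp[c] / p_den if p_den else 0.0
        r = tp[c] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels in the style of patent subclass codes.
y_true = ["F16H", "F16H", "B23K", "B23K", "F02M"]
y_pred = ["F16H", "B23K", "B23K", "B23K", "F02M"]
precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Macro averaging weights all classes equally, so rare subclasses count as much as frequent ones; a micro or weighted average would generally give different numbers on an imbalanced dataset.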
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2018, Issue 6, pp. 268-272 (5 pages)
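Funding
Supported by the National Natural Science Foundation of China (61640209, 51475097) and the Guizhou Provincial Science and Technology Program (黔科合LH字[2016]7433号, 黔科合J字[2014]2謝号, 黔科合人才[2015]4011号, 黔科合JZ字[2014]2004号, 黔科合人字(2015)13号)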
Keywords
mechanical patent classification
deep convolutional neural networks
random forest
text feature extraction