Abstract
[Purpose/significance] Patent examination and patent intelligence analysis in China need a way to classify very large numbers of patents. To meet this need, this paper designs seven deep-learning-based methods for automatic patent classification and compares how well they classify. The aim is to improve both the efficiency and the quality of patent classification. [Method/process] The models address the shortcomings of traditional machine learning methods. They are built on deep learning techniques such as Word2Vec, CNN, RNN and the attention mechanism, and they take into account the word-order features and contextual features of patent text as well as the key features for classification. Seven models were designed, including Word2Vec+TextCNN, Word2Vec+GRU, Word2Vec+BiGRU and Word2Vec+BiGRU+TextCNN. Chinese patents served as the case study, and the section of each patent's main International Patent Classification (IPC) symbol served as the class label. On this Chinese patent classification task, the seven models were compared with three traditional machine learning classifiers. [Result/conclusion] The empirical results show that deep learning methods perform better in Chinese patent classification when they account for word order and context and reinforce the key features.
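The abstract uses the section of each patent's main IPC symbol as the class label. The IPC defines eight sections, lettered A to H, so the first character of a symbol such as "G06F 17/30" identifies the section. Below is a minimal Python sketch of this labelling step. The function names, the (text, main_ipc) record layout and the alphabetical label order are illustrative assumptions and do not come from the paper.

```python
import re

# The eight IPC sections (A-H); a section's position in this string is its integer label.
# The alphabetical ordering is an illustrative assumption, not taken from the paper.
IPC_SECTIONS = "ABCDEFGH"
SECTION_TO_INDEX = {section: i for i, section in enumerate(IPC_SECTIONS)}

# Section letter, two-digit class, subclass letter, e.g. "G06F" in "G06F 17/30".
_IPC_PREFIX = re.compile(r"([A-H])(\d{2})([A-Z])")


def ipc_section_label(main_ipc: str) -> int:
    """Map a main IPC symbol to an integer section label in 0..7 (hypothetical helper)."""
    match = _IPC_PREFIX.match(main_ipc.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable IPC symbol: {main_ipc!r}")
    return SECTION_TO_INDEX[match.group(1)]


def build_dataset(records):
    """Pair each patent text with its section label; records are (text, main_ipc) tuples."""
    return [(text, ipc_section_label(ipc)) for text, ipc in records]


if __name__ == "__main__":
    sample = [
        ("image recognition method", "G06K 9/62"),
        ("lithium battery cathode material", "H01M 4/525"),
    ]
    print(build_dataset(sample))
    # [('image recognition method', 6), ('lithium battery cathode material', 7)]
```

The resulting integer labels can then be fed to any of the classifiers the abstract compares, whether the Word2Vec-based neural models or the traditional baselines. The labelling step is independent of the model choice.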
Authors
Lyu Lucheng (吕璐成)
Han Tao (韩涛)
Zhou Jian (周健)
Zhao Yajuan (赵亚娟)
Affiliations
National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Source
Library and Information Service (图书情报工作)
CSSCI
Peking University Core Journal (北大核心)
2020, No. 10, pp. 75-85 (11 pages)
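Funding
This paper is one of the research outputs of the Chinese Academy of Sciences Youth Talent Project "Deep Learning-Based Classification of Patents by Industry" (Project No. G180161001).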
Keywords
automatic patent classification
deep learning
word embedding
patent text mining