Abstract
Patent texts are classified under a fine-grained hierarchy, and at the lower levels of that hierarchy the texts are highly similar and their features are hard to tell apart. To address this, an LSTM-A text classification model is proposed. The model encodes the input sequence with an LSTM network and introduces an attention mechanism that assigns different weights to text features according to the role they play. The method is validated on a patent text dataset drawn from the incopat patent database. Experiments indicate that the model effectively improves classification accuracy for highly similar patent texts.
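The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so dot-product scoring against a query vector is an assumption, and the names `attention_pool`, `hidden_states`, and `query` are hypothetical. The hidden states stand in for the per-token outputs of an LSTM encoder.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each encoder time step and return (weights, context vector).

    hidden_states: list of per-token vectors (assumed LSTM outputs).
    query: vector of the same size; in a real model it would be learned.
    Scoring by dot product is an assumption made for this sketch.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: weighted sum of the hidden states, one value per dimension.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy input: three time steps with two-dimensional hidden states.
hidden_states = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
```

The second time step scores highest against the query, so it receives the largest weight and dominates the context vector that a classifier layer would then consume.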
Authors
薛金成
姜迪
吴建德
XUE Jin-cheng; JIANG Di; WU Jian-de (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Institute of Intellectual Property Development, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Computing Center, Kunming University of Science and Technology, Kunming, Yunnan 650500, China)
Source
《通信技术》
2019, No. 12, pp. 2888-2892 (5 pages)
Communications Technology