MPMFC: A Traditional Chinese Medicine Patent Classification Model Integrating Network Neighborhood Structural Features and Patent Semantic Features
Abstract [Objective] To address the unsatisfactory classification accuracy for Traditional Chinese Medicine (TCM) patents, which stems from the complexity of TCM itself and from the inability of existing patent classification models to extract sufficient TCM patent feature information. [Methods] We propose MPMFC (Medicine Patent Multi-feature Fusion Classifier), a multi-feature fusion classification model for TCM patents. First, we construct a TCM patent similarity network from the similarity information of the patents' core fields. Then, we use the Node2Vec algorithm to capture latent neighborhood structure information among patents from the global structure of the similarity network, and map it to low-dimensional vectors that serve as supplementary features. Finally, an attention mechanism fuses the patent semantic features produced by the pre-trained RoBERTa-Tiny model with their corresponding supplementary features to classify TCM patents automatically. [Results] On a real-world corpus of 7,000 TCM patents, MPMFC achieved accuracy, recall, and F1 values of 0.8436, 0.8017, and 0.8221, respectively, which are 1.58, 2.59, and 2.11 percentage points higher than the baseline classification model. [Limitations] The weight assignment used when constructing the TCM patent similarity network is somewhat subjective, and patents labeled by annotators who are not TCM researchers may contain some classification errors. [Conclusions] During TCM patent classification, MPMFC acquires and learns richer feature representations from multiple perspectives, which improves classification accuracy.
Authors Deng Na (邓娜); He Xinyang (何昕洋); Chen Weijie (陈伟杰); Chen Xu (陈旭) (School of Computer Science, Hubei University of Technology, Wuhan 430068, China; School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), CSCD, Peking University Core Journal, 2023, Issue 4, pp. 145-158 (14 pages)
Funding One of the research outcomes of a National Natural Science Foundation of China project (Grant No. 61902116).
Keywords TCM Patent Classification; Patent Similarity Network; Feature Fusion; Pre-Training Model; Node2Vec
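
The three steps named in the abstract (a weighted similarity network over core fields, Node2Vec-style walks over that network, and attention-based fusion of two feature vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. The field names, the field weights, the Jaccard similarity measure, the edge threshold, and the fixed attention query vector are all assumptions made for illustration; the abstract states only that weights are assigned per field and that an attention mechanism performs the fusion. In a full Node2Vec pipeline the walks would then be passed to a skip-gram model to obtain node embeddings, and the attention parameters would be learned rather than fixed.

```python
import math
import random

# Hypothetical field weights (assumption): the abstract only says that
# weights are assigned when building the network and are partly subjective.
FIELD_WEIGHTS = {"title": 0.3, "abstract": 0.5, "claims": 0.2}


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in similarity measure."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_similarity_network(patents, threshold=0.2):
    """Return {node: {neighbor: weight}}, keeping edges at or above threshold."""
    graph = {pid: {} for pid in patents}
    ids = list(patents)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            sim = sum(w * jaccard(patents[u][f], patents[v][f])
                      for f, w in FIELD_WEIGHTS.items())
            if sim >= threshold:
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One second-order random walk in the style of Node2Vec."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            weights = [graph[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for n in nbrs:
                if n == prev:
                    bias = 1.0 / p  # step back to the previous node
                elif n in graph[prev]:
                    bias = 1.0      # neighbor shared with the previous node
                else:
                    bias = 1.0 / q  # step further away from the previous node
                weights.append(graph[cur][n] * bias)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def attention_fuse(semantic, structural, query):
    """Softmax-weighted sum of two equal-length feature vectors."""
    scores = [sum(x * y for x, y in zip(vec, query))
              for vec in (semantic, structural)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    fused = [alphas[0] * s + alphas[1] * t
             for s, t in zip(semantic, structural)]
    return fused, alphas


# Toy records (invented for illustration, not taken from the paper's corpus).
patents = {
    "P1": {"title": "ginseng decoction",
           "abstract": "ginseng root extract for fatigue",
           "claims": "ginseng extract"},
    "P2": {"title": "ginseng capsule",
           "abstract": "ginseng extract capsule for fatigue",
           "claims": "ginseng extract capsule"},
    "P3": {"title": "herbal plaster",
           "abstract": "external plaster for joint pain",
           "claims": "plaster composition"},
}
graph = build_similarity_network(patents)
walk = biased_walk(graph, "P1", length=5, p=1.0, q=2.0, rng=random.Random(0))
fused, alphas = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[0.5, 0.5])
```

In this toy data only P1 and P2 are similar enough to be linked, so a walk from P1 alternates between the two, while P3 stays isolated. Raising `q` above 1 keeps walks closer to the start node, and lowering `p` makes backtracking more likely; the values used in the paper are not given in the abstract.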