Abstract
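[Objective] Existing patent similarity methods make insufficient use of features unique to patent texts and, to some extent, overlook the characteristics of patent content and structure. This paper proposes a new patent similarity method to address these problems. [Methods] Technical compound sentences are generated from the hierarchical features of patent claims and weighted by information core degree and information richness, so that the patent representation accounts for both the scope of the technical content and the key technical information. Patent similarity is then computed on this representation. Comparative experiments on correlation metrics and patent classification are used to demonstrate the soundness of the method. [Results] Compared with similar baseline methods, the proposed method expresses patent information more fully and is better suited to patent similarity computation. Reconstructing the text into technical compound sentences clearly improves model performance, and adding the information core degree and information richness weighting improves it further. [Limitations] Experiments were conducted only in the quantum computing domain; whether the technical domain affects the method's performance remains to be explored. [Conclusions] Organizing information as claim trees and technical compound sentences can improve the efficiency with which patent text is used. Technical compound sentences based on the hierarchical features of patent claims, together with the corresponding information-feature weighting, can improve patent representation and its performance in similarity tasks.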
[Objective] This paper proposes a new model to compute patent similarity, which fully leverages the characteristics of patent texts and their structural and content features. [Methods] First, we used technical compound sentences, weighted by information core degree and information richness, to represent patents. Then, we calculated patent-to-patent similarity with the representation. Finally, we conducted comparative experiments with correlation scores and patent classification. [Results] The proposed method outperformed benchmark methods in computing patent similarities. The technical compound sentences clearly improved the model’s performance, and the weighting of information core degree and richness improved it further. [Limitations] We only examined the model in the field of quantum computing. [Conclusions] Using a claim tree and technical compound sentences to organize patent information can improve the efficiency of patent text processing. The weighting of information core degree and richness based on hierarchical features of patent claims can improve patent representation and performance in patent similarity computing tasks.
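The pipeline outlined in the abstract (claim tree, technical compound sentences, weighting, similarity) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define information core degree or information richness, so the `core` and `rich` formulas here are hypothetical placeholders, the bag-of-words vectors stand in for whatever text representation the paper uses, and the sample claims are invented.

```python
import math
from collections import Counter

def compound_sentences(claims):
    """Return every root-to-leaf path of the claim tree as a list of claim ids.

    `claims` maps claim id -> (parent id or None, claim text).
    Each path is treated here as one "technical compound sentence".
    """
    children = {}
    for cid, (parent, _) in claims.items():
        children.setdefault(parent, []).append(cid)
    paths = []

    def walk(cid, path):
        path = path + [cid]
        kids = children.get(cid, [])
        if not kids:              # leaf claim: the path is complete
            paths.append(path)
        for kid in kids:
            walk(kid, path)

    for root in children.get(None, []):   # independent claims are roots
        walk(root, [])
    return paths

def tokens(text):
    return text.lower().split()

def represent(claims):
    """Weighted bag-of-words vector over all compound sentences."""
    vec = Counter()
    for path in compound_sentences(claims):
        toks = [t for cid in path for t in tokens(claims[cid][1])]
        if not toks:
            continue
        # ASSUMPTION: placeholder weights, not the paper's definitions.
        core = len(tokens(claims[path[0]][1])) / len(toks)  # share from the independent claim
        rich = len(set(toks))                               # number of distinct terms
        for term, count in Counter(toks).items():
            vec[term] += core * rich * count
    return vec

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented sample data: claim id -> (parent claim id, text)
patent_a = {
    1: (None, "a quantum processor comprising superconducting qubits"),
    2: (1, "wherein the qubits are transmon qubits"),
    3: (2, "wherein each transmon is coupled to a resonator"),
    4: (1, "further comprising a cryogenic control unit"),
}
patent_b = {
    1: (None, "a quantum processor comprising trapped ion qubits"),
    2: (1, "further comprising a laser control unit"),
}

similarity = cosine(represent(patent_a), represent(patent_b))
```

Here `patent_a` yields two compound sentences, one per root-to-leaf path (claims 1-2-3 and claims 1-4), so the independent claim contributes to both while each dependent branch stays separate.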
Authors
向姝璇
操玉杰
毛进
Xiang Shuxuan; Cao Yujie; Mao Jin (Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023, China; School of Information Management, Central China Normal University, Wuhan 430074, China; School of Information Management, Wuhan University, Wuhan 430072, China; Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
PKU Core (Peking University Core Journals)
2024, Issue 2, pp. 33-43 (11 pages)
Data Analysis and Knowledge Discovery
Funding
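This work is one of the research outcomes of the following projects:
National Natural Science Foundation of China, Innovative Research Group Project (Grant No. 71921002)
Huxiang High-Level Talent Aggregation Program (Grant No. 2021RC5029)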