Abstract
Existing methods for measuring patent novelty depend on domain-specific knowledge and expert intervention, perform poorly, and are time-consuming. To address this, a fully automated, systematic method for identifying novel patents is proposed that relies on neither domain-specific knowledge nor experts. First, the robustly optimized BERT approach (RoBERTa) is used to represent patents as vectors, which addresses the polysemy problem that otherwise requires technical domain knowledge when representing patents. Second, the local outlier factor (LOF) algorithm is improved by using the density distribution of the data points, combined with information entropy, to determine the number of outliers and the corresponding set of data points, which improves outlier detection accuracy. RoBERTa and the improved LOF are then combined to measure patent novelty on a numerical scale. Experiments show that the novelty scores produced by the proposed method correlate significantly with related patent indicators from the existing literature, and that the patents identified as novel have higher technological impact.
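As a rough illustration of the outlier-scoring step, the sketch below computes the standard LOF score in plain Python. It is not the entropy-improved variant described in the abstract, whose details are not given here. The toy 2-D vectors stand in for RoBERTa patent embeddings, and the function name `lof_scores` and the choice `k=3` are assumptions made for this example only.

```python
import math

def lof_scores(points, k=3):
    """Standard LOF score per point; values well above 1 suggest an outlier.

    Simplification: each neighbourhood holds exactly k points (ties are cut),
    and duplicate points are assumed absent so no density is infinite.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Reachability distance of i from neighbour j: max(k-distance(j), d(i, j)).
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Toy vectors (hypothetical): five clustered "patents" and one isolated one.
vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = lof_scores(vectors, k=3)
most_novel = max(range(len(vectors)), key=scores.__getitem__)
```

In this toy setting the isolated vector receives by far the largest score, which is the sense in which an LOF value can serve as a numerical novelty measure.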
Authors
廖列法
姚秀
李奎
LIAO Lie-fa; YAO Xiu; LI Kui (School of Software Engineering, Jiangxi University of Science and Technology, Nanchang 330000, China; School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2023, Issue 17, pp. 7420-7427 (8 pages)
Science Technology and Engineering
Funding
National Natural Science Foundation of China (71462018, 71761018).