Abstract
[Purpose/significance] To classify large numbers of Chinese patents quickly, meeting the needs of patent examination and intelligence analysis. [Method/process] The paper takes into account the inherent structure of patent texts and the fact that a single patent often carries several IPC classification numbers, and on that basis applies multi-instance multi-label (MIML) learning to automatic patent classification. It first introduces the basic principles of several classical MIML models and then applies them to assigning IPC classification numbers to Chinese patents. [Result/conclusion] Experiments show that MIML models are well suited to automatic patent classification. An analysis of average precision, Hamming loss, ranking loss, one-error, coverage and training time shows that MIMLRBF can assign IPC classification numbers to Chinese patents quickly and accurately, offering a reference for the automatic classification of large-scale patent collections.
Authors
Bao Xiang; Liu Guifeng; Cui Jinghua
(Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013; School of Information Management, Nanjing University, Nanjing 210093)
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal
2021, No. 8, pp. 107-113 (7 pages)
Funding
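General Project of Philosophy and Social Science Research in Jiangsu Higher Education Institutions, "Research and Practice of Topic Models in Intellectual Property Information Services of University Libraries" (Project No. 2019SJA1870)
General Program of Natural Science Research in Jiangsu Higher Education Institutions, "Research on Patent Topic Classification Based on Multi-Instance Multi-Label Learning and Deep Neural Networks" (Project No. 19KJB52005). This paper is one of the research outputs of these projects.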