Abstract
To index patent topics accurately and stably, this paper proposes a patent topic indexing method based on probabilistic latent semantic analysis (PLSA). First, a mixture model combining shared topics and specific topics is established. Second, the mapping between the specific topics of the training (source) set and those of the test (target) set is inferred from the correlation between the two kinds of topics. Finally, the topic with the highest similarity is selected as the topic of each patent, which completes the indexing. Experimental results show that the method indexes the topics of unlabeled patents with reasonable accuracy and stability. Applying PLSA to the indexing of patent text is both a constructive step toward automating patent indexing and a new perspective for the deep mining of patent information.
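The three steps named in the abstract can be illustrated roughly as follows. This is a minimal sketch of plain PLSA fitted by EM, followed by a cosine-similarity match between topics of two corpora; it is not the paper's joint shared/specific-topic model. The function names `plsa` and `map_topics`, the toy count matrices, and the choice of cosine similarity are assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA by EM on a (docs x words) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        # M-step: re-estimate both distributions from expected counts
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + EPS
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_z_d, p_w_z


def map_topics(p_w_z_train, p_w_z_test):
    """For each test topic, return the index of the most similar training topic."""
    a = p_w_z_test / (np.linalg.norm(p_w_z_test, axis=1, keepdims=True) + EPS)
    b = p_w_z_train / (np.linalg.norm(p_w_z_train, axis=1, keepdims=True) + EPS)
    return (a @ b.T).argmax(axis=1)  # cosine similarity, best match per row


# Toy corpora over a shared 6-word vocabulary (hypothetical data).
train_counts = np.array([[4, 3, 2, 0, 0, 1], [3, 4, 1, 1, 0, 0],
                         [0, 1, 0, 4, 3, 2], [1, 0, 0, 3, 4, 3]])
test_counts = np.array([[0, 0, 1, 3, 4, 2], [4, 2, 3, 0, 1, 0]])

_, train_topics = plsa(train_counts, n_topics=2)
test_doc_topics, test_topics = plsa(test_counts, n_topics=2)
mapping = map_topics(train_topics, test_topics)
# Index each test patent by the training topic matched to its dominant topic.
labels = mapping[test_doc_topics.argmax(axis=1)]
```

Here `labels` holds one training-topic index per test document. In the paper the mapping is derived from the correlation between shared and specific topics rather than from a direct cosine match, so this sketch only conveys the general shape of the pipeline.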
Authors
包翔 (BAO Xiang)
刘桂锋 (LIU Guifeng)
Institute of Scientific and Technical Information of Jiangsu University, Zhenjiang 212013, China
Source
《情报工程》
2020, No. 3, pp. 15-24 (10 pages)
Technology Intelligence Engineering
Funding
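General Project of Philosophy and Social Science Research in Jiangsu Universities, "Research and Practice of Topic Models in Intellectual Property Information Services of University Libraries" (2019SJA1870)
General Program of Natural Science Research in Jiangsu Universities, "Patent Topic Classification Based on Multi-Instance Multi-Label Learning and Deep Neural Networks" (19KJB520005)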
Keywords
patent (专利)
indexing (标引)
probabilistic latent semantic analysis (概率潜在语义)
topic (主题)