Abstract
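Mining the implicit associations between different patents from massive volumes of patent information is a problem that many patent management systems urgently need to solve. Starting from the abstracts of patent texts, this paper proposes a patent text analysis method based on Apriori combined with the LDA topic model. First, the Apriori algorithm is used to reduce the dimensionality of the data and to mine association rules between keywords and subject terms, and a shared-topic network graph is drawn from these rules. Next, the LDA topic model is applied to the discretized patent-subject-term matrix for further linear dimensionality reduction, and the topics are clustered to obtain high-frequency-word topics under each topic subdivision. Finally, the results of the two analyses are combined to further mine and analyze the patent texts. The method can effectively mine associations among patent text data and can provide ideas and an application reference for association-based recommendation between patents.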
How to mine the implicit relationships between different patents from the vast amount of patent information is an urgent problem that many patent management systems need to solve. A patent text analysis method based on Apriori combined with the LDA topic model is proposed. Starting from the abstracts of patent texts, the method applies the Apriori algorithm to mine the association rules between keywords and subject words, and then uses the LDA topic model to cluster the discrete patent-subject matrix, obtaining the high-frequency-word topics under each topic subdivision. The experimental results show that the method can effectively mine the associations between patent text data and can provide ideas and application references for association-based recommendation between patents.
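The Apriori step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy keyword sets, the thresholds `min_support=0.4` and `min_confidence=0.9`, and the helper names `apriori` and `association_rules` are all assumptions made for illustration, and the LDA step is not covered.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every distinct keyword.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical keyword sets, one per patent abstract (illustrative data only).
patent_keywords = [
    {"battery", "electrode", "lithium"},
    {"battery", "lithium", "charging"},
    {"battery", "electrode"},
    {"sensor", "signal"},
    {"battery", "lithium", "electrode"},
]

frequent = apriori(patent_keywords, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

Each rule links keywords that tend to co-occur across abstracts, so the antecedent and consequent sets could serve as the nodes and edges of a network graph like the one the abstract mentions.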
Authors
艾楚涵
姜迪
吴建德
AI Chu-han; JIANG Di; WU Jian-de (Yunnan Institute of Intellectual Property, Kunming University of Science and Technology, Kunming 650500, China; Computing Center, Kunming University of Science and Technology, Kunming 650500, China; Faculty of Civil Aviation and Aeronautics, Kunming University of Science and Technology, Kunming 650500, China)
Source
《中北大学学报(自然科学版)》
CAS
2019, Issue 6, pp. 524-530 (7 pages)
Journal of North University of China (Natural Science Edition)