
Topical phrase mining for patent (面向专利的主题短语提取)

Cited by: 4
Abstract: In Chinese patent topic mining, traditional word-based topic models produce results that are hard to interpret. To address this, an improved model, GW_PhraseLDA, is proposed that combines word vectors with the Generalized Pólya urn (GPU) scheme. Based on the characteristics of patent text, a BLSTM-CRF model is used to extract patent phrases, and trained word vectors are used to generate prior knowledge. During the iterations of Gibbs sampling, the GPU strategy raises the probability that semantically related phrases fall under the same topic. Experiments on Chinese patent texts show that the proposed model effectively improves the quality of the generated patent topics and is more interpretable and discriminative than traditional topic models.
Authors: MA Jian-hong (马建红); JI Shuai (姬帅); LIU Shuo (刘硕) (School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2019, No. 5, pp. 1365-1369, 1382 (6 pages)
Keywords: patent mining; term extraction; bidirectional long short-term memory; conditional random fields; topic model
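The abstract describes the GPU strategy only at a high level, so the following is a minimal sketch of the general idea rather than the authors' GW_PhraseLDA implementation. It assumes that the prior knowledge is a set of related phrase pairs chosen by cosine similarity of phrase vectors, and that assigning a phrase to a topic also adds a fractional count for each related phrase under that topic. The function names, the `threshold` and `weight` values, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_promotion_map(vectors, threshold=0.8):
    """Prior knowledge (assumed form): phrases whose vectors are similar."""
    related = {p: [] for p in vectors}
    phrases = list(vectors)
    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            if cosine(vectors[p], vectors[q]) >= threshold:
                related[p].append(q)
                related[q].append(p)
    return related


def gpu_update(counts, totals, topic, phrase, related, weight=0.3, sign=1):
    """Generalized Polya urn count update for one topic assignment.

    sign=+1 when a phrase is assigned to `topic`, sign=-1 when the
    assignment is removed before resampling. Related phrases receive a
    fractional count, which raises their probability under the same topic.
    """
    counts[topic][phrase] += sign
    totals[topic] += sign
    for q in related.get(phrase, []):
        counts[topic][q] += sign * weight
        totals[topic] += sign * weight


def topic_phrase_prob(counts, totals, topic, phrase, vocab_size, beta=0.01):
    """Smoothed estimate of p(phrase | topic) from the urn counts."""
    return (counts[topic][phrase] + beta) / (totals[topic] + vocab_size * beta)


# Toy phrase vectors (hypothetical values, for illustration only).
vectors = {
    "lithium battery": [0.9, 0.1],
    "battery cell": [0.8, 0.2],
    "gear shaft": [0.1, 0.9],
}
related = build_promotion_map(vectors)
counts = defaultdict(lambda: defaultdict(float))
totals = defaultdict(float)

# Assigning "lithium battery" to topic 0 also promotes "battery cell".
gpu_update(counts, totals, 0, "lithium battery", related)
p_related = topic_phrase_prob(counts, totals, 0, "battery cell", len(vectors))
p_unrelated = topic_phrase_prob(counts, totals, 0, "gear shaft", len(vectors))
```

In this sketch a single assignment of "lithium battery" leaves "battery cell" with a higher probability under topic 0 than "gear shaft", which is the promotion effect the abstract attributes to the GPU strategy. A full sampler would call `gpu_update` with `sign=-1` before resampling each phrase and with `sign=1` afterwards.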
