
基于LDA与新兴主题特征分析的新兴主题探测研究 (cited by: 59)

Detection of Emerging Topics Based on LDA and Feature Analysis of Emerging Topics
Abstract: This paper proposes a method for detecting emerging topics in a document collection based on the LDA topic model. Topic novelty and publication volume are combined with citation counts to form the feature indicators of an emerging topic, and on this basis the paper analyzes the characteristics of a topic in each period before it reaches maturity. It then proposes a detection method built on these indicators: the LDA topic model extracts semantic topic words from the documents; the document-topic matrix maps topics to documents; and from this mapping the novelty, publication volume, and citation indicators of each topic are computed. These form an emerging-topic detection table and a detection curve (VDP), from which emerging topics are identified. Finally, the trend of the distance between each emerging topic's VDP and the baseline VDP is predicted, and the fitted curves are analyzed to identify the emerging topics most worth attention.
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), CSSCI, Peking University Core Journals (北大核心), 2014, No. 7, pp. 698-711 (14 pages)
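Funding: A research outcome of the Chinese Academy of Sciences "Light of West China" Joint Scholar Program project "Research on technological innovation competition and development of strategic emerging industries in Gansu Province based on computational information-analysis methods" and of the National Natural Science Foundation of China (Grant No. 71373260).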
Keywords: LDA (Latent Dirichlet Allocation), topic model, emerging topic, topic feature, novelty index, published volume index, citation volume index, life cycle
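
The mapping step described in the abstract (document-topic matrix to per-topic indicators) can be illustrated with a minimal sketch. It assumes the document-topic matrix has already been produced by a fitted LDA model. The dominant-topic assignment rule, the function names `topic_indicators` and `mean_year`, the mean-year novelty proxy, and the toy data are all assumptions made for illustration; they are not the paper's definitions of its indicators.

```python
from collections import defaultdict


def topic_indicators(doc_topic, years, citations):
    """Aggregate per-topic, per-year document counts and citation counts.

    doc_topic: one row per document, each a list of topic weights
               (assumed to come from an already-fitted LDA model).
    years:     publication year of each document.
    citations: citation count of each document.
    """
    volume = defaultdict(lambda: defaultdict(int))
    cited = defaultdict(lambda: defaultdict(int))
    for weights, year, cites in zip(doc_topic, years, citations):
        # Hypothetical mapping rule: assign each document to its dominant topic.
        topic = max(range(len(weights)), key=weights.__getitem__)
        volume[topic][year] += 1
        cited[topic][year] += cites
    return volume, cited


def mean_year(volume_by_year):
    """Illustrative novelty proxy: volume-weighted mean publication year."""
    total = sum(volume_by_year.values())
    return sum(year * n for year, n in volume_by_year.items()) / total


# Toy data, invented for illustration only.
doc_topic = [
    [0.8, 0.2],
    [0.7, 0.3],
    [0.1, 0.9],
    [0.4, 0.6],
]
years = [2009, 2010, 2012, 2013]
citations = [12, 5, 3, 1]

volume, cited = topic_indicators(doc_topic, years, citations)
for topic in sorted(volume):
    print(topic, dict(volume[topic]), dict(cited[topic]), mean_year(volume[topic]))
```

A hard assignment to the dominant topic keeps the counts as integers. Weighting each document by its topic proportions would be an equally plausible choice, and the abstract does not say which the authors used.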
