
Research on a Partition-Based Periodic Topic Mining Method
Abstract Periodic topic mining is an active research area in data mining. Most existing work is limited to time-series databases and cannot be applied directly to text data. To address this limitation, the paper proposes a partition-based periodic topic mining method (PTMP). First, topics are divided into three types: periodic topics, background topics, and bursty topics. The time-stamp distribution of each periodic topic is then modeled as a mixture of Gaussian distributions. To alleviate the background noise problem, the time-stamps of background topics are generated from a uniform distribution, while the time-stamps of bursty topics are generated from a Gaussian distribution. Fitting this mixture model to time-stamped text data reveals the periodic topics together with their time distributions. Finally, several representative datasets, including Seminar, DBLP, and Flickr, are collected to validate the effectiveness of the method.
Author Deng Dingsheng (邓定胜)
Source Microcomputer Applications (《微型电脑应用》), 2014, No. 8, pp. 21-26 (6 pages)
Keywords periodic topic; data mining; mixture of Gaussian distributions; noise; time-stamps
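
The time-stamp model described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact parameterization, so the sketch assumes that a periodic topic is an equal-weight mixture of Gaussians spaced one period apart with a shared standard deviation. All function and parameter names (`periodic_pdf`, `offset`, `period`, and so on) are hypothetical. The word distributions of the topics and the fitting procedure are omitted; only the three time-stamp densities and a Bayes-rule comparison between them are shown.

```python
import math


def gaussian_pdf(t, mean, std):
    """Density of a normal distribution with the given mean and std at time t."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def periodic_pdf(t, offset, std, period, t_start, t_end):
    """Assumed form: equal-weight Gaussian mixture, one component per period."""
    n_periods = max(1, math.ceil((t_end - t_start) / period))
    means = [t_start + offset + k * period for k in range(n_periods)]
    return sum(gaussian_pdf(t, m, std) for m in means) / n_periods


def background_pdf(t, t_start, t_end):
    """Background topic: time-stamps are uniform over the observed time span."""
    return 1.0 / (t_end - t_start) if t_start <= t <= t_end else 0.0


def bursty_pdf(t, mean, std):
    """Bursty topic: time-stamps follow a single Gaussian around the burst."""
    return gaussian_pdf(t, mean, std)


def topic_posterior(t, priors, densities):
    """Bayes rule: P(topic | t) is proportional to prior * density(t)."""
    scores = {name: priors[name] * densities[name](t) for name in priors}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical setup: 28 days of data, a weekly topic peaking on day 2 of
# each week, and a one-off burst around day 20.
T_START, T_END = 0.0, 28.0
densities = {
    "periodic": lambda t: periodic_pdf(t, 2.0, 0.5, 7.0, T_START, T_END),
    "background": lambda t: background_pdf(t, T_START, T_END),
    "bursty": lambda t: bursty_pdf(t, 20.0, 1.0),
}
priors = {"periodic": 1 / 3, "background": 1 / 3, "bursty": 1 / 3}

for day in (9.0, 5.5, 20.0):
    posterior = topic_posterior(day, priors, densities)
    print(day, max(posterior, key=posterior.get))
```

Under these assumed parameters, a time-stamp on a weekly peak (day 9) is attributed mainly to the periodic topic, a time-stamp between peaks (day 5.5) falls to the uniform background, and a time-stamp near day 20 is attributed to the burst. This is the sense in which the uniform component absorbs background noise instead of distorting the periodic components.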