A dynamic hot topic extraction model based on a time window
Abstract This paper studies topic organization in the news domain and proposes a dynamic hot topic extraction model based on a time window. The model combines two characteristics of hot topics. The first is the pervasiveness of topic terms in news texts, measured by the combined frequency with which a term is reported across multiple channels: the more often a term is reported, the higher its pervasiveness. The second is the burst of topic terms in the news stream, which shows up as a term frequency in a specific time interval that is markedly abnormal compared with other intervals. A time window is introduced to extract rising and falling burst patterns, and term frequency-proportional document frequency (TF-PDF) is used as the basis for weighting topic terms. Experimental results show that the model is effective at topic extraction from news texts.
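Source Chinese High Technology Letters (《高技术通讯》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2010, No. 6, pp. 590-595 (6 pages)
Funding Supported by the 863 Program (2007AA01Z132), the National Natural Science Foundation of China (60435010), the 973 Program (2007CB311004), and the National Key Technology R&D Program (No. 2006BAC08B06)
Keywords topic extraction, time window, pervasiveness, burst, TF-PDF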
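
The abstract names the model's two ingredients without giving their formulas, so the following is only a minimal sketch of how they might look in Python. It assumes the TF-PDF weighting as commonly stated by Bun and Ishizuka (channel-normalized term frequency multiplied by the exponential of the in-channel document proportion). The burst rule, which compares the counts of consecutive windows against a ratio threshold, is purely hypothetical, as are the names `tf_pdf` and `burst_patterns`, the window size, and the threshold; none of these are taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tf_pdf(channels):
    """Weight terms by TF-PDF.

    channels: dict mapping a channel name to a list of documents,
              each document being a list of tokens.
    Returns a dict mapping each term to its summed weight over channels.
    """
    weights = defaultdict(float)
    for docs in channels.values():
        if not docs:
            continue
        term_freq = Counter()  # occurrences of each term in this channel
        doc_freq = Counter()   # documents in this channel containing the term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize frequencies so channels of different sizes are comparable.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        for term, freq in term_freq.items():
            proportion = doc_freq[term] / len(docs)
            weights[term] += (freq / norm) * math.exp(proportion)
    return dict(weights)


def burst_patterns(daily_counts, window=3, ratio=2.0):
    """Label each window after the first as 'rising', 'falling', or 'steady'.

    Hypothetical rule (an assumption, not the paper's method): a window is
    'rising' if its total is at least `ratio` times the previous window's
    total, and 'falling' in the mirrored case. A trailing partial window is
    summed as-is.
    """
    sums = [sum(daily_counts[i:i + window])
            for i in range(0, len(daily_counts), window)]
    labels = []
    for prev, cur in zip(sums, sums[1:]):
        if cur >= ratio * max(prev, 1):
            labels.append("rising")
        elif prev >= ratio * max(cur, 1):
            labels.append("falling")
        else:
            labels.append("steady")
    return labels
```

A term's TF-PDF weight could then be combined with its burst labels to rank candidate hot topics; how the paper combines the two is not stated in the abstract.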