
Research on Network Public Opinion Hotspot Detection Based on Frequent Pattern Mining (cited by: 6)
Abstract: Hotspot detection is a basic problem for network public opinion systems. Traditional hotspot detection techniques suffer from poor timeliness, poor precision, and high algorithmic complexity. To address these problems, this paper proposes a hotspot detection technique based on frequent pattern mining. Observing that the distribution of network data streams is thin-tailed, the authors design an efficient frequent pattern mining algorithm for data streams, ILC, which uses a differential window pruning strategy to reduce the processing time for each data item to a constant, O(1). Experiments on a hotspot detection system built on ILC show that it identifies hot public opinion topics in high-speed network traffic accurately and in real time, with an accuracy above 90%.
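Source: 《微计算机信息》 (Control & Automation), 2010, No. 36, pp. 35-37 (3 pages)
Funding: applicant: 李斌 (Li Bin); project: Research on Key Technologies of a Network Crisis Response System; granting body: Ministry of Industry and Information Technology of the People's Republic of China (2007A47)
Keywords: network public opinion; public opinion hotspot detection; frequent pattern mining; window pruning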


