
Design and Implementation of Network Consensus Monitoring System Based on Hadoop (cited by: 7)

Chinese title: 基于Hadoop的网络舆情监控平台设计与实现
Abstract: This paper designs and implements a network public opinion monitoring system based on Hadoop. The system uses HDFS as its underlying storage layer and builds an HBase-based distributed database on top of it to store and manage public opinion data in a unified way. First, a distributed web crawler based on MapReduce collects the data, addressing the low efficiency and poor scalability of single-machine crawlers. Second, a two-stage clustering algorithm that combines Canopy with K-means overcomes the shortcomings of K-means used alone and improves the efficiency and accuracy of text clustering. Finally, a query-based topic tracking strategy tracks and analyzes hot topics. Simulation experiments show that, compared with traditional K-means, the Canopy-Kmeans method lowers the miss rate and the false alarm rate by 1.24% and 0.09% respectively, and lowers the minimum normalized cost by 1.681%. By providing visualized public opinion analysis reports, the system gives enterprises and institutions systematic technical support for tracking hot topics promptly and formulating response strategies.
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 2, pp. 144-149 (6 pages)
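Funding: Youth Fund Project of the Shandong Academy of Sciences (2013QN036); Shandong Province Science and Technology Development Plan (2013GGX10127, 2014GGX101013)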
Keywords: Hadoop; MapReduce; public opinion monitoring; text clustering; hot topic detection; topic tracking
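The two-stage clustering named in the abstract can be illustrated with a minimal single-machine sketch. It assumes one common way of combining the two algorithms: a cheap Canopy pass with a loose threshold `t1` and a tight threshold `t2` chooses the number of clusters and their initial centers, and K-means then refines them. The record does not show the paper's actual implementation, so the function names, the thresholds, the Euclidean distance, and the sample points below are all illustrative assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Canopy pass: return a list of (center, members) pairs.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (points this close to a center cannot start a new canopy); t1 > t2.
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in remaining if dist(p, center) <= t1]
        canopies.append((center, members))
        # Drop every point within the tight threshold, including the center.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return canopies


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-means (Lloyd's algorithm) started from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep it if empty.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D points standing in for document feature vectors.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
canopies = canopy(points, t1=3.0, t2=1.5)
centers, clusters = kmeans(points, [center for center, _ in canopies])
```

In this sketch the Canopy pass removes the need to guess k in advance and gives K-means starting centers that already lie in dense regions, which is the usual motivation for pairing the two. A distributed version would run both passes as MapReduce jobs, which is outside the scope of this example.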

References (9)

  • 1. Chen Yanzhou, Cao Jinxuan. Microblog public opinion monitoring system based on Hadoop [J]. 《计算机系统应用》 (Computer Systems & Applications), 2013, 22(4): 18-22. Cited by: 27
  • 2. Wang Hongyu. Application of the Hadoop platform in cloud computing [J]. 《软件》 (Software), 2011, 32(4): 36-38. Cited by: 41
  • 3. Owen S, Anil R, Dunning T, et al. Mahout in action [M]. [s.l.]: Manning Publications, 2011.
  • 4. McCallum A, Nigam K, Ungar L H. Efficient clustering of high-dimensional data sets with application to reference matching [C]//Proc of the 6th ACM SIGKDD. [s.l.]: ACM, 2000: 169-178.
  • 5. MacQueen J. Some methods for classification and analysis of multivariate observations [C]//Proc of the 5th Berkeley symposium on mathematical statistics and probability. California: University of California Press, 1967: 281-287.
  • 6. Schultz J, Liberman M. Topic detection and tracking using IDF-weighted cosine coefficient [C]//Proceedings of the DARPA broadcast news workshop. Herndon: [s.n.], 1999: 189-192.
  • 7. Shvachko K, Kuang H, Radia S, et al. The Hadoop distributed file system [C]//Proc of IEEE 26th symposium on mass storage systems and technologies. [s.l.]: IEEE, 2010.
  • 8. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
  • 9. Vashishtha H, Stroulia E. Enhancing query support in HBase via an extended coprocessors framework [C]//Proceedings of the European conference on towards a service-based internet. [s.l.]: [s.n.], 2011: 75-87.
