Abstract
In recent years, social networks, led by microblogs, have grown rapidly. These platforms hold a large amount of valuable information, including users' opinions on current events and their views on daily life and personal relationships. Because microblog data is both very large and difficult to obtain, how to mine social networks effectively has become a focus of data mining research over the past two years. This work designs and implements a parallel social network mining system based on Hadoop. The system comprises a distributed database, a parallel crawler, parallel data processing, and a set of parallel data mining algorithms. It can efficiently collect, analyze, and mine massive social network data, and it supports tasks such as community analysis, user behavior analysis, user classification, and microblog classification.
Source
《软件》
2013, Issue 12, pp. 127-131 (5 pages)
Software