Abstract
With the growing popularity of Sina Weibo, a large body of related research has emerged. In this work, a simulated-login approach is used to crawl more than 2 million posts from over 70,000 Sina Weibo users. Based on the interactions among users, an entropy force model is used to draw the social network graph and reveal the characteristics of user interaction. Based on the content of each user's posts and working at the word level, the TF-IDF algorithm is used to compute word weights and the K-means algorithm is used for clustering, which identifies groups with distinct characteristics; the keywords of each group are then analyzed. Experimental results show that the proposed method can effectively mine user groups.
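The TF-IDF weighting and K-means clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy `users` data, the function names, the unsmoothed `log(N/df)` IDF, and the deterministic farthest-point initialization are all assumptions made for illustration, and real Weibo text would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocab, rows): one TF-IDF vector per tokenized document."""
    vocab = sorted({w for doc in docs for w in doc})
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    idf = {w: math.log(len(docs) / df[w]) for w in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's k-means; returns one cluster label per row."""
    # Assumption: deterministic farthest-point initialization (not from the paper).
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_keywords(vocab, rows, labels, cluster, n=3):
    """Words with the highest mean TF-IDF weight inside one cluster."""
    members = [r for r, lab in zip(rows, labels) if lab == cluster]
    mean = [sum(col) / len(members) for col in zip(*members)]
    ranked = sorted(zip(mean, vocab), reverse=True)
    return [w for _, w in ranked[:n]]


# Toy data (hypothetical): each list stands for one user's already-tokenized posts.
users = [
    ["football", "match", "goal", "team"],
    ["football", "team", "coach", "goal"],
    ["stock", "market", "fund", "price"],
    ["stock", "fund", "bank", "price"],
]
vocab, rows = tfidf_rows(users)
labels = kmeans(rows, k=2)
```

On this toy input the first two users fall into one cluster and the last two into the other, and `top_keywords(vocab, rows, labels, labels[0])` returns words drawn only from the first group's vocabulary. At the scale described in the abstract, a sparse-matrix library would be a more practical choice than these plain lists.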
Author
苟良
GOU Liang (College of Information Science and Engineering, Xinjiang University, Urumqi 83004)
Funding
National Natural Science Foundation of China (No. 61561047)