
Mining Algorithm and Structural Analysis of Microblog Interpersonal Relationship Network Based on Tag (基于标签的微博人脉网络挖掘算法和结构分析)
Cited by: 2
Abstract: In view of the widespread use of microblog services and their impact on big-data mining and analysis, this paper proposes a tag-based algorithm for mining microblog interpersonal relationship networks and analyzes the structural characteristics of the resulting network. The algorithm uses microblog user tags; when computing the matching degree between words during fuzzy matching, it mainly considers three factors: word morphemes, morpheme order, and word length. To reduce the effect that the choice of starting user has on the algorithm's accuracy, the network data are mined starting from ordinary users and from celebrity users separately. The structural analysis finds that the microblog interpersonal relationship network has both small-world and scale-free properties. Experimental results show that the error rate is acceptable when the algorithm is used to mine the friends of celebrity users and ordinary users who are interested in IT: the average error rate is 14.08% when mining the friends of 10 celebrity users and 10.63% when mining the friends of 10 ordinary users.
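The abstract names the three factors used in the fuzzy tag match (morphemes, order, word length) but does not give the formula. The following is a minimal sketch of how such a matching degree could be combined, not the authors' method: the function names, the Dice-style character overlap, the longest-common-subsequence order term, and the weights `(0.6, 0.2, 0.2)` are all assumptions made for illustration. Each Chinese character is treated as one morpheme.

```python
from collections import Counter


def morpheme_score(a: str, b: str) -> float:
    """Dice-style overlap of characters (treated as morphemes), with multiplicity."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_score(a: str, b: str) -> float:
    """Fraction of the shared characters that also appear in the same relative order."""
    shared = sum((Counter(a) & Counter(b)).values())
    if shared == 0:
        return 0.0
    return lcs_length(a, b) / shared


def length_score(a: str, b: str) -> float:
    """Ratio of the shorter word length to the longer one."""
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))


def match_degree(a: str, b: str, weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted sum of the three factors; the weights are hypothetical."""
    m = morpheme_score(a, b)
    if m == 0.0:
        return 0.0  # no shared characters: treat the tags as unrelated
    w_m, w_o, w_l = weights
    return w_m * m + w_o * order_score(a, b) + w_l * length_score(a, b)
```

Under these assumptions, identical tags score 1.0 (up to floating-point rounding), tags with no characters in common score 0.0, and a tag whose characters are reordered scores lower than an exact match. In practice a threshold on `match_degree` would decide whether a user's tag counts as matching the target interest; that threshold is likewise not stated in the abstract.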
Authors: Wang Sha (王莎), Zhang Lianming (张连明)
Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2014, No. 5, pp. 7-11 (5 pages)
Funding: National Natural Science Foundation of China (60973129); Natural Science Foundation of Guangdong Province (S2011010000812)
Keywords: tag; microblog; interpersonal relationship network; fuzzy matching; data mining; structural characteristics
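The abstract reports that the mined network has both small-world and scale-free properties without saying how they were measured. As a rough illustration only, the sketch below computes the quantities commonly used for such a check: the average clustering coefficient and the average shortest-path length (small-world networks typically combine high clustering with short paths) and the degree distribution (scale-free networks typically show a heavy, power-law-like tail). Treating the follow graph as undirected, the adjacency-dict representation, and the toy graph are assumptions, not details from the paper.

```python
from collections import Counter, deque


def clustering_coefficient(adj: dict) -> float:
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nb = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj: dict) -> float:
    """Mean shortest-path length over reachable ordered pairs, via BFS from each node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_distribution(adj: dict) -> dict:
    """Map each degree k to the fraction of nodes with that degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical toy network: a triangle (a, b, c) with one pendant user d.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

On real data, the clustering coefficient and path length would be compared against a random graph with the same number of nodes and edges, and the degree distribution would be inspected on a log-log scale; a four-node toy graph only shows how the quantities are computed.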
