
MapReduce-based SOINN Clustering Algorithm for Web Tags

Cited by: 3
Abstract: Web tags help users classify, organize, and retrieve information resources according to their own interests. However, because collaborative tagging systems are open and unconstrained, using tags to describe, organize, classify, and retrieve resources leads to imprecise descriptions, disorganized tags, and semantic ambiguity. To address these problems, the paper proposes three SOINN (self-organizing incremental neural network) clustering algorithms for Web tags, each built on a different feature vector representation (FVR): resource-based FVR, other-tag co-occurrence FVR, and total-tag co-occurrence FVR. The SOINN algorithm is also parallelized with the MapReduce framework. Experiments show that when the number of cluster centers exceeds 2000, the three distributed FVR clustering algorithms outperform the original tag co-occurrence FVR in recall and accuracy and achieve good speedup. These results indicate that the distributed clustering algorithm scales well and can be applied to larger-scale Web tag clustering analysis systems.
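Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journal list, 2014, No. 12, pp. 197-201 (5 pages)
Funding: National Natural Science Foundation of China, Trusted Software research program (61272174); Fundamental Research Funds for the Central Universities (DUT14QY32)
Keywords: Web tag clustering; SOINN algorithm; MapReduce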
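
The abstract names three feature vector representations but does not define them, so the following is only an illustrative sketch of one plausible reading, not the paper's implementation. It assumes annotations arrive as (resource, tag) pairs, that a resource-based vector counts how often a tag was applied to each resource, and that the two co-occurrence vectors count the resources on which two tags appear together, with the "other-tag" variant excluding the tag itself and the "total-tag" variant including it. The function name `build_fvrs` and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def build_fvrs(annotations):
    """Build three sparse tag feature vectors from (resource, tag) pairs.

    Assumed reading of the abstract (not confirmed by the source):
      - resource FVR:    tag -> {resource: times the tag was applied to it}
      - other-tag FVR:   tag -> {other tag: resources shared with that tag}
      - total-tag FVR:   same as other-tag, but the tag itself is included
    """
    tags_by_resource = defaultdict(list)
    for resource, tag in annotations:
        tags_by_resource[resource].append(tag)

    resource_fvr = defaultdict(Counter)
    other_fvr = defaultdict(Counter)
    total_fvr = defaultdict(Counter)

    for resource, tags in tags_by_resource.items():
        counts = Counter(tags)  # how often each tag was applied to this resource
        for tag, n in counts.items():
            resource_fvr[tag][resource] = n
            for other in counts:
                total_fvr[tag][other] += 1  # includes the tag itself
                if other != tag:
                    other_fvr[tag][other] += 1

    # Convert to plain dicts so the result is easy to inspect or serialize.
    def as_dict(fvr):
        return {tag: dict(vec) for tag, vec in fvr.items()}

    return as_dict(resource_fvr), as_dict(other_fvr), as_dict(total_fvr)


if __name__ == "__main__":
    sample = [
        ("r1", "python"), ("r1", "code"),
        ("r2", "python"), ("r2", "snake"), ("r2", "python"),
    ]
    resource_fvr, other_fvr, total_fvr = build_fvrs(sample)
    print(resource_fvr["python"])
    print(other_fvr["python"])
    print(total_fvr["python"])
```

Sparse dictionaries are used here because each tag typically touches only a small fraction of all resources and tags; a clustering step such as SOINN would then need a distance function defined over these sparse vectors.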

