
Public Blog Clustering Algorithm Based on Revision by Comments (Cited by: 2)
Abstract: Blog clustering is an effective way to process blog information. This paper proposes a blog page clustering algorithm based on revision by comments. It first analyzes the hierarchical structure of the information contained in a blog, then builds a blog attribute model from the general attributes of blog pages and clusters the pages on the basis of that model. After this initial clustering, the comments on the blog posts are used to revise the clustering result. The results are evaluated with the commonly used entropy and purity measures. Two experimental schemes were designed according to how the comments are used: one lets the comments participate directly in clustering, and the other uses the comments as a revision step after clustering. A comparison of the experimental results shows that, in most cases, using comments for revision yields better clustering than using them directly in clustering.
Source: Journal of Northeastern University (Natural Science) (《东北大学学报(自然科学版)》), 2010, No. 6, pp. 782-785 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
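Funding: National Natural Science Foundation of China (60773218); National High Technology Research and Development Program of China (2009AA01Z122); Liaoning Province Science and Technology Foundation (20072031).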
Keywords: public blog; clustering; comment on blog; revision; clustering algorithm
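
The abstract states that clustering results are evaluated with entropy and purity, but this record does not reproduce the paper's formulas. The following is a minimal sketch of the standard textbook definitions, assuming each blog page has one cluster assignment and one reference class label. The function name `purity_and_entropy` and the sample labels are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Hypothetical helper: standard definitions, not the paper's own code.
    Purity is the fraction of items that belong to the majority class of
    their cluster. Entropy is the size-weighted average of the per-cluster
    class entropy (base 2, not normalized).
    """
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(cluster_ids)

    # Count how many items of each class fall into each cluster.
    clusters = {}
    for cid, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cid, Counter())[label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        # Majority-class items contribute to purity.
        purity += max(counts.values()) / n
        # Class entropy inside this cluster, weighted by cluster size.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / n) * h
    return purity, entropy


if __name__ == "__main__":
    # Illustrative data only: six pages, two clusters, two topic labels.
    clusters = [0, 0, 0, 1, 1, 1]
    topics = ["sports", "sports", "tech", "tech", "tech", "tech"]
    p, e = purity_and_entropy(clusters, topics)
    print(f"purity={p:.3f} entropy={e:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes. Some formulations divide the per-cluster entropy by the logarithm of the number of classes to normalize it to [0, 1]; the sketch leaves it unnormalized.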


