Abstract
Blog clustering is an effective way to process blog information. This paper proposes a blog page clustering algorithm based on revision by comments. First, the hierarchical structure of the information contained in blogs is analyzed. A blog attribute model is then built from the general attributes of blog pages, and the blog pages are clustered on the basis of this model. After this initial clustering, the comments on the blog posts are used to revise the clustering results. The results are evaluated with the commonly used entropy and purity measures. Two experimental schemes were designed according to how the comments are used: in one, the comments take part in the clustering directly; in the other, they serve as a revision step after clustering. A comparison of the experimental results shows that, in most cases, using comments as a revision step yields better clustering than using them directly in the clustering process.
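The abstract does not specify the attribute model, the clustering algorithm, or the exact revision rule, so the following is only a hypothetical sketch of the overall "cluster first, then revise with comments" idea. It assumes a bag-of-words vector of each post's text as a stand-in for the blog attribute model, cosine k-means for the initial clustering, and a margin-based reassignment driven by comment similarity as the revision step. The names `vectorize`, `kmeans`, `revise_with_comments` and the `margin` parameter are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def vectorize(text):
    """Term-frequency vector (hypothetical stand-in for the blog attribute model)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Unnormalized sum of member vectors; cosine similarity ignores scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def kmeans(vectors, seeds, iters=10):
    """Initial clustering: k-means on cosine similarity, seeded with the given indices."""
    centers = [vectors[i] for i in seeds]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
                  for v in vectors]
        # An empty cluster keeps its previous center (an empty Counter is falsy).
        centers = [centroid([v for v, a in zip(vectors, assign) if a == k]) or centers[k]
                   for k in range(len(centers))]
    return assign, centers

def revise_with_comments(assign, centers, comment_vectors, margin=0.1):
    """Hypothetical revision: move a post to the cluster its comments favour,
    but only if that cluster beats the current one by at least `margin`."""
    revised = list(assign)
    for i, cv in enumerate(comment_vectors):
        if not cv:
            continue  # no comments: keep the initial assignment
        scores = [cosine(cv, c) for c in centers]
        best = max(range(len(centers)), key=scores.__getitem__)
        if best != assign[i] and scores[best] - scores[assign[i]] >= margin:
            revised[i] = best
    return revised

# Toy data: the last post is initially grouped with the programming posts,
# but its comments are about football.
posts = ["python code tutorial", "python programming code",
         "football match goal", "football league goal", "match stats code python"]
comments = ["", "", "", "", "football goal league"]
vectors = [vectorize(p) for p in posts]
assign, centers = kmeans(vectors, seeds=[0, 2])
revised = revise_with_comments(assign, centers, [vectorize(c) for c in comments])
print(assign, revised)  # → [0, 0, 1, 1, 0] [0, 0, 1, 1, 1]
```

The other scheme described in the abstract, where comments take part in clustering directly, could be approximated in this sketch by appending each post's comments to its text before calling `vectorize`. Keeping comments out of the initial vectors and using them only when they clearly favour another cluster is one way to stop noisy or off-topic comments from distorting the initial clusters.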
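The abstract evaluates clusterings with "commonly used" entropy and purity but does not give the formulas. A minimal sketch follows, assuming the usual definitions. Purity is the fraction of items that belong to the majority true class of their cluster. Entropy is the size-weighted average of each cluster's class entropy, normalized here by log q (q = number of true classes) so that it lies in [0, 1]. The normalization is an assumption, and the function names are illustrative. Higher purity and lower entropy indicate better clustering.

```python
import math
from collections import Counter, defaultdict

def _group(labels_true, labels_pred):
    """Map each predicted cluster id to a Counter of the true class labels it contains."""
    clusters = defaultdict(Counter)
    for t, p in zip(labels_true, labels_pred):
        clusters[p][t] += 1
    return clusters

def purity(labels_true, labels_pred):
    """Fraction of items belonging to the majority true class of their cluster."""
    clusters = _group(labels_true, labels_pred)
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)

def entropy(labels_true, labels_pred):
    """Size-weighted mean class entropy of the clusters, divided by log(q)
    (assumed normalization); 0 means every cluster contains a single class."""
    clusters = _group(labels_true, labels_pred)
    n = len(labels_true)
    q = len(set(labels_true))
    if q <= 1:
        return 0.0
    total = 0.0
    for c in clusters.values():
        size = sum(c.values())
        h = -sum((k / size) * math.log(k / size) for k in c.values())
        total += (size / n) * h
    return total / math.log(q)

true = ["tech", "tech", "sport", "sport"]
pred = [0, 0, 0, 1]
print(purity(true, pred))  # → 0.75
```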
Source
Journal of Northeastern University (Natural Science)
EI
CAS
CSCD
Peking University Core Journal
2010, No. 6, pp. 782-785 (4 pages)
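Funding
National Natural Science Foundation of China (60773218)
National High Technology Research and Development Program of China (863 Program) (2009AA01Z122)
Science and Technology Foundation of Liaoning Province (20072031)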
Keywords
blog
clustering
blog comments
revision
clustering algorithm