Public Blog Clustering Algorithm Based on Revision by Comments (基于评论修正的博客聚类算法)

Cited by: 2


Abstract: Blog clustering is an effective way to process blog information. This paper proposes a blog page clustering algorithm based on revision by comments. It first analyzes the information hierarchy of a blog, then builds a blog attribute model from the general attributes of blog pages and clusters the pages using that model. After the initial clustering, the comments on each post are used to revise the clustering result. The results are evaluated with the commonly used entropy and purity measures. Two experimental schemes were designed according to how the comments are used: one lets the comments participate directly in clustering, and the other applies the comments as a revision step after clustering. A comparison of the results shows that, in most cases, using comments for revision gives better clustering than using them directly in clustering.
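The abstract names entropy and purity as the evaluation measures but gives no formulas. The following is a minimal sketch that assumes the standard size-weighted definitions used in clustering evaluation: for each cluster, entropy is computed over the distribution of reference categories inside it, purity is the share of its largest category, and both are weighted by cluster size. The function name `cluster_entropy_and_purity` and the choice of base-2 logarithm are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy_and_purity(cluster_ids, true_labels):
    """Return (entropy, purity) for a clustering against reference labels.

    Assumed standard definitions (not quoted from the paper):
      entropy = sum_j (n_j / n) * ( -sum_i p_ij * log2(p_ij) )
      purity  = sum_j (n_j / n) * max_i p_ij
    where p_ij is the fraction of cluster j that belongs to category i.
    """
    if len(cluster_ids) != len(true_labels) or len(cluster_ids) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how many items of each reference category fall in each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1

    n = len(cluster_ids)
    entropy = 0.0
    purity = 0.0
    for counts in members.values():
        size = sum(counts.values())
        # Entropy of the category distribution inside this cluster.
        cluster_entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        entropy += (size / n) * cluster_entropy
        # The largest category's count contributes to overall purity.
        purity += max(counts.values()) / n
    return entropy, purity
```

Lower entropy and higher purity both indicate clusters that match the reference categories more closely, so two schemes such as those compared in the abstract could be scored on the same labelled blog pages with this function.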

Authors: Guo Pengwei (郭朋伟), Gao Kening (高克宁), Zhang Bin (张斌)

Affiliation: School of Information Science and Engineering, Northeastern University (东北大学信息科学与工程学院)

Source: Journal of Northeastern University (Natural Science) (东北大学学报（自然科学版）), 2010, No. 6, pp. 782-785 (4 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).

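Funding: National Natural Science Foundation of China (60773218); National High Technology Research and Development Program of China (2009AA01Z122); Liaoning Provincial Science and Technology Foundation (20072031)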

Keywords: public blog clustering; comment on blog; revision; clustering algorithm

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


