
Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm (cited by: 6)
Abstract: To capture the dynamics of Web information in a timely manner, this paper studies an online clustering approach for Web topics based on the Single-Pass algorithm. After analyzing the clustering process of the approach, it focuses on the key problems: extracting text features from dynamic Web information streams and calculating their weights, and representing and updating topic clusters. Experiments are designed to compare how the weighting coefficient applied to features appearing in titles, the feature weight calculation and normalization methods, and the dimension of the topic cluster vector affect clustering quality and time efficiency.
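The abstract names the ingredients of the approach but not its formulas or parameter values. As a rough illustration only, the following Python sketch shows a generic Single-Pass clustering loop with a title-weighted term vector and a truncated cluster vector. The function names, the plain term-frequency weighting, the cosine similarity measure, and the values of `title_weight`, `threshold`, and `max_dim` are all assumptions made for this example, not the paper's actual choices.

```python
import math
from collections import Counter


def tf_vector(title_tokens, body_tokens, title_weight=2.0):
    """Term-frequency vector; each title occurrence adds title_weight (assumed value)."""
    vec = Counter()
    for tok in body_tokens:
        vec[tok] += 1.0
    for tok in title_tokens:
        vec[tok] += title_weight
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(vec, k):
    """Keep only the k highest-weighted features of a sparse vector."""
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:k])


def single_pass(docs, threshold=0.3, max_dim=50):
    """Assign each (doc_id, vector) in arrival order to the most similar
    existing cluster, or open a new cluster if no similarity reaches threshold."""
    clusters = []  # each cluster: {"centroid": dict, "members": [doc_id, ...]}
    for doc_id, vec in docs:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Update the cluster vector as a running mean, then truncate it.
            # Truncation makes the mean approximate for dropped features.
            n = len(best["members"])
            merged = {t: w * n for t, w in best["centroid"].items()}
            for t, w in vec.items():
                merged[t] = merged.get(t, 0.0) + w
            best["centroid"] = top_k({t: w / (n + 1) for t, w in merged.items()}, max_dim)
            best["members"].append(doc_id)
        else:
            clusters.append({"centroid": top_k(vec, max_dim), "members": [doc_id]})
    return clusters


if __name__ == "__main__":
    stream = [
        ("d1", tf_vector(["earthquake"], ["earthquake", "rescue", "japan"])),
        ("d2", tf_vector(["earthquake"], ["japan", "tsunami"])),
        ("d3", tf_vector(["football"], ["football", "match", "goal"])),
    ]
    for cluster in single_pass(stream):
        print(cluster["members"])
```

On this toy stream the two earthquake reports fall into one cluster and the football report opens a second one. Because each document is compared only against the current cluster vectors and never revisited, the result depends on arrival order and on the threshold, which is why the parameters the abstract lists for comparison (title weighting, weight normalization, vector dimension) matter for both quality and running time.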
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2011, No. 12, pp. 52-57 (6 pages)
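Funding: One of the research outcomes of the Jiangsu Provincial Social Science Fund project "Research on the Generation and Early-Warning Mechanism of Online Public Opinion Crises" (No. 10TQC009), the Jiangsu Provincial Department of Education University Philosophy and Social Science project "Research on Dynamic Network Models of Internet Public Opinion Evolution" (No. 2011SJB870006), and the Nanjing University of Posts and Telecommunications Qinglan Program project "Research on Online Public Opinion Hotspot Discovery and Dynamic Early Warning" (No. NY210055).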
Keywords: Internet public opinion; Topic mining; Online clustering; Single-Pass