
Research and Implementation of a Micro-blog Keyword Extraction Method Based on Clustering (cited by: 9)
Abstract: This paper proposes a clustering-based method for extracting keywords from micro-blogs. The experiment proceeds in three steps. First, the micro-blog text is preprocessed and segmented into words, and word weights are computed with the TF-IDF and TextRank algorithms; to suit the characteristics of short micro-blog text, the weights are computed with a weighted calculation, and a clustering algorithm then extracts candidate keywords from the weighted words. Second, following the theory of the n-gram language model with n = 2, a maximum left-neighbor probability and a maximum right-neighbor probability are defined and used to extend the candidate keywords into key phrases. Third, the extended keywords are filtered according to the concepts of accessor variety and number of semantic units from the semantic extension model, giving the final extraction result. The experimental results show that TextRank performs better than TF-IDF on short text, and that the method effectively extracts keywords from micro-blogs.
Source: 《信息网络安全》 (Netinfo Security), 2014, No. 12, pp. 27-31 (5 pages)
Funding: National Key Technology R&D Program of China [2012BAH38B00]; National Natural Science Foundation of China [61202362, 61262057]; China Postdoctoral Science Foundation [2013M542560]
Keywords: micro-blog keyword; clustering algorithm; TF-IDF; TextRank; n-gram language model
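
The second step of the abstract (extending candidate keywords with bigram neighbor probabilities) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the maximum left-neighbor probability of a word w is taken to be the largest value of count(l, w) / count(w) over all words l that appear directly before w, the right-neighbor case is symmetric, and a candidate is extended by a neighbor only when that probability reaches a hypothetical `threshold`. The function names are illustrative and are not taken from the paper.

```python
from collections import Counter


def neighbor_counts(docs):
    """Count single words and adjacent word pairs (bigrams) in tokenized documents."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def max_left_neighbor(word, unigrams, bigrams):
    """Return (neighbor, probability) for the most likely word directly left of `word`.

    Assumed definition: probability = count(left, word) / count(word).
    """
    candidates = [
        (left, count / unigrams[word])
        for (left, right), count in bigrams.items()
        if right == word
    ]
    return max(candidates, key=lambda pair: pair[1], default=(None, 0.0))


def max_right_neighbor(word, unigrams, bigrams):
    """Return (neighbor, probability) for the most likely word directly right of `word`."""
    candidates = [
        (right, count / unigrams[word])
        for (left, right), count in bigrams.items()
        if left == word
    ]
    return max(candidates, key=lambda pair: pair[1], default=(None, 0.0))


def extend(word, unigrams, bigrams, threshold=0.5):
    """Extend a candidate keyword into a phrase when a neighbor is likely enough.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    left, p_left = max_left_neighbor(word, unigrams, bigrams)
    right, p_right = max_right_neighbor(word, unigrams, bigrams)
    phrase = [word]
    if p_left >= threshold:
        phrase.insert(0, left)
    if p_right >= threshold:
        phrase.append(right)
    return phrase


# Toy corpus of already-segmented posts (English tokens for readability).
docs = [
    ["machine", "learning", "model"],
    ["machine", "learning", "paper"],
    ["deep", "learning", "model"],
]
unigrams, bigrams = neighbor_counts(docs)
print(extend("learning", unigrams, bigrams))  # ['machine', 'learning', 'model']
```

In the toy corpus, "learning" occurs three times, preceded twice by "machine" and followed twice by "model", so both neighbor probabilities are 2/3 and the candidate is extended on both sides. For real micro-blog text the token lists would come from a Chinese word segmenter, and the candidates from the weighting and clustering of the first step.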

