Research on Micro-Blog Hot Topics Mining Model on Sentence Constituents (基于句子成分的微博热点主题挖掘模型研究)

Cited by: 3
Abstract: Traditional text-similarity measures used in clustering analysis do not work well for short texts, so this paper computes the similarity between micro-blog posts with a method based on sentence constituents. Each post is first split into sentences, and syntactic parsing is used to obtain its sentence constituents; the words that make up these constituents are taken as feature words. HowNet is then used to compute the semantic similarity between words of the same constituent type in two posts, and these values are summed with a weight for each constituent type to give the similarity between the two posts. From these pairwise values a text similarity matrix is built and clustered to identify hot topics on micro-blogs. Experiments demonstrate the feasibility of the proposed method.
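The pipeline described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: posts are assumed to be already parsed into constituent types; HowNet is replaced by a hypothetical toy lookup (`word_sim`, `TOY_SIM`); the constituent weights in `WEIGHTS` are made up; words of the same constituent type are combined with a symmetric best-match average (the abstract does not say how); and the clustering step uses simple threshold-based grouping (`threshold_clusters`), swapped in because the abstract does not name the algorithm.

```python
from itertools import combinations

# Hypothetical weights per constituent type (not the paper's values); they sum to 1.
WEIGHTS = {"subject": 0.4, "predicate": 0.3, "object": 0.3}

# Toy stand-in for HowNet word similarity; the scores are made up for illustration.
TOY_SIM = {
    frozenset(("earthquake", "quake")): 0.9,
    frozenset(("strike", "hit")): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Hypothetical placeholder for a HowNet-based semantic similarity in [0, 1]."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def constituent_sim(words_a: list[str], words_b: list[str]) -> float:
    """Symmetric best-match average between the words of one constituent type."""
    if not words_a or not words_b:
        return 0.0
    best_a = sum(max(word_sim(a, b) for b in words_b) for a in words_a) / len(words_a)
    best_b = sum(max(word_sim(a, b) for a in words_a) for b in words_b) / len(words_b)
    return (best_a + best_b) / 2


def text_sim(t1: dict[str, list[str]], t2: dict[str, list[str]]) -> float:
    """Weighted sum of same-constituent similarities between two posts."""
    return sum(w * constituent_sim(t1.get(c, []), t2.get(c, []))
               for c, w in WEIGHTS.items())


def similarity_matrix(texts: list[dict[str, list[str]]]) -> list[list[float]]:
    """Symmetric n x n matrix of pairwise post similarities (diagonal set to 1)."""
    n = len(texts)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = text_sim(texts[i], texts[j])
    return m


def threshold_clusters(m: list[list[float]], threshold: float) -> list[list[int]]:
    """Connected components of the graph linking posts with similarity >= threshold."""
    parent = list(range(len(m)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(m)), 2):
        if m[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(m)):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; here they stand in for the "hot" topics.
    return sorted(groups.values(), key=len, reverse=True)


# Each post is assumed to be pre-parsed into constituent type -> words.
posts = [
    {"subject": ["earthquake"], "predicate": ["strike"], "object": ["Sichuan"]},
    {"subject": ["quake"], "predicate": ["hit"], "object": ["Sichuan"]},
    {"subject": ["team"], "predicate": ["win"], "object": ["match"]},
]
m = similarity_matrix(posts)
print(round(m[0][1], 2))            # 0.9
print(threshold_clusters(m, 0.5))   # [[0, 1], [2]]
```

For the first two posts the weighted sum is 0.4 × 0.9 + 0.3 × 0.8 + 0.3 × 1.0 = 0.9, so they land in one cluster while the unrelated third post stays alone. Since the abstract gives neither the weights, the aggregation rule, nor the clustering algorithm, each of these placeholders would need to be replaced with the choices described in the full paper.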
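Authors: Xiao Lu (肖璐), Tang Xiaobo (唐晓波)
Source: Information Science (《情报科学》), indexed in CSSCI and the Peking University core journal list; 2015, No. 11, pp. 44-47 and 56 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 71273194)
Keywords: parsing; HowNet; hot topics; sentence constituents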
