
Short Message Clustering on an Extended Vector Space (cited by: 1)

Instant Messages Clustering on Extended Vector Space Model
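Abstract: Instant messaging over the Internet or mobile networks has become a widely used means of mass communication. Clustering the content of instant short messages makes it possible to analyze their content characteristics and thereby track or discover current hot topics, prevent or audit criminal activity, and support other data mining applications. Short messages are brief, their keywords occur only a few times, and the topic keywords may even be hidden in the context or in the surrounding conversation. To address these characteristics, the WR-KMeans clustering method is proposed. It automatically merges the short messages exchanged between parties into conversations, so that the objects being clustered are longer and carry richer contextual information. Words that do not appear in a conversation but have a strong semantic relation to words in it are added to the conversation's representation vector, which avoids the similarity bias caused by sparse keywords. WR-KMeans then clusters this set of extended conversation vectors. In experiments comparing it with two other commonly used clustering algorithms, WR-KMeans produced clusters of better quality.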
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2007, Issue z2, pp. 157-163 (7 pages)
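Funding: National High Technology Research and Development Program of China ("863" Program) (2004AA112020, 2005AA112030); National Key Basic Research Program of China ("973" Program) (2005CB321804)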
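The pipeline outlined in the abstract (merge messages into conversations, extend each conversation vector with semantically related words, then cluster) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's WR-KMeans implementation: the abstract does not give the conversation-building rule, the semantic relatedness measure, or the weighting scheme, so grouping by participant pair, the `related` table, the `threshold` parameter, the weight "relatedness times term frequency", whitespace tokenization, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

def build_conversations(messages):
    """Merge messages exchanged by the same pair of users into one token list.
    Assumption: a conversation is everything sent between two participants."""
    conversations = defaultdict(list)
    for sender, receiver, text in messages:
        key = tuple(sorted((sender, receiver)))  # unordered participant pair
        conversations[key].extend(text.lower().split())
    return dict(conversations)

def extend_vector(tokens, related, threshold=0.5):
    """Term-frequency vector plus absent words strongly related to present ones."""
    vec = Counter(tokens)
    extended = dict(vec)
    for word, count in vec.items():
        for other, strength in related.get(word, {}).items():
            if other not in vec and strength >= threshold:
                # Hypothetical weighting: relatedness times source-word frequency.
                extended[other] = max(extended.get(other, 0.0), strength * count)
    return extended

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    """Plain k-means using cosine similarity; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels

# Toy data (invented for illustration only).
messages = [
    ("ann", "bob", "match tonight"),
    ("bob", "ann", "late goal"),
    ("cat", "dan", "stock price fell"),
    ("dan", "cat", "sell shares"),
    ("eve", "fay", "great football"),
]
related = {  # hypothetical relatedness scores in [0, 1]
    "football": {"match": 0.8, "goal": 0.7},
    "match": {"football": 0.8},
    "goal": {"football": 0.7},
}

conversations = build_conversations(messages)
vectors = [extend_vector(tokens, related) for tokens in conversations.values()]
labels = kmeans(vectors, k=2)
print(dict(zip(conversations, labels)))
```

In this toy data the conversation "great football" shares no word with "match tonight / late goal", so their raw cosine similarity is zero; only after extension do the two vectors overlap. Real Chinese messages would also need word segmentation in place of `split()`.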


