
Chinese-Vietnamese Bilingual News Topic Detection Methods Based on Graph Clustering

Cited by: 1
Abstract: Cross-language news topic detection automatically groups news reports that cover the same event but are written in different languages. Because texts in different languages are hard to represent in a common feature space, mining the topics they share is difficult. However, reports of the same event contain the same news elements regardless of the language they are written in, and the associations among these elements reflect how closely the events are related. For Chinese-Vietnamese news topic detection, this paper therefore proposes a method based on document graph clustering. First, news elements are extracted from the Chinese and Vietnamese news texts, the relatedness between Chinese and Vietnamese texts is computed from the similarity of their elements, and a Chinese-Vietnamese bilingual text graph model is built, yielding a similarity matrix over the news texts. Next, exploiting the way relatedness propagates between texts in the graph model, the similarity matrix is adjusted with a random walk algorithm. Finally, the texts are clustered into topics with the affinity propagation algorithm. Experimental results show that the proposed method is effective.
Authors: Wang Yusen, Yu Zhengtao, Gao Shengxiang, Zhou Chao, Hong Xudong (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Data Acquisition and Processing, 2018, No. 3, pp. 530-537 (8 pages); indexed in CSCD and the Peking University core journal list
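Funding: National Natural Science Foundation of China (61472168, 61175068, 61672271); Natural Science Foundation of Yunnan Province, Key Program (2013FA130); Yunnan Province Science and Technology Innovation Talent Fund (2014HE001)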
Keywords: Chinese-Vietnamese bilingual; event elements; topic detection; graph clustering
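The three steps described in the abstract (element-based similarity, random-walk adjustment, affinity propagation clustering) can be illustrated with a minimal, hypothetical sketch; it is not the authors' implementation. The abstract does not say how news elements are extracted or aligned across Chinese and Vietnamese, how element similarity is computed, which random-walk variant is used, or which clustering parameters are chosen. The sketch therefore assumes each report has already been reduced to a set of language-independent element IDs, uses Jaccard overlap in place of the paper's element similarity, uses a random walk with restart (restart probability `restart`) as one possible form of the similarity adjustment, and applies the standard affinity propagation updates with the median similarity as the default preference. The function names (`element_similarity`, `build_similarity`, `random_walk_adjust`, `affinity_propagation`) and the toy documents are illustrative assumptions.

```python
# Hypothetical sketch: element-based similarity -> random-walk adjustment
# -> affinity propagation. Not the authors' code; all names are illustrative.


def element_similarity(a, b):
    # ASSUMPTION: elements are already mapped to shared, language-independent
    # IDs; Jaccard overlap stands in for the paper's element similarity.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity(docs):
    # Weighted document graph as an adjacency matrix (no self-loops).
    n = len(docs)
    return [[0.0 if i == j else element_similarity(docs[i], docs[j])
             for j in range(n)] for i in range(n)]


def random_walk_adjust(S, restart=0.3, iters=50):
    # ASSUMPTION: random walk with restart, iterating R <- c*I + (1-c)*P*R
    # (c = restart, P = row-normalised graph); the paper's variant is not given.
    n = len(S)
    P = []
    for i, row in enumerate(S):
        total = sum(row)
        if total:
            P.append([v / total for v in row])
        else:  # isolated document: the walker stays where it is
            P.append([1.0 if j == i else 0.0 for j in range(n)])
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        PR = [[sum(P[i][k] * R[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        R = [[(restart if i == j else 0.0) + (1 - restart) * PR[i][j]
              for j in range(n)] for i in range(n)]
    # Symmetrise so the result can serve as a pairwise similarity.
    return [[(R[i][j] + R[j][i]) / 2 for j in range(n)] for i in range(n)]


def affinity_propagation(S, damping=0.5, iters=200, preference=None):
    # Standard AP updates (Frey & Dueck); assumes at least two documents.
    n = len(S)
    S = [row[:] for row in S]
    if preference is None:  # common default: median off-diagonal similarity
        off = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
        m = len(off)
        preference = off[m // 2] if m % 2 else (off[m // 2 - 1] + off[m // 2]) / 2
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum of max(0, r(i', k)) over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each exemplar labels itself; other documents join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Toy input (hypothetical): each set holds aligned element IDs of one report.
docs = [
    {"ASEAN", "Hanoi", "summit", "trade"},          # Chinese report
    {"ASEAN", "Hanoi", "summit"},                   # Vietnamese report
    {"ASEAN", "summit", "trade"},                   # Chinese report
    {"typhoon", "Da Nang", "flood"},                # Chinese report
    {"typhoon", "Da Nang", "flood", "evacuation"},  # Vietnamese report
    {"typhoon", "evacuation"},                      # Vietnamese report
]
S = build_similarity(docs)
W = random_walk_adjust(S)
labels = affinity_propagation(W)
print(labels)  # index of the exemplar (topic representative) for each report
```

In this sketch the random-walk step lets two reports that share few elements directly, but have common neighbours in the graph, receive a higher adjusted similarity; this is one reading of the "propagation characteristics" the abstract refers to. In affinity propagation the `preference` value controls how many exemplars (topics) emerge: lower values generally give fewer clusters.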