
Topic Link Detection Method Based on Semantic Similarity (基于语义相似度的话题关联检测方法)

Cited by: 6
Abstract: To identify effectively whether any two news stories are topically related, a topic link detection algorithm based on semantic similarity is proposed. First, the relative entropy between feature words is computed and used as the semantic similarity between the feature words of the two stories. Second, the relevance between a feature word and the other story is obtained by averaging these semantic similarities. Finally, the relevance between the two stories is computed by combining the semantic similarity with the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus, which completes link detection for the story pair. The proposed method was compared with the vector space model (VSM) method and with a method that relies only on average pointwise mutual information, and was evaluated on the TDT4 Chinese corpus. The results show that the semantic-similarity-based method makes better use of the contextual information in the text and improves on existing detection systems, reducing the minimum DET (detection error tradeoff) cost by about 3%.
Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), indexed in EI, CSCD, and the PKU Core Journals list; 2015, No. 3, pp. 517-522 (6 pages)
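Funding: State Language Commission "12th Five-Year Plan" Research Program (YB125-49); Key Project of Science and Technology Research of the Ministry of Education (212167); Fundamental Research Funds for the Central Universities (SWJTU12CX096); National Undergraduate Innovation and Entrepreneurship Training Program (201210694017)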
Keywords: link detection; semantic similarity; relative entropy; relevance degree
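
The three steps described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration rather than the authors' implementation: the document-level co-occurrence distributions, the additive smoothing constant `alpha`, the symmetrised relative entropy, the mapping `1 / (1 + d)` from divergence to similarity, the smoothed IDF, and all function names.

```python
import math
from collections import Counter


def context_distribution(word, docs, vocab, alpha=0.5):
    """Smoothed distribution of words that share a document with `word`.
    (Assumed estimator; the abstract does not say how contexts are built.)"""
    counts = Counter()
    for doc in docs:
        if word in doc:
            counts.update(t for t in doc if t != word)
    total = sum(counts[v] for v in vocab) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def relative_entropy(p, q):
    """Relative entropy D(p || q) over a shared, fully smoothed vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def word_similarity(p, q):
    """Step 1: turn symmetrised relative entropy into a score in (0, 1].
    The 1 / (1 + d) mapping is an illustrative assumption."""
    d = 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))
    return 1.0 / (1.0 + d)


def word_story_relevance(word, story, dists):
    """Step 2: average similarity between `word` and the story's feature words."""
    feats = set(story)
    return sum(word_similarity(dists[word], dists[u]) for u in feats) / len(feats)


def tfidf(story, docs):
    """TF-IDF weights of a story's words, using a smoothed IDF (assumed form)."""
    tf = Counter(story)
    n = len(docs)
    weights = {}
    for w, c in tf.items():
        df = sum(1 for d in docs if w in d)
        weights[w] = (c / len(story)) * (math.log((1 + n) / (1 + df)) + 1.0)
    return weights


def story_relevance(a, b, docs):
    """Step 3: TF-IDF-weighted average of word-to-story relevance,
    averaged over both directions so the score is symmetric."""
    vocab = sorted({t for d in docs for t in d} | set(a) | set(b))
    dists = {w: context_distribution(w, docs, vocab) for w in set(a) | set(b)}

    def directed(src, dst):
        weights = tfidf(src, docs)
        total = sum(weights.values())
        return sum(wt * word_story_relevance(w, dst, dists)
                   for w, wt in weights.items()) / total

    return 0.5 * (directed(a, b) + directed(b, a))


# Toy background corpus and stories (hypothetical data, already tokenised).
docs = [
    ["earthquake", "rescue", "victims"],
    ["earthquake", "rescue", "aid"],
    ["election", "vote", "candidate"],
    ["election", "vote", "poll"],
]
story_a = ["earthquake", "rescue"]
story_b = ["rescue", "victims"]
story_c = ["election", "vote"]

print("on-topic pair :", story_relevance(story_a, story_b, docs))
print("off-topic pair:", story_relevance(story_a, story_c, docs))
```

In a link detection setting, a pair would be declared linked when this score exceeds a threshold tuned on held-out data; the threshold and the tokenisation step are outside this sketch.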

