Abstract
Based on an analysis of the characteristics of Chinese news reports, this paper proposes a topic detection algorithm with an improved similarity computation. The algorithm builds on the Single-Pass clustering strategy and incorporates the location information found in news reports: it computes a text content similarity and a location similarity for each report, then combines the two for topic detection. Experimental results show that the algorithm outperforms the traditional text similarity algorithm.
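The abstract does not state how the two similarities are computed or combined, so the following is only a minimal sketch of the general idea under stated assumptions: content similarity is taken as cosine similarity over term frequencies, location similarity as Jaccard overlap of place-name sets, and the two are merged by a weighted sum. The names `single_pass`, `alpha`, and `threshold`, and their default values, are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two sets of place names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass(reports, threshold=0.5, alpha=0.7):
    """Assign each report to a topic in a single pass over the stream.

    reports: iterable of (terms, locations) pairs, where terms is a list
    of tokens and locations is a set of place names.
    alpha (assumed): weight of content similarity in the combined score.
    threshold (assumed): minimum combined score to join an existing topic.
    Returns one topic index per report, in input order.
    """
    topics = []  # each topic keeps aggregated terms and locations
    labels = []
    for terms, locations in reports:
        tf = Counter(terms)
        locs = set(locations)
        best, best_sim = None, 0.0
        for i, topic in enumerate(topics):
            # Assumed combination rule: weighted sum of the two similarities.
            sim = (alpha * cosine(tf, topic["terms"])
                   + (1 - alpha) * jaccard(locs, topic["locations"]))
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            # Close enough to an existing topic: merge the report into it.
            topics[best]["terms"] += tf
            topics[best]["locations"] |= locs
            labels.append(best)
        else:
            # Otherwise the report seeds a new topic.
            topics.append({"terms": tf, "locations": locs})
            labels.append(len(topics) - 1)
    return labels
```

In this sketch, two reports with identical wording but different locations score at most `alpha`, so raising `threshold` above `alpha` keeps them in separate topics. That is one way location information can separate similar-sounding events.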
Source
Electronic Science and Technology (《电子科技》)
2012, No. 1, pp. 96-98 (3 pages)
Keywords
topic detection
location information
similarity computation