Abstract
The key problem a public opinion monitoring system must solve is how to cluster text efficiently and accurately, so that hot topics of online public opinion can be discovered among a large number of Web pages. Single-pass is the text clustering algorithm most commonly used in topic detection, but it falls short in both clustering accuracy and timeliness. Based on an in-depth analysis of a large corpus of news reports, this paper improves the single-pass algorithm in three respects. Experiments show that the improved single-pass algorithm performs markedly better in miss rate, false detection rate, and cost function.
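The abstract does not describe the three improvements. The sketch below therefore shows only the baseline single-pass algorithm it starts from, plus one common way of scoring topic detection. It is a minimal illustration under several assumptions, none of which come from the paper. Text is tokenized on whitespace and weighted by raw term frequency; a real Chinese-news pipeline would need word segmentation and usually TF-IDF. Each cluster is represented by a summed term vector and compared by cosine similarity. The default threshold of 0.3 is hypothetical. The cost follows the TDT-style detection cost with the conventional C_miss = 1, C_FA = 0.1, P_target = 0.02, which may or may not match the paper's exact "cost function". All function names are illustrative.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # missing terms in b count as 0
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def single_pass(docs, threshold=0.3):
    """Baseline single-pass clustering (not the paper's improved variant).

    Documents are processed once, in arrival order. Each one joins the most
    similar existing cluster if that similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns one cluster label per document.
    """
    centroids = []  # summed term vectors; cosine is scale-invariant, so sum == mean here
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())  # assumption: whitespace tokenization
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # open a new topic cluster
            labels.append(len(centroids) - 1)
    return labels


def miss_and_false_alarm(cluster: set, topic: set, all_docs: set):
    """Miss rate and false alarm rate of one system cluster against one reference topic."""
    nonrelevant = all_docs - topic
    p_miss = len(topic - cluster) / len(topic) if topic else 0.0
    p_fa = len(cluster - topic) / len(nonrelevant) if nonrelevant else 0.0
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost (assumed definition and default parameters)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)


def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost divided by the cost of the better trivial system."""
    cost = detection_cost(p_miss, p_fa, c_miss, c_fa, p_target)
    return cost / min(c_miss * p_target, c_fa * (1 - p_target))


if __name__ == "__main__":
    docs = [
        "earthquake hits city",
        "city earthquake rescue",
        "stock market falls",
        "stock market falls sharply",
    ]
    print(single_pass(docs))  # prints [0, 0, 1, 1]
```

Because single-pass reads each document only once and never revisits earlier assignments, the result depends on both the threshold and the arrival order. This is one reason the accuracy of the basic algorithm is limited.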
Source
Computer & Digital Engineering (《计算机与数字工程》), 2014, No. 7, pp. 1233-1237 (5 pages)