Abstract
In recent years, short-text information flows have become widespread in public media; while conveying information openly, they also carry rich and highly valuable information resources. This paper proposes a retrospective topic identification model for such data. The model uses an improved term-weighting method to extract keywords with strong topic-discriminating power, and it uses the BIC value as the criterion for merging topic clusters, which improves clustering accuracy. Dividing the flow into time segments and removing isolated points further improves the efficiency of the algorithm. Experimental results show that the method improves both the accuracy and the efficiency of topic detection in short-text information flows.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2015, No. 1, pp. 111-117, 132 (8 pages in total)
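Funding
Hebei Province Science and Technology Support Program (10213581)
Huai'an Social Development Project (HASZ2012046)
Huai'an Science and Technology Support Program (Industry) (HAG2012086)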
Keywords
short text
information flow
topic identification
clustering