Abstract
To address the main challenges of microblog topic detection, namely high-dimensional data, noisy information, and rapid topic evolution, this paper proposes an effective online topic detection model for microblogs, the discriminative language model (DLM). The model first selects a discriminative feature subspace of the microblog data and then uses a unigram language model to detect microblog topics online. Experiments show that DLM clearly outperforms existing models on metrics such as MACRO_F1 and AVG_CDET, and that it detects microblog topics accurately and promptly.
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2014, No. 12, pp. 3539-3542 (4 pages)
Funding
Key Project of the Hunan Provincial Industrial Support Program (2012GK2006)
Jishou University Research Fund Project (Jdzd12011)
Keywords
topic detection
feature selection
microblog
language model
discriminative language model (DLM)