Abstract
This paper proposes a new algorithm that takes both keywords and time of occurrence into account. After preprocessing the data, the algorithm builds an LDA model over the collection of microblog (Weibo) events and generates a set of topic words that serves as each event's descriptive signature. It then uses the DTW (dynamic time warping) algorithm to compute the semantic and temporal similarity between event keywords, which yields the corresponding similarity matrix. Finally, a co-training spectral clustering method iteratively produces the final feature vectors and completes event selection. Simulation experiments show that the proposed algorithm achieves higher accuracy and efficiency than previous algorithms.
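The abstract does not state how the keyword time series are built or how DTW distances are turned into similarity scores. The sketch below illustrates only the temporal DTW step, under assumptions of our own: each keyword is represented as a sequence of per-interval occurrence counts, the local cost is the absolute difference, and a distance d is mapped to a similarity 1/(1+d). The function names `dtw_distance` and `dtw_similarity_matrix` and the sample data are hypothetical, not taken from the paper; the LDA, semantic-similarity, and co-training spectral clustering steps are not shown.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Assumption (not from the paper): the local cost between two time
    steps is the absolute difference of the keyword counts.
    """
    n, m = len(a), len(b)
    # dp[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest predecessor: advance in a only, in b only, or in both
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def dtw_similarity_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarity matrix using the hypothetical mapping sim = 1 / (1 + d)."""
    k = len(series)
    sim = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):  # DTW is symmetric here, so fill both halves at once
            d = dtw_distance(series[i], series[j])
            sim[i][j] = sim[j][i] = 1.0 / (1.0 + d)
    return sim


if __name__ == "__main__":
    # Hypothetical hourly counts of three keywords
    a = [0, 1, 5, 2, 0]
    b = [0, 0, 1, 5, 2]  # same burst as a, one hour later
    c = [3, 3, 3, 3, 3]  # flat series, no burst
    print(dtw_distance(a, b))  # 2.0
    print(dtw_distance(a, c))  # 11.0
    for row in dtw_similarity_matrix([a, b, c]):
        print([round(x, 3) for x in row])
```

In the example, series `b` is series `a` delayed by one interval. Their DTW distance is 2, whereas a lock-step L1 comparison of the same two series gives 10, which illustrates why DTW suits keywords whose bursts are shifted in time.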
Source
Bulletin of Science and Technology (《科技通报》)
Peking University core journal (北大核心)
2017, Issue 11, pp. 129-132 (4 pages)
Keywords
microblog (Weibo) text
text data mining
multi-view
similarity matrix