Abstract
[Purpose/significance] To help readers quickly grasp the course of a hot event from the massive volume of microblog posts it generates, and to improve the accuracy and readability of microblog event summaries, we propose a multi-model method, based on event elements, for extracting timeline summaries of hot microblog events. [Method/process] Taking the characteristics of microblog text into account, we combine the strengths of the topic model (LDA) and the mutual information maximum entropy model (MaxEnt-MI) to extract summary keywords for each event, filter microblogs by their propagation value and topic relevance, and generate a timeline summary in the form time-keywords-microblog. [Result/conclusion] On a manually annotated test set, the F value is 8% to 13% higher than that of the traditional TextRank method, and internal tests show a clear improvement in summary readability. The number of experimental texts and test sets and the variety of events need to be further expanded, and more weighting strategies should be considered to improve summary accuracy. The experimental results and test feedback indicate that the proposed method is feasible and effective: it meets users' needs for summary information on hot events and improves the accuracy of microblog summary extraction.
Source
《图书情报工作》
CSSCI
PKU Core Journal
2018, No. 1, pp. 96-105 (10 pages)
Library and Information Service
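Funding
One of the research outcomes of the Major Program of the National Social Science Fund of China, "Research on Deep Aggregation and Services of Network Information Resources for Subject Domains" (Project No. 12&ZD221)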
Keywords
text mining
event summarization
latent Dirichlet allocation
mutual information maximum entropy model