Abstract
With the number of social media users ramping up, microblogs are generated and shared at record levels. The high velocity and large volume of these short texts bring redundancy and noise, making it difficult for users and analysts to extract the information they are interested in. In this paper, we study query-focused summarization as a solution to this problem and propose a novel summarization framework that generates personalized online summaries as well as historical summaries of arbitrary time durations. Our framework can handle dynamic, perpetual, and large-scale microblogging streams. Specifically, we propose an online microblogging stream clustering algorithm that clusters microblogs and maintains distilled statistics called Microblog Cluster Vectors (MCVs). We then develop a ranking method that extracts the sentences most representative of the query from the MCVs and generates a query-focused summary of arbitrary time durations. Experiments on large-scale real-world microblogs demonstrate the efficiency and effectiveness of our approach.
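The abstract does not specify what an MCV contains or how clustering and ranking are scored, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes bag-of-words term vectors, cosine similarity, a hypothetical join threshold, eviction of the stalest cluster by mean timestamp, and that each MCV keeps its member texts; the names `MCV`, `cluster_stream`, and `query_summary` are hypothetical. The point it illustrates is that a cluster can be summarized by additive statistics (summed term counts, member count, summed timestamps) updated in a single pass over the stream.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def terms_of(text: str) -> Counter:
    """Bag-of-words term frequencies (assumption: no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass(eq=False)
class MCV:
    """Hypothetical Microblog Cluster Vector: additive statistics of one cluster."""
    term_sum: Counter = field(default_factory=Counter)  # summed term frequencies
    n: int = 0                                          # number of microblogs absorbed
    ts_sum: float = 0.0                                 # sum of timestamps
    texts: list = field(default_factory=list)           # member texts (assumption)

    def absorb(self, text: str, terms: Counter, ts: float) -> None:
        # All statistics are additive, so absorbing a microblog is O(len(terms)).
        self.term_sum.update(terms)
        self.n += 1
        self.ts_sum += ts
        self.texts.append(text)

    def centroid(self) -> dict:
        return {t: c / self.n for t, c in self.term_sum.items()}

    def mean_time(self) -> float:
        return self.ts_sum / self.n


def cluster_stream(stream, threshold=0.3, max_clusters=50):
    """Single pass: each (text, timestamp) joins its most similar MCV or starts a new one."""
    clusters = []
    for text, ts in stream:
        terms = terms_of(text)
        best = max(clusters, key=lambda c: cosine(terms, c.centroid()), default=None)
        if best is not None and cosine(terms, best.centroid()) >= threshold:
            best.absorb(text, terms, ts)
            continue
        if len(clusters) >= max_clusters:
            # Bound memory by evicting the stalest cluster (oldest mean timestamp).
            clusters.remove(min(clusters, key=MCV.mean_time))
        new = MCV()
        new.absorb(text, terms, ts)
        clusters.append(new)
    return clusters


def query_summary(clusters, query: str, k: int = 2) -> list:
    """Rank MCVs by similarity to the query; from each of the top k, emit the member text nearest its centroid."""
    q = terms_of(query)
    ranked = sorted(clusters, key=lambda c: cosine(q, c.centroid()), reverse=True)
    summary = []
    for c in ranked[:k]:
        cen = c.centroid()
        summary.append(max(c.texts, key=lambda s: cosine(terms_of(s), cen)))
    return summary
```

For example, clustering the stream `[("earthquake hits city center", 1.0), ("strong earthquake hits the city", 2.0), ("new phone released today", 3.0), ("phone released with new camera", 4.0)]` yields two clusters, and `query_summary(clusters, "earthquake city", k=1)` returns one earthquake-related microblog. Because the statistics are additive, snapshots of such vectors could in principle be subtracted to summarize a chosen time window; this sketch does not implement that.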
Funding
This work was supported by the Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjAX0071);
the Basic and Advanced Research Projects of CSTC (cstc2019jcyjzdxm0102);
the Chongqing Science and Technology Innovation Leading Talent Support Program (CSTCCXLJRC201908);
and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K201900605).