Abstract
Almost all current update summarization methods rely on batch processing of documents. In practice, however, documents arrive as unstable data streams, so these methods cannot meet the need to update summaries in real time. To address this problem, this paper proposes an update summarization method based on a K-nearest neighbor sentence graph model. The method first uses the K-nearest neighbor rule to build, and incrementally update, a two-level sentence graph model. It then partitions the sentences into subtopics with an incremental graph clustering method based on density partitioning. Finally, it incorporates a time factor that raises sentence novelty when extracting the update summary. The method processes document data streams incrementally, so summaries are extracted and updated in real time. Experiments on the TAC2008 and TAC2009 update summarization datasets demonstrate the method's effectiveness for update summarization.
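The pipeline has three stages: a K-nearest neighbor sentence graph, incremental clustering into subtopics, and time-aware novelty scoring. The minimal Python sketch below illustrates how these stages can fit together on a sentence stream. It is not the authors' algorithm. The following choices are all assumptions made for illustration:

- bag-of-words cosine similarity;
- a single-level graph instead of the paper's two-level model;
- a density rule that compares the mean similarity to the K nearest neighbors against a threshold `min_density`;
- a scoring formula of K-NN centrality × exponential time decay × (1 − redundancy with the previous summary).

`StreamSummarizer`, `add`, and `summarize` are hypothetical names.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumption: plain term frequencies; a real system would use TF-IDF or embeddings.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class StreamSummarizer:
    """Hypothetical sketch: incremental K-NN sentence graph, clustering, time-decayed scoring."""

    def __init__(self, k=3, min_density=0.2, decay=0.1):
        self.k = k                      # neighbors kept per sentence
        self.min_density = min_density  # assumed threshold for joining an existing cluster
        self.decay = decay              # assumed exponential time-decay rate
        self.sents = []                 # (text, vector, timestamp)
        self.knn = []                   # knn[i]: up to k (similarity, j) pairs, best first
        self.cluster = []               # cluster[i]: subtopic id of sentence i
        self.next_cluster = 0

    def add(self, text, t):
        """Insert one streamed sentence and return its subtopic id."""
        vec = vectorize(text)
        i = len(self.sents)
        sims = [(cosine(vec, v), j) for j, (_, v, _) in enumerate(self.sents)]
        nbrs = sorted(sims, reverse=True)[: self.k]
        # Incremental update: the new sentence may displace an old node's k-th neighbor.
        for s, j in sims:
            lst = self.knn[j]
            if len(lst) < self.k or s > lst[-1][0]:
                lst.append((s, i))
                lst.sort(reverse=True)
                del lst[self.k:]
        self.sents.append((text, vec, t))
        self.knn.append(nbrs)
        # Assumed density: mean similarity to the K nearest neighbors.
        density = sum(s for s, _ in nbrs) / len(nbrs) if nbrs else 0.0
        if nbrs and density >= self.min_density:
            votes = Counter()
            for s, j in nbrs:           # similarity-weighted vote among neighbors
                votes[self.cluster[j]] += s
            cid = votes.most_common(1)[0][0]
        else:                           # sparse region: open a new subtopic
            cid = self.next_cluster
            self.next_cluster += 1
        self.cluster.append(cid)
        return cid

    def summarize(self, now, n=2, previous=()):
        """Pick the best sentence per subtopic and return the top n."""
        prev_vecs = [vectorize(p) for p in previous]
        best = {}
        for i, (text, vec, t) in enumerate(self.sents):
            centrality = sum(s for s, _ in self.knn[i])
            recency = math.exp(-self.decay * (now - t))   # newer sentences score higher
            redundancy = max((cosine(vec, p) for p in prev_vecs), default=0.0)
            score = centrality * recency * (1.0 - redundancy)
            cid = self.cluster[i]
            if cid not in best or score > best[cid][0]:
                best[cid] = (score, text)
        ranked = sorted(best.values(), reverse=True)
        return [text for _, text in ranked[:n]]
```

Usage: call `add` once per incoming sentence, then call `summarize` whenever an updated summary is needed. Pass the previously emitted summary as `previous` so that sentences repeating it are penalized. Because each `add` compares the new sentence only against stored vectors, no batch recomputation is needed. This is the property the abstract emphasizes, although the sketch's linear scan per insertion is itself a simplification.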
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2016, Issue 7, pp. 2034-2038 (5 pages)
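Funding
Project funded by the Education Department of Sichuan Province (14ZB0113)
Doctoral Fund of Southwest University of Science and Technology (12zx7116)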
Keywords
update summarization
K-nearest neighbors
sentence graph model
incremental graph clustering