
Dynamic multi-document summarization algorithm based on incremental graph clustering (基于增量图聚类的动态多文档摘要算法) Cited by: 2

Update summarization using incremental graph clustering
Abstract Most current update summarization methods rely on batch processing of documents. In practice, however, documents usually arrive as unstable data streams, so these methods cannot meet the need to update summaries in real time. To address this problem, the paper proposes an update summarization method based on a K-nearest-neighbor sentence graph model. The method first applies the K-nearest-neighbor rule to build, and incrementally update, a two-level sentence graph model. It then uses a density-based incremental graph clustering method to partition the sentences into subtopics. Finally, it combines a time factor to raise sentence novelty and extracts the update summary. The proposed method extracts update summaries incrementally from a document data stream and refreshes the summary content in real time. Experiments on the TAC 2008 and TAC 2009 update summarization data sets indicate the effectiveness of the method for update summary extraction.
Source Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2016, No. 7, pp. 2034-2038 (5 pages)
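Funding Sichuan Provincial Department of Education project (14ZB0113); Southwest University of Science and Technology Doctoral Fund project (12zx7116)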
Keywords update summarization; K-nearest neighbors; sentence graph model; incremental graph clustering
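
The abstract names an incrementally maintained K-nearest-neighbor sentence graph as the first step but gives no construction details. The following is only a minimal sketch of that general idea, not the paper's method: the class `KnnSentenceGraph`, the bag-of-words cosine similarity, the whitespace tokenizer, and the single-level graph are all assumptions made for illustration. It shows how a newly arriving sentence can be linked into the graph while touching only the neighbor lists it actually changes, instead of rebuilding the graph in a batch.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnnSentenceGraph:
    """Hypothetical incremental K-nearest-neighbor sentence graph."""

    def __init__(self, k: int = 2):
        self.k = k
        self.vectors = []     # one bag-of-words vector per sentence
        self.neighbours = []  # per sentence: [(similarity, sentence id)], best first

    def add_sentence(self, text: str) -> int:
        """Insert one sentence and update only the affected neighbor lists."""
        vec = Counter(text.lower().split())
        new_id = len(self.vectors)
        sims = [(cosine(vec, old), old_id)
                for old_id, old in enumerate(self.vectors)]
        # An existing sentence adopts the new one if it ranks within its top K.
        for sim, old_id in sims:
            if sim > 0.0:
                merged = self.neighbours[old_id] + [(sim, new_id)]
                merged.sort(key=lambda pair: (-pair[0], pair[1]))
                self.neighbours[old_id] = merged[: self.k]
        # The new sentence keeps its own K most similar predecessors.
        own = sorted((pair for pair in sims if pair[0] > 0.0),
                     key=lambda pair: (-pair[0], pair[1]))[: self.k]
        self.vectors.append(vec)
        self.neighbours.append(own)
        return new_id

    def neighbour_ids(self, sentence_id: int) -> list:
        """Return the neighbor ids of a sentence, most similar first."""
        return [other for _, other in self.neighbours[sentence_id]]


if __name__ == "__main__":
    graph = KnnSentenceGraph(k=1)
    for sentence in ["the flood hit the city",
                     "flood waters hit the city centre",
                     "stock markets fell sharply",
                     "the city flood"]:
        graph.add_sentence(sentence)
    for sid in range(4):
        print(sid, graph.neighbour_ids(sid))
```

Each insertion here costs one similarity computation per stored sentence. A density-based clustering step and a time-weighted novelty score, which the abstract also mentions, would operate on top of such a graph and are not sketched here because the record does not describe them.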

