Abstract
This paper surveys the main research methods and strategies for automatic summarization and divides them into three categories: automatic extraction, summarization based on information extraction, and summarization based on understanding. Automatic extraction forms a summary by selecting important sentences from the article. Summarization based on information extraction fills a predefined frame with information extracted from the article and then outputs the content through a template. Summarization based on understanding generates the summary using natural language processing techniques. The paper focuses on automatic extraction methods for single-topic and multi-topic articles and, after comparing the strengths and weaknesses of several algorithms, proposes a new multi-topic segmentation method.
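As an illustration of the automatic extraction category only, the following is a minimal sketch of sentence selection by word-frequency weights. It is not the method proposed in the paper: the function names (`split_sentences`, `tokenize`, `extract_summary`), the stopword list, the length-normalized weight, and the parameter `k` are all assumptions made for this example, and the splitter and tokenizer handle plain English text only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "on"}


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text, k=2):
    """Return the k highest-weighted sentences, in their original order.

    A sentence's weight is the mean document frequency of its words,
    which is one simple choice among many possible weighting schemes.
    """
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for ts in tokens for w in ts)

    def weight(ts):
        return sum(freq[w] for w in ts) / len(ts) if ts else 0.0

    # Rank sentence indices by weight, keep the top k, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: weight(tokens[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    sample = "Cats chase mice. Cats chase cats. Dogs bark."
    print(extract_summary(sample, k=1))
```

Restoring document order after ranking keeps the extracted sentences readable as a digest. A multi-topic variant would first group sentences by topic, for example by clustering on sentence similarity, and then select sentences within each group.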
Source
Computer Technology and Development (《计算机技术与发展》)
2011, No. 8, pp. 188-191 (4 pages)
Funding
National Social Science Fund of China project (05BYY022)
Keywords
sentence weight
similarity
association network
word frequency
clustering
topic segmentation