Abstract
A key step in automatic text summarization is identifying the main topics of an article and extracting the sentences that reflect them. Building on an analysis of clustering algorithms such as k-means and k-medoids, and taking into account the practical requirements of summarization and the characteristics of the documents themselves, this paper proposes a clustering-based method for extracting topic sentences. Because many articles cover more than one topic, a summary must not omit any of them. Experimental results show that, by improving clustering accuracy, the new method avoids missing topics more effectively than other clustering algorithms and reflects the main ideas of the full text more comprehensively; the extracted summaries both cover the text broadly and highlight its key points.
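The general idea of clustering sentences and taking one representative per cluster can be sketched as follows. This is only a minimal illustration of plain k-medoids, one of the baseline algorithms the abstract mentions, and not the improved method the paper proposes. The bag-of-words vectors, cosine similarity, farthest-first initialization, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term counts for one sentence (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def k_medoids_summary(sentences, k, max_iter=20):
    """Cluster sentences with k-medoids and return each cluster's medoid
    sentence, in document order, as a candidate topic sentence."""
    vecs = [vectorize(s) for s in sentences]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Deterministic farthest-first initialization (an assumption of this
    # sketch): start from sentence 0, then repeatedly add the sentence
    # that is least similar to its closest existing medoid.
    medoids = [0]
    while len(medoids) < min(k, n):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: max(sim[i][m] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: each sentence joins its most similar medoid;
        # a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            best = i if i in clusters else max(medoids, key=lambda m: sim[i][m])
            clusters[best].append(i)

        # Update step: the new medoid is the member with the highest
        # total similarity to the other members of its cluster.
        new_medoids = [
            max(members, key=lambda c: sum(sim[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids

    return [sentences[i] for i in sorted(medoids)]


if __name__ == "__main__":
    doc = [
        "Clustering groups similar sentences together.",
        "Clustering algorithms group sentences by similarity.",
        "The weather today is sunny and warm.",
        "Sunny weather makes the day warm.",
    ]
    for sentence in k_medoids_summary(doc, k=2):
        print(sentence)
```

Choosing one medoid per cluster is what lets a summary cover several topics: an article with two themes yields two clusters and therefore two representative sentences, rather than several sentences from the dominant theme.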
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2008, No. 1, pp. 49-55 (7 pages)
Journal of the China Society for Scientific and Technical Information
Keywords
automatic text summarization
clustering algorithm
topic sentence (thematic sentence)
text unit (summary cell)
cluster center