Abstract
This paper proposes a novel strategy for multi-document summarization of English texts based on basic element (BE) vector clustering. Before clustering, sentences are represented by BE vectors instead of word or term vectors. The BE vectors are clustered with the k-means method, and a novel clustering analysis method is employed to detect the number of clusters, k, automatically. Experimental results show that the proposed BE-based strategy, MSBEC, outperforms the traditional summarization strategy based on word-vector clustering.
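The abstract does not say how BEs are extracted, how BE vectors are weighted, how the "novel clustering analysis method" detects k, or how summary sentences are picked from clusters. The sketch below fills those gaps with loudly stated assumptions, so it is an illustration of the general pipeline, not the MSBEC implementation: each BE is modeled as a hand-supplied (head, modifier, relation) triple, sentences become L2-normalized BE-frequency vectors, clustering is plain k-means with farthest-first initialization, the silhouette coefficient is swapped in as a stand-in for the paper's k-detection method, and each cluster contributes the sentence nearest its centroid. All names (`be_vectors`, `kmeans`, `silhouette`, `choose_k`, `summarize`, `DOCS`) are hypothetical.

```python
import math
from collections import Counter

# Assumption: a basic element (BE) is a (head, modifier, relation) triple.
# Triples are written by hand here; a real system would extract them with a parser.
Triple = tuple[str, str, str]

def be_vectors(sentences: list[list[Triple]]) -> list[list[float]]:
    """Represent each sentence as an L2-normalized BE-frequency vector."""
    vocab = sorted({be for s in sentences for be in s})
    index = {be: i for i, be in enumerate(vocab)}
    vectors = []
    for s in sentences:
        v = [0.0] * len(vocab)
        for be, count in Counter(s).items():
            v[index[be]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def dist(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors: list[list[float]], k: int, iters: int = 100):
    """Plain k-means; farthest-first initialization keeps the run deterministic."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = max(range(len(vectors)),
                  key=lambda i: min(dist(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = [-1] * len(vectors)
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def silhouette(vectors: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, v in enumerate(vectors):
        same = [dist(v, w) for j, w in enumerate(vectors)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(v, w) for w, lab in zip(vectors, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(vectors: list[list[float]], k_max: int):
    """Stand-in for automatic k detection: keep the k with the best silhouette."""
    best = None
    for k in range(2, min(k_max, len(vectors) - 1) + 1):
        labels, centroids = kmeans(vectors, k)
        score = silhouette(vectors, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centroids)
    if best is None:
        raise ValueError("need at least 3 sentences to choose k")
    return best[1], best[2], best[3]

def summarize(texts: list[str], bes: list[list[Triple]], k_max: int = 4) -> list[str]:
    """Return one sentence per cluster (the one nearest its centroid), in input order."""
    vectors = be_vectors(bes)
    k, labels, centroids = choose_k(vectors, k_max)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: dist(vectors[i], centroids[c])))
    return [texts[i] for i in sorted(picks)]

# Toy input: two topics, three sentences each, with hand-written BE triples.
DOCS = [
    ("The earthquake struck the city.", [("strike", "earthquake", "subj"), ("strike", "city", "obj")]),
    ("A powerful earthquake struck.", [("strike", "earthquake", "subj"), ("earthquake", "powerful", "mod")]),
    ("It struck the city; the earthquake was powerful.", [("strike", "city", "obj"), ("earthquake", "powerful", "mod")]),
    ("The party won the election.", [("win", "party", "subj"), ("win", "election", "obj")]),
    ("The party won; it was a national election.", [("win", "party", "subj"), ("election", "national", "mod")]),
    ("A national election was won.", [("win", "election", "obj"), ("election", "national", "mod")]),
]
texts = [t for t, _ in DOCS]
bes = [b for _, b in DOCS]
print(summarize(texts, bes))
```

On this toy input the silhouette criterion selects k = 2, so the printed summary holds one earthquake sentence and one election sentence. Silhouette-based selection is only one common way to choose k; the paper's own detection method may behave differently.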
Source
Computer Engineering (《计算机工程》), 2007, No. 14, pp. 166-167, 170 (3 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
Major Project of the National Natural Science Foundation of China (90104005)
Keywords
multi-document summarization
basic element
k-means clustering