Abstract
This paper proposes a multi-document summarization method based on identifying and extracting local topics within a collection of documents on the same subject. Sentence similarity is first computed from dependency and semantic analysis of the sentences in the collection, and similar sentences are clustered to form the local topics of the collection. The centroid sentence of each local topic is then extracted, and the extracted sentences are ordered to generate the summary. Summary length is determined automatically by the content of the documents, which keeps the summary both comprehensive and concise. Finally, an evaluation method and experimental results are presented: the average precision and the average compression ratio of the summaries are 71.4% and 25.2%, respectively.
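The pipeline outlined in the abstract (sentence similarity, clustering into local topics, centroid extraction, ordering) can be illustrated with a minimal sketch. It is not the authors' implementation: bag-of-words cosine similarity stands in for the dependency and semantic analysis, the greedy single-pass clustering and the `threshold` value are assumptions, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words vector; an assumed stand-in for dependency/semantic analysis."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(sentences, threshold=0.5):
    """Greedy single-pass clustering (assumed): a sentence joins the first
    cluster whose seed sentence is at least `threshold` similar to it."""
    vectors = [bow(s) for s in sentences]
    clusters = []  # each cluster is a list of sentence indices
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(v, vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters, vectors


def centroid(cluster_ids, vectors):
    """Index of the sentence with the highest total similarity to its cluster."""
    return max(cluster_ids, key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in cluster_ids))


def summarize(sentences, threshold=0.5):
    """One centroid sentence per local topic, returned in original order."""
    clusters, vectors = cluster(sentences, threshold)
    picks = sorted(centroid(c, vectors) for c in clusters)
    return [sentences[i] for i in picks]
```

Because one sentence is emitted per cluster, the summary length in this sketch follows from the number of local topics found rather than from a fixed limit, which mirrors the abstract's claim that length is determined by content.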
Source
Acta Automatica Sinica (《自动化学报》)
EI
CSCD
PKU Core (北大核心)
2004, No. 6, pp. 905-910 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60203020) and the National "863" High-Tech Program of China (2001AA114041)
Keywords
Multi-document summarization
Local topics
Clustering
Calculations
Data compression
Data mining
Evaluation
Semantics