Abstract
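A sub-topic segmentation method for generating multi-document summaries, based on the maximum tree method, is proposed. Sentence similarity across the document set is computed with a semantic dictionary, yielding a similarity matrix. Identical or similar sentences are merged into classes by fuzzy clustering, with each class representing one sub-topic, and the sub-topics are delimited by analysing the cluster structure (抱团结构). Experimental results show that the generated multi-document summaries have strong coverage and little redundant information, and that the method has some practical value.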
A novel approach to sub-topic segmentation based on the maximum tree algorithm is proposed. A method of sentence similarity computation based on semantic dependency is studied in depth. Similar sentences in the multi-document set are combined into one class, and each class represents one sub-topic. A maximum tree is computed from the sentence similarity matrix and used to divide the sentences into sub-topics. The experimental results show that the generated multi-document summaries have good coverage and little redundancy, and that the approach has some practical value.
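The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the textbook maximum tree method for fuzzy clustering, not the authors' implementation. It assumes the sentence similarity matrix is already available as a symmetric list of lists with values in [0, 1]. The function name `max_tree_clusters` and the cut threshold `lam` are hypothetical names introduced for illustration. The sketch builds a maximum spanning tree over the sentences, removes tree edges whose similarity falls below `lam`, and treats each remaining connected component as one sub-topic.

```python
def max_tree_clusters(sim, lam):
    """Cluster items with the maximum tree method (illustrative sketch).

    sim: symmetric n x n similarity matrix (list of lists), values in [0, 1].
    lam: cut threshold (assumed parameter); tree edges below it are removed.
    Returns a list of clusters, each a sorted list of item indices.
    """
    n = len(sim)
    if n == 0:
        return []

    # Prim's algorithm, always taking the heaviest available edge,
    # which yields a maximum spanning tree of the similarity graph.
    in_tree = [False] * n
    best = [-1.0] * n          # best similarity linking each vertex to the tree
    parent = [-1] * n
    best[0] = float("inf")     # start from sentence 0
    edges = []
    for _ in range(n):
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, sim[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                parent[v] = u

    # Keep only tree edges with similarity >= lam and collect the
    # connected components with a small union-find structure.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for a, b, w in edges:
        if w >= lam:
            root[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Raising `lam` splits the sentences into more, tighter sub-topics, while lowering it merges them; how the threshold was chosen in the paper is not stated in this record.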
Source
《辽宁科技大学学报》
CAS
2009, No. 6, pp. 575-580 (6 pages)
Journal of University of Science and Technology Liaoning
Keywords
multi-document summarization
sub-topic segmentation
maximum tree algorithm