Abstract
This paper proposes a multi-document summarization algorithm based on adaptive topic fusion, built on the LDA topic model. Because titles are strong indicators of what a summary should contain, separate topic models are built for the title and the body of each document, and the two models are then fused. During fusion, the algorithm performs adaptive asymmetric learning based on the information entropy of the two kinds of text, and uses the result to weight their topic distributions. The fused model relates title and body information appropriately, which helps improve summary quality. Experimental results show that the proposed algorithm performs well on the DUC2002 standard dataset.
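The abstract does not give the fusion formula, so the following is only a minimal sketch of one way entropy-based weighting of two topic distributions could look. It assumes that the lower-entropy (more peaked) distribution should receive the larger weight and that the fused distribution is a convex combination of the two; the names `entropy` and `fuse_topics` are hypothetical and are not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def fuse_topics(title_topics, body_topics):
    """Fuse two topic distributions using entropy-derived weights.

    Assumption (not from the paper): a distribution's confidence is
    1 - H(p) / log(K), so a peaked distribution counts more than a flat one.
    Returns the fused distribution and the weight given to the title.
    """
    k = len(title_topics)
    if k != len(body_topics) or k < 2:
        raise ValueError("need two distributions over the same K >= 2 topics")

    max_h = math.log(k)  # entropy of the uniform distribution over K topics
    c_title = max(0.0, 1.0 - entropy(title_topics) / max_h)
    c_body = max(0.0, 1.0 - entropy(body_topics) / max_h)

    total = c_title + c_body
    # If both distributions are (numerically) uniform, fall back to equal weights.
    w_title = 0.5 if total < 1e-12 else c_title / total

    fused = [w_title * t + (1.0 - w_title) * b
             for t, b in zip(title_topics, body_topics)]
    return fused, w_title


if __name__ == "__main__":
    # Illustrative values only: a peaked title distribution and a flatter body one.
    fused, w_title = fuse_topics([0.9, 0.05, 0.05], [0.4, 0.3, 0.3])
    print(w_title, fused)
```

In this sketch the weights are asymmetric because they depend on each distribution's own entropy, so a sharply focused title can dominate a diffuse body distribution and vice versa. The paper's actual learning procedure may differ.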
Source
Journal of Central South University (Science and Technology)
EI
CAS
CSCD
PKU Core
2013, Issue S2, pp. 205-209 (5 pages)
Funding
National Natural Science Foundation of China (Grants 61073133, 61175053, 61272369)
Keywords
multi-document summarization
topic model
adaptive learning
information entropy