Abstract
This paper addresses the main challenges of Weibo-oriented Chinese news summarization. It proposes an automatic news summarization method that combines matrix factorization with submodular maximization. The method first applies an orthogonal matrix factorization (OrMF) model to obtain robust latent semantic vectors for news sentences. This alleviates the information sparsity of short texts, and keeping the projection directions approximately orthogonal reduces redundancy. It then evaluates sets of news sentences for relevance and diversity with an objective function composed of several monotone submodular functions and one non-submodular function that measures sentence dissimilarity. Finally, a greedy algorithm selects the summary sentences. Experimental results on the NLPCC2015 dataset show that the method effectively improves the quality of Weibo-oriented news summaries, and its ROUGE scores exceed those of the other baseline systems.
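The final step of the abstract can be made concrete with a minimal sketch of budgeted greedy selection over a mixed objective. It is not the paper's implementation. The sketch makes four assumptions. First, sentence vectors are already available and stand in for the OrMF latent vectors. Second, cosine similarity is clipped at zero. Third, a saturated coverage term serves as the monotone submodular relevance component. Fourth, a weighted sum of pairwise dissimilarities (1 minus similarity) serves as the non-submodular diversity component. The names `cosine`, `objective` and `greedy_summary` and the parameters `alpha`, `lam` and `budget` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity clipped at 0 so the coverage term stays monotone."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return max(0.0, dot / (nu * nv))


def objective(selected: List[int], sim: List[List[float]],
              alpha: float, lam: float) -> float:
    """Hypothetical F(S): saturated coverage (monotone submodular)
    plus lam * sum of pairwise dissimilarities (not submodular)."""
    n = len(sim)
    coverage = 0.0
    for i in range(n):
        covered = sum(sim[i][j] for j in selected)
        cap = alpha * sum(sim[i])  # saturation threshold for sentence i
        coverage += min(covered, cap)
    dissim = sum(1.0 - sim[a][b]
                 for pos, a in enumerate(selected)
                 for b in selected[pos + 1:])
    return coverage + lam * dissim


def greedy_summary(vectors: List[List[float]], lengths: List[int],
                   budget: int, alpha: float = 0.5,
                   lam: float = 0.1) -> List[int]:
    """Cost-scaled greedy: repeatedly add the sentence with the best
    marginal gain per unit length that still fits in the budget."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    selected: List[int] = []
    used = 0
    current = objective(selected, sim, alpha, lam)
    while True:
        best, best_ratio = None, 0.0
        for k in range(n):
            if k in selected or used + lengths[k] > budget:
                continue
            gain = objective(selected + [k], sim, alpha, lam) - current
            ratio = gain / lengths[k]  # lengths are assumed positive
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:  # nothing fits or no positive gain remains
            break
        selected.append(best)
        used += lengths[best]
        current = objective(selected, sim, alpha, lam)
    return sorted(selected)


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(greedy_summary(vectors, lengths=[10, 10, 10], budget=20))  # [1, 2]
```

In this toy run, sentences 0 and 1 are near-duplicates. The greedy step takes sentence 1 first. After that, the coverage cap and the dissimilarity bonus make sentence 2 the better second pick. Dividing each marginal gain by sentence length is a common cost-scaled variant of greedy selection under a length budget. Because the dissimilarity term is not submodular, the standard approximation guarantees for greedy maximization of monotone submodular functions do not automatically carry over to this objective.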
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2017, No. 10, pp. 2892-2896, 2928 (6 pages in total)
Funding
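Major Bidding Project of the National Social Science Fund of China (11&ZD189)
General Program of the National Natural Science Foundation of China (61373108)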
Keywords
submodularity
orthogonal matrix factorization
news summarization
extractive summarization
Weibo