Abstract
Automatic summarization lets readers quickly grasp the key content of a source text and has broad application prospects in data security protection. Building on the TextRank summarization algorithm, this paper proposes SummRank, a Chinese summarization algorithm that integrates the BERT model, weight correction based on typical summary-sentence features, and redundancy elimination. SummRank encodes sentences as vectors with the pre-trained language model BERT, applies cumulative weight corrections for typical summary-sentence features such as sentence position and cue words, and uses the maximal marginal relevance (MMR) algorithm to remove redundancy from the summary. Experiments show that summaries extracted by SummRank have higher relevance and diversity and can be used in a variety of data security protection applications.
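The pipeline outlined in the abstract (similarity graph over sentence vectors, TextRank scoring, feature-based weight correction, MMR selection) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes the sentence vectors have already been produced by some encoder such as BERT and are passed in as plain lists of floats. The function names, the bonus weights `pos_w` and `cue_w`, the multiplicative form of the correction, the cue-word list, and the MMR trade-off `lam` are all hypothetical choices that the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity, clamped at 0 so graph edge weights stay non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / norm) if norm else 0.0


def textrank(sim, damping=0.85, iters=50):
    """Weighted TextRank over a similarity matrix, by power iteration."""
    n = len(sim)
    # Total outgoing edge weight of each sentence (self-loops excluded).
    out_sum = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def feature_bonus(index, n, sentence, cue_words, pos_w=0.2, cue_w=0.2):
    """Accumulate bonuses for position and cue words (weights are assumed)."""
    bonus = 0.0
    if index == 0 or index == n - 1:  # first or last sentence
        bonus += pos_w
    if any(cue in sentence for cue in cue_words):
        bonus += cue_w
    return bonus


def mmr_select(scores, sim, k, lam=0.5):
    """Greedy maximal marginal relevance: trade score against redundancy."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def summarize(sentences, vectors, k=2, cue_words=("综上所述", "总之"), lam=0.5):
    """Return k summary sentences in their original document order."""
    n = len(sentences)
    if n == 0:
        return []
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    base = textrank(sim)
    top = max(base)
    # Normalize to [0, 1], then apply the accumulated feature correction.
    scores = [
        base[i] / top * (1.0 + feature_bonus(i, n, sentences[i], cue_words))
        for i in range(n)
    ]
    return [sentences[i] for i in sorted(mmr_select(scores, sim, k, lam))]
```

In this sketch, setting `lam=1.0` disables the redundancy penalty, so selection falls back to ranking by corrected score alone. Lower values favor diversity among the selected sentences.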
Authors
石元兵
周俊
魏忠
SHI Yuan-bing; ZHOU Jun; WEI Zhong (Westone Information Industry, Ltd., Chengdu, Sichuan 610041, China)
Source
《通信技术》
2019, No. 9, pp. 2233-2239 (7 pages)
Communications Technology
Funding
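National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (No. 2017ZX01030-201)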