Funding: Funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2020.26.
Abstract: Text summarization aims to generate a concise version of the original text. The longer the summary, the more detail it retains from the original text, and the appropriate length depends on the intended use. Generating summaries of a desired length is therefore an important step toward putting this research into practice. To address this problem, this paper proposes a new method that integrates the desired summary length into the encoder-decoder model for abstractive text summarization. The length parameter is incorporated into the encoder at each self-attention step, and into the decoder by tracking the remaining length, which is used both to compute head attention during generation and as a length embedding added to the word embeddings. We evaluated the proposed model on two datasets, CNN/Daily Mail and NEWSROOM, with different desired output lengths. The results show that the proposed model is effective compared with related studies.
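The abstract does not specify how the remaining length is turned into an embedding. The following is a minimal sketch, assuming a sinusoidal encoding of the remaining token budget that is added element-wise to each decoder input embedding. The function names `length_embedding` and `add_length_embeddings` are hypothetical, and the paper may well use learned length embeddings instead.

```python
import math


def length_embedding(remaining: int, d_model: int) -> list[float]:
    """Sinusoidal encoding of the number of tokens still allowed.

    Assumption: sinusoidal form; the paper's exact encoding is not given.
    """
    emb = []
    for i in range(0, d_model, 2):
        angle = remaining / (10000 ** (i / d_model))
        emb.append(math.sin(angle))
        if i + 1 < d_model:  # handle odd d_model
            emb.append(math.cos(angle))
    return emb


def add_length_embeddings(word_embs: list[list[float]], desired_length: int) -> list[list[float]]:
    """Add the remaining-length embedding to each decoder input embedding.

    word_embs: one vector per decoder position t = 0, 1, 2, ...
    desired_length: hypothetical target summary length in tokens.
    """
    out = []
    for t, vec in enumerate(word_embs):
        remaining = max(desired_length - t, 0)  # clamp once the budget is used up
        le = length_embedding(remaining, len(vec))
        out.append([w + l for w, l in zip(vec, le)])
    return out
```

Because the remaining length shrinks at every step, each decoder position carries an explicit signal of how many tokens are left. Once the budget reaches zero, the added embedding stays constant, and a model could learn to associate that state with ending the summary.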
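Abstract: This study examines how different text segmentation methods affect summary quality when the pre-trained Pegasus model is used for long-text summarization. Two hundred academic papers on STM32 microcontrollers, collected from CNKI, served as the experimental corpus. Four segmentation methods were compared for long-text summary generation: sliding window, sentence segmentation, paragraph segmentation, and sliding window combined with sentence segmentation. The generated summaries were evaluated with ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and the results were analyzed in detail. In terms of summary quality, paragraph segmentation performed best, with ROUGE-1, ROUGE-2, and ROUGE-L scores of 30.85, 7.60, and 20.15, respectively. These scores are slightly higher than those of sentence segmentation and markedly higher than those of sentence segmentation combined with a sliding window. The study aims to give researchers and developers practical experience and insights into long-text summarization.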
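The four segmentation strategies compared above can be sketched as simple chunking functions. This is a hypothetical illustration, not the study's code. Chunk budgets are measured in characters, paragraphs are assumed to be separated by newlines, sentence splitting is a naive punctuation rule, and the default window and stride values are illustrative only.

```python
import re


def sliding_window(items, window=512, stride=384):
    """Split a sequence (tokens or sentences) into overlapping fixed-size windows.

    Consecutive windows share window - stride items. Sizes are illustrative defaults.
    """
    chunks, start = [], 0
    while True:
        chunks.append(items[start:start + window])
        if start + window >= len(items):
            break
        start += stride
    return chunks


def split_sentences(text):
    # Naive splitter: break after Chinese or English sentence-final punctuation.
    # The zero-width split keeps the punctuation and any following whitespace.
    return [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]


def pack_units(units, max_len, sep=""):
    """Greedily pack whole units into chunks of at most max_len characters.

    A single unit longer than max_len is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for unit in units:
        extra = len(sep) if current else 0
        if current and size + extra + len(unit) > max_len:
            chunks.append(sep.join(current))
            current, size, extra = [], 0, 0
        current.append(unit)
        size += extra + len(unit)
    if current:
        chunks.append(sep.join(current))
    return chunks


def sentence_chunks(text, max_len):
    return pack_units(split_sentences(text), max_len)


def paragraph_chunks(text, max_len):
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    return pack_units(paragraphs, max_len, sep="\n")


def sentence_window_chunks(text, window=5, stride=3):
    # Sliding window over whole sentences instead of raw tokens.
    return ["".join(w) for w in sliding_window(split_sentences(text), window, stride)]
```

In a full pipeline, each chunk would be summarized separately (for example with a Pegasus checkpoint) and the partial summaries combined. The abstract does not describe how the study merged them, so that step is omitted here.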
Abstract: Although contrastive move analysis of article abstracts has attracted considerable attention, few studies have examined abstracts of natural science articles. To fill this gap, this study, based on the IMRD (Introduction, Methods, Results, Discussion) model, focuses on aquatic biology abstracts and compares those written by native English speakers with those written by Chinese authors. Combining quantitative and qualitative methods, it reveals their differences and similarities in the frequency of different moves, sentence length, and the significance of move length. These similarities and differences can be explained by Chinese face culture, differences in language proficiency, and the shared conventions of academic abstracts.
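Abstract: Summary generation techniques aim to address the information overload and redundancy caused by massive volumes of Chinese text, improving the efficiency of information dissemination and making information easier for readers to access. Building on the sequence-to-sequence deep model, this paper proposes SimCLCTS (Simple Model for Contrastive Learning of Chinese Text Summarization), a Chinese summarization model that incorporates contrastive learning. By adding an unsupervised evaluation module built around a contrastive loss function, SimCLCTS mitigates the exposure bias that arises in sequence-to-sequence models from the mismatch between the learning objective and the evaluation metric. Comparative experiments show that the model reduces exposure bias and performs well on Chinese news text summarization.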
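The abstract does not give the form of the contrastive loss. One well-known way to bridge the objective-metric gap is the pairwise margin ranking loss from SimCLS (Liu and Liu, 2021), in which candidate summaries are scored against the source document and higher-ranked candidates must outscore lower-ranked ones by a rank-dependent margin. The sketch below assumes that SimCLS-style pairwise term, with cosine similarity over precomputed embeddings. It omits SimCLS's additional reference-summary term, and it is not confirmed to be the loss SimCLCTS uses. `ranking_contrastive_loss` and `margin` are hypothetical names.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranking_contrastive_loss(doc_vec, cand_vecs, margin=0.01):
    """Pairwise margin ranking loss over candidate summaries (SimCLS-style assumption).

    cand_vecs must be sorted from best to worst by a reference metric
    (for example ROUGE against the gold summary). A candidate ranked higher
    should score higher, by a margin that grows with the rank gap (j - i).
    """
    scores = [cosine(doc_vec, c) for c in cand_vecs]
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss
```

When the scorer already ranks candidates in metric order by a sufficient margin, the loss is zero. A mis-ordered pair is penalized, which pushes the evaluation module toward the same preferences as the evaluation metric.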
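Abstract: Existing abstractive summarization models pay insufficient attention to keyword information, so key information in the input text can be lost. To address this, we propose KSIE-PGN (keyword semantic information enhancement pointer-generator networks), a pointer-generator network enhanced with keyword semantic information. First, a DistilBERT-based keyword extraction model, KSBERT (keywords selection method based on BERT), is built. Second, a keyword-mask-based coverage mechanism is proposed that preserves the model's sustained attention to keywords during decoding when coverage is applied. Third, KSIE-PGN fuses several kinds of keyword information during decoding, including keyword semantic vectors and keyword context vectors, so that the decoder does not lose key information from the input text. Experimental results on the CNN/Daily Mail dataset show that KSIE-PGN captures key information in the input text well.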
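In the standard pointer-generator coverage mechanism (See et al., 2017), the coverage vector is the running sum of past attention distributions. The coverage loss at each step is the sum over source positions of min(attention, coverage), which penalizes attending again to already-covered tokens. A minimal sketch of one way a keyword mask might preserve attention to keywords is to exclude keyword positions from that penalty. This masking rule is an assumption made for illustration, not the paper's confirmed formulation. `masked_coverage_loss` is a hypothetical name.

```python
def masked_coverage_loss(attention_steps, keyword_mask):
    """Coverage loss summed over decoder steps, skipping keyword positions.

    attention_steps: one attention distribution per decoder step, each a list
        of weights over source positions.
    keyword_mask: True where the source token is a keyword (e.g. chosen by a
        keyword extractor); those positions are not penalized (assumption).
    """
    n = len(keyword_mask)
    coverage = [0.0] * n  # c^t: sum of attention from all previous steps
    total = 0.0
    for attn in attention_steps:
        for i in range(n):
            if not keyword_mask[i]:
                total += min(attn[i], coverage[i])
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total
```

In training, this term would be added to the negative log-likelihood with a weighting hyperparameter, as in the original coverage formulation. With the mask, repeated attention to keyword positions no longer increases the loss.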