Abstract
This study examines how different text segmentation methods affect summary quality when a pre-trained Pegasus model is used for long-text summarization. A total of 200 academic papers on STM32 microcontrollers were collected from CNKI as the experimental corpus, and four segmentation methods were compared: sliding window, sentence segmentation, paragraph segmentation, and sliding window combined with sentence segmentation. The generated summaries were evaluated with ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics, and the experimental results were analyzed in detail. In terms of summary quality, paragraph segmentation performed well, with ROUGE-1, ROUGE-2, and ROUGE-L scores of 30.85, 7.60, and 20.15, respectively, slightly higher than those of sentence segmentation and significantly higher than those of sentence segmentation combined with sliding window. The study aims to give researchers and developers practical experience and insights into long-text summarization.
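The four segmentation strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the chunk sizes, overlaps, or splitting rules used in the paper, so every parameter and helper name below (`windows`, `pack`, `segment`, `max_len`, `stride`, `n_sent`, `sent_stride`) is an assumption made for illustration, and lengths are counted in characters rather than model tokens.

```python
import re

def windows(seq, size, stride):
    """Return overlapping slices of seq; neighbours share size - stride items."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    out, start = [], 0
    while start < len(seq):
        out.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # the last window already reaches the end
        start += stride
    return out

def split_sentences(text):
    """Split after Chinese or English sentence-ending punctuation (assumed rule)."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]

def split_paragraphs(text):
    """Treat each non-empty line as one paragraph (assumed rule)."""
    return [p.strip() for p in text.splitlines() if p.strip()]

def pack(units, max_len):
    """Greedily join whole units into chunks of at most max_len characters.
    A single unit longer than max_len is kept as its own chunk."""
    chunks, current = [], ""
    for unit in units:
        if current and len(current) + len(unit) > max_len:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks

def segment(text, method, max_len=512, stride=256, n_sent=8, sent_stride=4):
    """Dispatch to one of the four strategies; all defaults are hypothetical."""
    if method == "sliding":
        return windows(text, max_len, stride)
    if method == "sentence":
        return pack(split_sentences(text), max_len)
    if method == "paragraph":
        return pack(split_paragraphs(text), max_len)
    if method == "sliding_sentence":
        # Slide a window over whole sentences instead of raw characters.
        return ["".join(w) for w in windows(split_sentences(text), n_sent, sent_stride)]
    raise ValueError(f"unknown method: {method}")
```

In a pipeline of this kind, each chunk would be summarized separately and the partial summaries combined before ROUGE scoring; how the paper combines them is not stated in the abstract.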
Authors
LONG Chuan (龙川)
ZHANG Qin (张芹)
XIE Liang-Sheng (谢亮生)
PAN Chen (潘琛)
WEN Yu (文瑜)
YANG Jun-Feng (杨俊锋)
School of Testing and Optoelectronic Engineering, Nanchang Hangkong University, Nanchang 330000, China
Source
Computer and Information Technology (《电脑与信息技术》)
2024, No. 4, pp. 64-66, 90 (4 pages in total)
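Funding
Jiangxi Province Innovation Leading Talent Long-Term Project (Grant No. S2020LQCQ0889)
Natural Science Foundation of Jiangxi Province (Grant No. 20212BAB201022)
Ministry of Education Industry-University-Research Collaborative Education Project (Grant No. 202002032008)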