Abstract
In the field of automatic text summarization, researchers have repeatedly found that the results of the traditional ROUGE evaluation method diverge substantially from those of human evaluation, yet this gap has never been quantified and therefore cannot be measured. In view of this, this paper uses several public Chinese summarization datasets of different types and lengths and defines a semantic loss rate to measure the degree of semantic loss that ROUGE incurs during evaluation. The influence of summary length and of factors intrinsic to the datasets on the evaluation of generated summaries is also taken into account, and the concrete numerical error between ROUGE evaluation and human evaluation is finally visualized. The experimental results show that ROUGE scores are only weakly correlated with human scores, that ROUGE incurs a certain degree of semantic loss on datasets of all lengths, and that summary length and the original annotation errors of the datasets also have an important impact on the final evaluation scores. The semantic loss rate defined in this paper can provide a reference for better selection of datasets and evaluation methods, suggest directions for improving evaluation methods, and offer guidance on objectively assessing the effectiveness of summarization models.
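As a rough illustration of the quantities discussed in the abstract, the sketch below computes a character-level ROUGE-1 F1 score for a Chinese candidate summary and a hypothetical semantic loss rate defined as the normalized gap between a human score and the ROUGE score. This is a minimal sketch only: the function names, the example sentences, the assumed human rating, and in particular the semantic_loss_rate formula are assumptions made for demonstration and do not reproduce the paper's actual definition.

# Minimal sketch (not the paper's formulation): character-level ROUGE-1 F1
# and a hypothetical "semantic loss rate" measuring how far the ROUGE score
# falls below a human judgment, with both scores scaled to [0, 1].

from collections import Counter


def rouge_1_f(candidate: str, reference: str) -> float:
    """Character-level ROUGE-1 F1, a common choice for Chinese text."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram (character) overlap
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def semantic_loss_rate(human_score: float, rouge_score: float) -> float:
    """Hypothetical loss rate: relative amount by which the ROUGE score
    under-rates the summary compared with the human judgment."""
    if human_score == 0:
        return 0.0
    return max(0.0, (human_score - rouge_score) / human_score)


if __name__ == "__main__":
    reference = "重庆师范大学举办人工智能学术讲座"   # reference summary (example)
    candidate = "重庆师大举行AI学术讲座"             # generated summary (example)
    r1 = rouge_1_f(candidate, reference)
    human = 0.8  # assumed human rating normalized to [0, 1]
    print(f"ROUGE-1 F1 = {r1:.3f}, semantic loss rate = {semantic_loss_rate(human, r1):.3f}")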
Authors
JIN Du-liang (金独亮), FAN Yong-sheng (范永胜), ZHANG Qi (张琪)
School of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China
Source
Computer and Modernization (《计算机与现代化》)
2023, No. 3, pp. 84-89 (6 pages)
Funding
Chongqing Normal University Foundation Project (Talent Introduction / Doctoral Start-up) (17XCB008)
Humanities and Social Sciences Research Project of the Ministry of Education (18XJC880002)
Science and Technology Research Project of Chongqing Municipal Education Commission (KJQN201800539)
Keywords
text summarization
evaluation method
semantic loss rate
dataset bias