Intrinsic and Extrinsic Automatic Evaluation Strategies for Paraphrase Generation Systems

Abstract: A paraphrase restates a text using alternative words and word order, often to improve clarity. Paraphrases have proven valuable for augmenting training datasets, which helps improve the performance of machine learning models built for various natural language processing (NLP) tasks. Automatic paraphrase generation has therefore received increasing attention. However, evaluating the quality of generated paraphrases is technically challenging. In the literature, the value of generated paraphrases tends to be determined by their impact on the performance of other NLP tasks. This kind of evaluation is referred to as extrinsic evaluation, and it requires substantial computational resources to train and test the models. So far, very little attention has been paid to intrinsic evaluation, in which the quality of a generated paraphrase is judged against a predefined ground truth (reference paraphrases). Moreover, ideal and complete reference paraphrases are very difficult to find. In this study, we therefore propose a semantic (meaning-oriented) automatic evaluation metric that evaluates the quality of generated paraphrases against the original text, which is an intrinsic evaluation approach. We further evaluate the quality of the paraphrases by assessing their impact on other NLP tasks, which is an extrinsic evaluation method. The goal is to explore the relationship between intrinsic and extrinsic evaluation methods. To assess the effectiveness of the proposed evaluation methods, extensive experiments are conducted on several publicly available datasets. The experimental results demonstrate that the proposed intrinsic and extrinsic evaluation strategies are promising. The results further reveal a significant correlation between the intrinsic and extrinsic evaluation approaches.
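The abstract does not spell out the proposed metric, so the following is only a minimal sketch of the general idea it describes: scoring a paraphrase against the original text by meaning rather than by reference paraphrases, then checking how such scores correlate with a downstream measure. Everything here is an assumption for illustration, not the authors' method: the toy word vectors, the averaged-embedding sentence representation, the cosine score, and the function names `semantic_score` and `pearson` are all hypothetical.

```python
import math

# Hypothetical toy word vectors (assumption); a real setup would load
# pretrained word embeddings instead.
TOY_VECTORS = {
    "film": [0.9, 0.1, 0.0],
    "movie": [0.85, 0.15, 0.05],
    "great": [0.1, 0.9, 0.1],
    "excellent": [0.15, 0.85, 0.1],
    "terrible": [0.1, -0.8, 0.2],
}


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_score(original, paraphrase):
    """Intrinsic score: similarity of the paraphrase to the original text."""
    u, v = sentence_vector(original), sentence_vector(paraphrase)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0
```

Under these assumptions, `semantic_score("great film", "excellent movie")` would rank above `semantic_score("great film", "terrible movie")`, and `pearson` could be applied to paired lists of intrinsic scores and downstream task accuracies to probe the intrinsic-extrinsic relationship the abstract mentions.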

Authors: Tulu Tilahun Hailu, Junqing Yu, Tessfu Geteye Fantaye

Affiliation: School of Computer Science and Technology [

Source: Journal of Computer and Communications (Chinese title: 电脑和通信, English-language journal), 2020, Issue 2, pp. 1-16 (16 pages)

Keywords: Paraphrase; Paraphrase Generation; Natural Language Processing; Intrinsic; Extrinsic; Automatic Evaluation; Word Embedding; Sentiment Analysis

Classification number: H31 [Language and Linguistics: English]

