Controllable data synthesis method for grammatical error correction (Cited by: 1)

Abstract: Parallel data are scarce for the grammatical error correction (GEC) task, so models based on the sequence-to-sequence framework cannot be trained adequately enough to reach higher performance. We propose two data synthesis methods that control both the error rate and the ratio of error types in the synthetic data. The first method corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, or deletion. The second method trains error generation models and then filters their decoding results. Experiments on different synthetic datasets show that an error rate of 40% combined with an equal ratio of error types gives the largest performance improvement. Finally, we synthesize about 100 million examples and achieve performance comparable to the state of the art, which uses twice as much data as we do.
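The two synthesis methods in the abstract can be illustrated with a minimal Python sketch. The sketch rests on several assumptions that do not come from the paper: whitespace-tokenized sentences, a hypothetical `vocab` list from which replacement and insertion words are drawn uniformly, and a hypothetical `type_ratio` dictionary that weights the three error types. `corrupt` applies the first method, corrupting each token independently with probability `error_rate`. `filter_pairs` stands in for the filtering step of the second method. Its keep-or-drop rule is an illustrative assumption: a model-generated pair is kept only if its token-level edit rate, measured with `difflib.SequenceMatcher`, lies within `tol` of the target rate. The paper's actual filtering criteria are not described here.

```python
import random
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical default: an equal weight for each corruption type.
EQUAL_RATIO = {"replace": 1.0, "insert": 1.0, "delete": 1.0}


def corrupt(tokens: Sequence[str], vocab: Sequence[str], error_rate: float = 0.4,
            type_ratio: Optional[Dict[str, float]] = None,
            rng: Optional[random.Random] = None) -> List[str]:
    """Method 1 (sketch): corrupt each token independently with probability error_rate."""
    if rng is None:
        rng = random.Random()
    ratio = type_ratio or EQUAL_RATIO
    ops = list(ratio)
    weights = [ratio[op] for op in ops]
    out: List[str] = []
    for tok in tokens:
        if rng.random() >= error_rate:
            out.append(tok)  # token survives unchanged
            continue
        op = rng.choices(ops, weights=weights, k=1)[0]  # error type drawn by ratio
        if op == "replace":
            # Substitute a different vocabulary word when one exists.
            candidates = [w for w in vocab if w != tok] or list(vocab)
            out.append(rng.choice(candidates))
        elif op == "insert":
            out.append(tok)
            out.append(rng.choice(vocab))  # spurious word after the original
        elif op != "delete":
            raise ValueError(f"unknown error type: {op}")
        # op == "delete": the token is simply dropped
    return out


def edit_counts(clean: Sequence[str], noisy: Sequence[str]) -> Dict[str, int]:
    """Count token-level replace/insert/delete edits that turn clean into noisy."""
    counts = {"replace": 0, "insert": 0, "delete": 0}
    matcher = SequenceMatcher(a=clean, b=noisy, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Unequal-length replacements are counted by the longer side (approximation).
            counts["replace"] += max(i2 - i1, j2 - j1)
        elif tag == "insert":
            counts["insert"] += j2 - j1
        elif tag == "delete":
            counts["delete"] += i2 - i1
    return counts


def filter_pairs(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]],
                 target_rate: float = 0.4, tol: float = 0.1):
    """Method 2 filter (hypothetical rule): keep pairs whose edit rate is near target_rate."""
    kept = []
    for clean, noisy in pairs:
        if not clean:
            continue  # edit rate undefined for an empty source sentence
        rate = sum(edit_counts(clean, noisy).values()) / len(clean)
        if abs(rate - target_rate) <= tol:
            kept.append((clean, noisy))
    return kept
```

Calling `corrupt(sentence.split(), vocab, error_rate=0.4)` with the default equal weights roughly mirrors the setting the abstract reports as most effective. Drawing replacement words uniformly from a flat vocabulary is a simplification chosen for brevity.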

Authors: Liner Yang, Chengcheng Wang, Yun Chen, Yongping Du, Erhong Yang

Affiliations: School of Information Science; Faculty of Information Technology; Beijing Advanced Innovation Center for Language Resources; School of Information Management & Engineering

Source: Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2022, No. 4, pp. 69-78 (10 pages). Chinese title: 中国计算机科学前沿 (English edition)

Funding: This work was supported by the funds of the Beijing Advanced Innovation Center for Language Resources (TYZ19005) and the Research Program of the State Language Commission (ZDI135-105, YB135-89).

Keywords: grammatical error correction; sequence to sequence; data synthesis

Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]
