Abstract
Systematic research on discourse analysis relies on large-scale, high-quality annotated corpora. Existing corpora are annotated mainly by hand or with standalone (single-machine) annotation aids, which makes it difficult to meet the requirements of annotation efficiency and corpus quality. This paper therefore proposes a concise collaborative workflow for corpus annotation and, based on it, implements a collaborative annotation system for a Chinese discourse macro-structure corpus. The system provides an online annotation mode with a simple workflow, role-based collaboration, automatic process control, and safe, reliable operation. By defining annotation workflow states, collecting user behavior data during annotation, and providing corpus-assisted statistics, the system optimizes the annotation workflow for Chinese macro discourse from the perspective of process control, enabling quality control and data analysis. Project practice shows that the system effectively reduces annotators' workload and improves annotation efficiency and quality, laying a foundation for large-scale, collaborative annotation of Chinese discourse corpora.
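The abstract describes workflow states, role-based collaboration, and the logging of user behavior only at a high level. As a rough illustration of what process control of this kind can look like, here is a minimal Python sketch that assumes a hypothetical four-state workflow (assigned, annotating, submitted, approved) and two hypothetical roles (annotator, reviewer). The state names, the roles, the `TRANSITIONS` table, `Document.apply`, and `rejection_count` are all assumptions made for this example. They are not taken from the system described in the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    # Hypothetical workflow states; the paper does not list its actual states.
    ASSIGNED = "assigned"
    ANNOTATING = "annotating"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# Hypothetical transition table: (current state, action) -> (required role, next state).
TRANSITIONS = {
    (State.ASSIGNED, "start"): ("annotator", State.ANNOTATING),
    (State.ANNOTATING, "submit"): ("annotator", State.SUBMITTED),
    (State.SUBMITTED, "approve"): ("reviewer", State.APPROVED),
    (State.SUBMITTED, "reject"): ("reviewer", State.ANNOTATING),
}


@dataclass
class Event:
    # One logged user action: the "behavior data" collected during annotation.
    user: str
    role: str
    action: str
    source: State
    target: State
    at: datetime


@dataclass
class Document:
    doc_id: str
    state: State = State.ASSIGNED
    log: list[Event] = field(default_factory=list)

    def apply(self, user: str, role: str, action: str) -> State:
        """Perform an action if the current state and the user's role allow it."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state.value}")
        required_role, target = TRANSITIONS[key]
        if role != required_role:
            raise PermissionError(f"role {role!r} cannot {action!r}; requires {required_role!r}")
        # Record the transition before changing state, so the log is a full history.
        self.log.append(Event(user, role, action, self.state, target, datetime.now(timezone.utc)))
        self.state = target
        return target


def rejection_count(doc: Document) -> int:
    """A simple log-derived statistic: how often a reviewer sent the document back."""
    return sum(1 for e in doc.log if e.action == "reject")
```

Because every accepted transition is stored as an `Event`, statistics such as `rejection_count` fall out of the log directly. In this sketch a rejected submission returns to the annotating state and is not discarded, and an action attempted by the wrong role raises `PermissionError` without changing the document's state.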
Authors
XU Chenhan (徐宸涵)
GU Yuhao (顾宇浩)
ZHANG Zhihao (张志昊)
CHU Xiaomin (褚晓敏)
JIANG Feng (蒋峰)
School of Computer Science and Technology, Soochow University, Suzhou 215006
Source
Computer & Digital Engineering (《计算机与数字工程》)
2021, No. 12, pp. 2519-2525 (7 pages)
Keywords
process control
corpus annotation
discourse analysis
natural language processing