Collaborative Annotation System of Chinese Discourse Structure Corpus Based on Process Control (Cited by: 1)
Abstract: Systematic research on discourse analysis depends on large-scale, high-quality annotated corpora. Existing corpora are annotated mainly by hand or with single-machine annotation aids, which makes it difficult to meet requirements for annotation efficiency and corpus quality. This paper therefore proposes a concise collaborative workflow for corpus annotation and, on that basis, implements a collaborative annotation system for Chinese discourse macro-structure corpora. The system provides an online annotation mode with a simple workflow, role-based collaboration, automatic process control, and safe, reliable operation. By defining annotation process states, collecting user behavior data during annotation, and providing corpus-assisted statistics, it optimizes the annotation process for Chinese macro discourse structure from the perspective of process control and supports quality control and data analysis. Project practice shows that the system effectively reduces annotators' workload, improves annotation efficiency and quality, and lays a foundation for large-scale collaborative annotation of Chinese discourse corpora.
Authors: XU Chenhan (徐宸涵); GU Yuhao (顾宇浩); ZHANG Zhihao (张志昊); CHU Xiaomin (褚晓敏); JIANG Feng (蒋峰) (School of Computer Science and Technology, Soochow University, Suzhou 215006)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2021, No. 12, pp. 2519-2525 (7 pages)
Keywords: process control; corpus annotation; discourse analysis; natural language processing
