Training Corpus Construction Based on Bootstrapping
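Abstract: This paper proposes a method for constructing a training corpus based on the Bootstrapping algorithm. The method randomly selects a portion of an automatically annotated corpus and corrects it manually to produce a seed set. A class-based language model is trained on this seed set and then used to automatically annotate the remaining corpus. A further portion is selected from the remaining corpus and processed in the same way, and the cycle repeats until the annotation quality of the training corpus is satisfactory. Experimental results show that the method substantially reduces manual effort while keeping the annotation quality of the training corpus satisfactory.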
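The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model here is a most-frequent-label lookup standing in for the class-based language model, the `correct` callable stands in for the human annotator, and the stopping rule (accuracy of the automatic labels on each sampled batch reaching `target`) is an assumption, since the abstract does not say how annotation quality is measured. All function and parameter names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(seed):
    """Stand-in model: most frequent label per token.

    Assumption: this replaces the class-based language model of the paper.
    """
    counts = defaultdict(Counter)
    for token, label in seed:
        counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def annotate(model, tokens, default="O"):
    """Label each token with the model's prediction, or a default if unseen."""
    return [(t, model.get(t, default)) for t in tokens]


def bootstrap(auto_labeled, correct, batch_size=4, target=0.9, rng=None):
    """Grow a manually corrected seed set and re-annotate the rest each round.

    auto_labeled: (token, label) pairs from an initial automatic annotator.
    correct: callable standing in for the human; maps a token to its true label.
    Returns (seed, remaining, rounds).
    """
    rng = rng or random.Random(0)
    remaining = list(auto_labeled)
    seed, rounds = [], 0
    while remaining:
        rounds += 1
        # Randomly pick a batch from the automatically annotated pool.
        rng.shuffle(remaining)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        # "Manual" correction of the batch.
        fixed = [(t, correct(t)) for t, _ in batch]
        # Assumed quality estimate: share of automatic labels already correct.
        accuracy = sum(a == b for a, b in zip(batch, fixed)) / len(batch)
        # Retrain on the enlarged seed set and re-annotate what is left.
        seed.extend(fixed)
        model = train(seed)
        remaining = annotate(model, [t for t, _ in remaining])
        if accuracy >= target:
            break
    return seed, remaining, rounds


if __name__ == "__main__":
    gold = {"Beijing": "LOC", "Paris": "LOC", "IBM": "ORG", "the": "O", "of": "O"}
    tokens = list(gold) * 4
    # Crude initial annotator: everything is labeled "O".
    seed, remaining, rounds = bootstrap([(t, "O") for t in tokens], gold.__getitem__)
    print(rounds, len(seed), len(remaining))
```

In this sketch the manual effort is the size of `seed`, so stopping early leaves `remaining` labeled by the model alone, which is where the reduction in human work would come from.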
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, Issue z2, pp. 394-397 (4 pages).
Funding: National Natural Science Foundation of China (60663004); Ministry of Education Doctoral Program Fund (20050007023).