Chinese Text Chunking Using Co-training Method (Co-training机器学习方法在中文组块识别中的应用)

Cited by: 8
Abstract: This paper applies co-training, a semi-supervised machine learning method, to Chinese text chunking. We first define the Chinese chunk and give a formal definition of the co-training algorithm. We then propose an agreement-based example selection method that combines two classifiers, a Transductive HMM and the transformation-rule-based classifier fnTBL, into a single classification system. Chunking experiments use a small labeled Chinese treebank together with a large unlabeled Chinese corpus. The results are compared with self-training and with a baseline in which a single classifier (Transductive HMM or fnTBL) is trained on the small treebank alone. Co-training improves on that baseline: the two classifiers reach F values of 85.34% and 83.41%, with gains of 2.13 and 7.21 points.
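Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2005, No. 3, pp. 73-79 (7 pages)
Funding: Key Science and Technology Research Project of the Ministry of Education of China (104065); Joint Funding Project of the National Natural Science Foundation of China and Microsoft Research Asia (60203019)
Keywords: computer application; Chinese information processing; co-training algorithm; Chinese chunk; classifier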
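
The agreement-based selection idea named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `ViewVoteClassifier` is a toy stand-in for the Transductive HMM and fnTBL, the rule "keep an unlabeled example only when both classifiers assign the same tag" is a simplified reading of agreement-based selection, and the tiny word/POS data are invented. The abstract does not specify the paper's actual selection criterion or features.

```python
from collections import Counter, defaultdict


class ViewVoteClassifier:
    """Toy classifier (hypothetical): majority tag seen for one feature of the example."""

    def __init__(self, view):
        self.view = view                    # index of the feature this classifier uses
        self.votes = defaultdict(Counter)   # feature value -> tag counts
        self.default = None                 # fallback tag for unseen feature values

    def fit(self, examples, tags):
        self.votes.clear()
        for x, y in zip(examples, tags):
            self.votes[x[self.view]][y] += 1
        self.default = Counter(tags).most_common(1)[0][0]
        return self

    def predict(self, x):
        seen = self.votes.get(x[self.view])
        return seen.most_common(1)[0][0] if seen else self.default


def co_train(clf_a, clf_b, labeled, unlabeled, max_rounds=10):
    """Grow the labeled set with unlabeled examples on which both classifiers agree."""
    labeled = list(labeled)   # [(example, tag), ...]
    pool = list(unlabeled)    # [example, ...]

    def refit():
        xs = [x for x, _ in labeled]
        ys = [y for _, y in labeled]
        clf_a.fit(xs, ys)
        clf_b.fit(xs, ys)

    for _ in range(max_rounds):
        refit()
        agreed, rest = [], []
        for x in pool:
            tag_a, tag_b = clf_a.predict(x), clf_b.predict(x)
            if tag_a == tag_b:
                agreed.append((x, tag_a))   # both classifiers agree: trust the tag
            else:
                rest.append(x)              # disagreement: leave in the pool
        if not agreed:
            break                           # nothing new to add, stop early
        labeled.extend(agreed)
        pool = rest
    refit()                                 # make sure both models see the final set
    return labeled, pool


# Invented seed data: (word, POS) examples with chunk tags.
seed = [(("the", "DT"), "B-NP"), (("dog", "NN"), "I-NP"),
        (("cat", "NN"), "I-NP"), (("runs", "VB"), "B-VP")]
raw = [("bird", "NN"), ("the", "VB"), ("sleeps", "VB")]

grown, leftover = co_train(ViewVoteClassifier(0), ViewVoteClassifier(1), seed, raw)
print(len(grown), leftover)
```

In self-training, by contrast, a single classifier would add its own confident predictions to its training set; here an example is added only when two classifiers that look at different features reach the same tag.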
