
古汉语句子切分与句读标记方法研究 (cited by: 2)

Research on Sentence Segmentation and Punctuation in Ancient Chinese
Abstract: Data sparseness is the primary challenge in applying natural language processing techniques to sentence segmentation and punctuation of ancient Chinese. To address it, a six-tag set of character positions was designed and a method based on cascaded conditional random fields (CRFs) was proposed. Using the six-tag set, the low-level model determines sentence boundaries from the observation sequence, and the high-level model assigns punctuation marks using both the observation sequence and the sentence boundaries produced by the low-level model. Closed and open tests were carried out on a mixed ancient Chinese corpus of approximately 5M. In the closed test, the F-measures for sentence segmentation and punctuation were 96.48% and 91.35%, respectively; in the open test, they were 71.42% and 67.67%, respectively.
Source: 《河南大学学报(自然科学版)》 (Journal of Henan University: Natural Science), indexed in CAS and 北大核心 (Peking University Core Journals), 2009, No. 5, pp. 525-529 (5 pages)
Keywords: ancient Chinese; cascaded conditional random fields; data sparseness; sentence segmentation; punctuation
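
The two-level labelling described in the abstract can be illustrated with a minimal sketch of how training labels might be derived from punctuated text. The abstract does not list the six tags, so the sketch borrows the B, B2, B3, M, E, S position scheme known from character-based word segmentation as a stand-in; the tag names, the punctuation inventory, the `O` label, and the feature names are all assumptions, and no CRF training is shown.

```python
# Sketch only: tag names, PUNCT, the "O" label and feature keys are assumptions,
# not the tag set or features reported in the paper.
PUNCT = set(",。?!;:、")


def split_segments(text):
    """Split punctuated text into (segment, closing_mark) pairs."""
    segments, current = [], []
    for ch in text:
        if ch in PUNCT:
            if current:
                segments.append(("".join(current), ch))
                current = []
        else:
            current.append(ch)
    if current:
        # Trailing text that is not followed by any mark.
        segments.append(("".join(current), ""))
    return segments


def position_tags(n):
    """Assumed six-tag position labels for a segment of n characters."""
    if n == 1:
        return ["S"]
    head = ["B", "B2", "B3"][: n - 1]
    return head + ["M"] * (n - 1 - len(head)) + ["E"]


def encode(text):
    """Return characters, low-level position tags, and high-level mark labels."""
    chars, low, high = [], [], []
    for seg, mark in split_segments(text):
        tags = position_tags(len(seg))
        for i, (ch, tag) in enumerate(zip(seg, tags)):
            chars.append(ch)
            low.append(tag)
            # The mark is attached to the last character of its segment.
            is_last = i == len(seg) - 1
            high.append(mark if is_last and mark else "O")
    return chars, low, high


def high_level_features(chars, low_tags):
    """Pair each character with its low-level tag, as input to the high level."""
    return [{"char": c, "boundary_tag": t} for c, t in zip(chars, low_tags)]


if __name__ == "__main__":
    chars, low, high = encode("學而時習之,不亦說乎?")
    for row in zip(chars, low, high):
        print(*row)
```

In this arrangement the low-level task only has to learn where segments begin and end, while the high-level task chooses a mark with the boundary tag available as a feature, which mirrors the cascade outlined in the abstract.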