Sentence boundary detection of Uyghur based on rules and statistics (cited by: 7)
Abstract Sentence boundary detection is a foundational task for natural language processing systems such as part-of-speech tagging and parsing. This paper proposes a sentence boundary detection method for Uyghur that combines rules and statistics. First, an ambiguous-paragraph classification algorithm separates ambiguous paragraphs from unambiguous ones. Second, a rule-based detector identifies sentence boundaries in the unambiguous paragraphs. Finally, a maximum entropy (ME) model identifies sentence boundaries in the ambiguous paragraphs. The rules compensate for the ME model's tendency, caused by sparse training data, to misjudge cases that contain no ambiguity at all, while the ME model resolves the genuinely ambiguous cases. The combination improves the robustness of the algorithm, and recall reaches 98.77%.
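The three-stage routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ambiguity test, the terminator set (including the Arabic question mark U+061F), the feature names, and `toy_is_boundary` (a stand-in for a trained maximum entropy model) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Assumed candidate terminators; the paper's actual punctuation set is not given here.
TERMINATOR_CLASS = "[.!?\u061F]"
CANDIDATE = re.compile(TERMINATOR_CLASS)
RULE_SPLIT = re.compile("(?<=" + TERMINATOR_CLASS + r")\s+")
# Hypothetical ambiguity cue: a period immediately followed by a non-space
# character, as in "3.14" or "a.b".
AMBIGUITY_CUE = re.compile(r"\.(?=\S)")


def is_ambiguous(paragraph: str) -> bool:
    """Stage 1: route paragraphs by whether they contain an ambiguity cue."""
    return AMBIGUITY_CUE.search(paragraph) is not None


def split_by_rules(paragraph: str) -> List[str]:
    """Stage 2: split unambiguous paragraphs after every terminator."""
    return [p for p in RULE_SPLIT.split(paragraph.strip()) if p]


def split_with_classifier(
    paragraph: str, is_boundary: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Stage 3: ask a classifier about each candidate terminator."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(paragraph):
        end = match.end()
        # Illustrative context features: the mark and its neighbouring characters.
        features = {
            "mark": match.group(),
            "prev": paragraph[max(0, match.start() - 1):match.start()],
            "next": paragraph[end:end + 1],
        }
        if is_boundary(features):
            sentence = paragraph[start:end].strip()
            if sentence:
                sentences.append(sentence)
            start = end
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def detect_sentences(
    paragraph: str, is_boundary: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Send ambiguous paragraphs to the classifier, the rest to the rules."""
    if is_ambiguous(paragraph):
        return split_with_classifier(paragraph, is_boundary)
    return split_by_rules(paragraph)


def toy_is_boundary(features: Dict[str, str]) -> bool:
    """Stand-in for a trained model: boundary only before whitespace or end of text."""
    return features["next"] == "" or features["next"].isspace()
```

In a real system, `toy_is_boundary` would be replaced by a trained maximum entropy classifier over richer context features. The point of the routing is that the statistical model only sees paragraphs that the first stage flags as ambiguous.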
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2010, No. 14, pp. 162-165 (4 pages)
Funding National Natural Science Foundation of China (No. 60663006); High-Tech Program Project of the Xinjiang Uyghur Autonomous Region (No. 200712109)
Keywords Uyghur; sentence boundary detection; rule; feature extraction; maximum entropy