English Base Noun Phrase Identification Based on Hybrid Strategy: The Strategy of Combination of Boundary Statistic and the Amendment of the String of Part of Speech

Cited by: 2

Abstract: Base noun phrase identification is an important subtask in natural language processing. This paper summarizes representative methods of base noun phrase identification and compares the results of several typical approaches for English. It then proposes and implements a method for English that combines boundary statistics with correction by part-of-speech string rules. The method divides the task into a primary and a secondary part. As the primary part, boundary statistics correctly identify most base noun phrases. As the secondary part, rules expressed as strings of part-of-speech tags check and correct the phrases identified by the primary part, and also recover base noun phrases that the boundary statistics missed. The rules compensate for the inability of boundary statistics to take the internal composition of base noun phrases into account, which improves both precision and recall. The method reaches a precision of 96.22% and a recall of 97.59% on English base noun phrase identification, giving F(β=1) = 96.90%, which the authors report as higher than the best previously published F-score.
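The F value with β = 1 reported in the abstract (96.90%) can be checked against the stated precision and recall. The following is a minimal Python sketch, assuming the standard F-measure definition F_β = (1 + β²)·P·R / (β²·P + R). The function name `f_beta` is illustrative and does not come from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-measure).

    Assumption: this is the usual definition used in chunking evaluation;
    the paper's abstract does not spell the formula out.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was identified correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the abstract, in percent.
reported_precision = 96.22
reported_recall = 97.59

f1 = f_beta(reported_precision, reported_recall)
print(f"F(beta=1) = {f1:.2f}%")  # prints "F(beta=1) = 96.90%"
```

With β = 1 the formula reduces to 2PR / (P + R), so the reported F value is consistent with the reported precision and recall.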

Authors: Liang Yinghong (梁颖红), Zhao Tiejun (赵铁军), Yao Jianmin (姚建民), Yu Hao (于浩), Xu Bing (徐冰)

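Affiliations: College of Information and Computer Engineering, Northeast Forestry University; School of Computer Science and Technology, Harbin Institute of Technology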

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 35, pp. 1-3, 121 (4 pages in total)

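Funding: Supported by the National Natural Science Foundation of China (Nos. 60302021, 60375019), the National 863 High-Tech Research and Development Program (sub-project, No. 2002AA117010-09), and an intergovernmental international cooperation project of the Ministry of Science and Technology (No. CI-2003-03).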

Keywords: base noun phrase identification; English; hybrid strategy; chunk; boundary statistics; part-of-speech string; rule-based correction

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]; H313 [Language and Writing: English]


