Word Structure Analysis Based on Cascaded CRFs

Original title: 基于层叠CRF模型的词结构分析 (cited by: 7)

Abstract: Traditional Chinese word segmentation focuses on identifying word boundaries and ignores the fact that the boundary between words and phrases in Chinese is not clear-cut. In theory, linguists hold differing views on where word boundaries lie, so the segmentation standards of different corpora cannot be unified; in practice, corpora built under these differing guidelines do not fully satisfy the needs of specific applications. This paper presents a method based on cascaded CRF models that automatically analyzes word structure, recovering both word boundaries and internal word structure with high precision. Compared with traditional word segmentation, word structure analysis is more consistent with the fuzzy boundary between Chinese lexical and syntactic units; it addresses the inconsistency of corpus standards and meets differing application requirements.
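The claim that internal structure can serve several segmentation standards at once can be illustrated with a minimal sketch. This is not the paper's cascaded CRF model: the tree below is a hand-written, hypothetical analysis of one word, and the `Tree` type and the `text` and `segment` helpers are names assumed here for illustration. The sketch only shows how one structured analysis could be flattened into coarser or finer segmentations.

```python
from typing import List, Union

# A word structure is either an atomic string or a list of sub-structures.
Tree = Union[str, List["Tree"]]


def text(tree: Tree) -> str:
    """Concatenate all characters under a node."""
    if isinstance(tree, str):
        return tree
    return "".join(text(child) for child in tree)


def segment(tree: Tree, depth: int) -> List[str]:
    """Cut the tree `depth` levels below the root; depth 0 keeps the word whole."""
    if isinstance(tree, str) or depth == 0:
        return [text(tree)]
    units: List[str] = []
    for child in tree:
        units.extend(segment(child, depth - 1))
    return units


# Hypothetical bracketing, written by hand for this example only.
word: Tree = ["中华", ["人民", ["共和", "国"]]]

for depth in range(4):
    print(depth, segment(word, depth))
```

Under this assumed bracketing, a corpus standard that treats the whole expression as one word corresponds to depth 0, while finer-grained standards correspond to deeper cuts of the same tree.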

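Authors: FANG Yan (方艳), ZHOU Guodong (周国栋)

Affiliations: Natural Language Processing Lab, Soochow University; School of Computer Science and Technology, Soochow University

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2015, No. 4, pp. 1-7, 24 (8 pages in total)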

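Funding: Natural Science Foundation, Young Scientists Project (61202162); Ministry of Education Doctoral Program Fund, New Teacher category (20123201120011)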

Keywords: Chinese word segmentation; internal structure; annotation standard; cascaded CRFs

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
