A Word Structure Analysis by Extending the Word Tag Set (利用扩展标记集的词结构分析)

Cited by: 2

Abstract: This paper proposes an alternative to conventional word segmentation for lexical analysis: analyzing the internal structure of words by means of an extended word tag set. We first describe the characteristics of the internal structure of words. Treating the prefixes and suffixes within a word as special words, we identify a word's internal structure by detecting its prefixes and suffixes. The identification of internal word structure is thereby converted into a sequence tagging problem, which we solve with a CRF model over the extended tag set. Experiments show that the method achieves relatively high accuracy both in overall performance and in the identification of each layer of structure.
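To make the conversion to sequence tagging concrete, the following is a minimal sketch of how a word with a prefix or suffix could be encoded as per-character labels. The tag scheme here is an assumption for illustration only: it uses the common B/M/E/S position tags and extends them with hypothetical `-PRE` and `-SUF` markers; the abstract does not specify the paper's actual tag set. The names `encode`, `decode`, and `position_tags` are likewise hypothetical. A CRF would then be trained on label sequences of this kind.

```python
from typing import List, Tuple

# Hypothetical extended tag scheme (NOT taken from the paper):
# B/M/E/S mark a character's position inside a unit, and the
# "-PRE" / "-SUF" extension marks the unit as a prefix or suffix.
ROLE_TO_EXT = {"root": "", "prefix": "-PRE", "suffix": "-SUF"}
EXT_TO_ROLE = {"": "root", "PRE": "prefix", "SUF": "suffix"}


def position_tags(n: int) -> List[str]:
    """Return B/M/E/S position tags for a unit of n characters."""
    if n == 1:
        return ["S"]
    return ["B"] + ["M"] * (n - 2) + ["E"]


def encode(parts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (text, role) units of one word into (character, tag) pairs."""
    tagged = []
    for text, role in parts:
        for ch, pos in zip(text, position_tags(len(text))):
            tagged.append((ch, pos + ROLE_TO_EXT[role]))
    return tagged


def decode(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rebuild (text, role) units from (character, tag) pairs."""
    parts = []
    current = ""
    for ch, tag in tagged:
        pos, _, ext = tag.partition("-")
        current += ch
        if pos in ("E", "S"):  # a unit ends at an E or S tag
            parts.append((current, EXT_TO_ROLE[ext]))
            current = ""
    return parts


# Illustrative word: prefix "副" + root "校长".
word = [("副", "prefix"), ("校长", "root")]
labels = encode(word)
```

Because the prefix and suffix are treated as units of their own, decoding the predicted labels recovers both the word boundary and the internal structure in one pass.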

Authors: Sun Jing (孙静), Fang Yan (方艳), Ding Bin (丁彬), Zhou Guodong (周国栋)

Affiliation: School of Computer Science and Technology, Soochow University (苏州大学)

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 5, pp. 39-45, 82 (8 pages in total)

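Funding: National Natural Science Foundation of China, Young Scientists Fund (61202162); Ministry of Education Doctoral Program Fund, New Teachers category (20123201120011); National 863 Program, frontier technology research project (2012AA011102)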

Keywords: extended word tag set; word structure analysis; prefixes and suffixes; sequence tagging problem

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
