Abstract
Automatic recognition of modal particle usages in modern Chinese follows a "trinity" approach that combines a usage dictionary, a usage rule base, and a usage corpus for modal particles. Because the corpus is large and modal particle usages in real text each have their own characteristics, a manually written rule base is rather subjective and can hardly cover all usages. For 10 common modal particles in modern Chinese, we study an error-driven algorithm for automatically improving the rules. Experimental results show that this method improves usage recognition accuracy to a certain extent for most of the common modal particles.
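The abstract does not give the details of the algorithm, so the following is only a minimal sketch of what error-driven rule improvement can look like in general, not the authors' method. It assumes a hypothetical data model in which each corpus sample is a (particle, context feature, gold usage) triple and each rule maps a (particle, context feature) pair to a usage label. The feature names, the usage labels such as `ba_1`, and all function names are illustrative assumptions. In each round the sketch collects the current recognition errors, estimates for every candidate rule change how many errors it would fix and how many correct decisions it would break, and applies the change with the largest positive net gain.

```python
from collections import Counter

# Hypothetical data model (an assumption, not taken from the paper):
# a sample is (particle, context_feature, gold_usage); a rule maps
# (particle, context_feature) to a usage label; a per-particle default
# usage applies when no rule matches.

def predict(rules, defaults, particle, feature):
    """Return the usage chosen by the rule base, or the particle's default."""
    return rules.get((particle, feature), defaults[particle])

def accuracy(rules, defaults, corpus):
    """Fraction of corpus samples whose predicted usage equals the gold usage."""
    correct = sum(predict(rules, defaults, p, f) == gold for p, f, gold in corpus)
    return correct / len(corpus)

def improve_rules(rules, defaults, corpus, max_rounds=10):
    """Greedy error-driven refinement: repeatedly apply the best rule change."""
    rules = dict(rules)  # leave the caller's rule base untouched
    for _ in range(max_rounds):
        # Count the errors of the current rule base, grouped by context and gold usage.
        errors = Counter(
            (p, f, gold) for p, f, gold in corpus
            if predict(rules, defaults, p, f) != gold
        )
        best_key, best_usage, best_gain = None, None, 0
        for (p, f, gold), fixed in errors.items():
            # Samples in the same context that are correct now but would
            # become wrong if this context were mapped to `gold`.
            broken = sum(
                1 for q, g, other in corpus
                if (q, g) == (p, f) and other != gold
                and predict(rules, defaults, q, g) == other
            )
            gain = fixed - broken
            if gain > best_gain:
                best_key, best_usage, best_gain = (p, f), gold, gain
        if best_key is None:  # no change with positive net gain remains
            break
        rules[best_key] = best_usage
    return rules

# Toy data, invented for illustration only.
corpus = [
    ("吧", "after_verb", "ba_1"),
    ("吧", "after_verb", "ba_1"),
    ("吧", "after_verb", "ba_2"),
    ("吧", "sentence_end", "ba_2"),
    ("吗", "sentence_end", "ma_1"),
]
defaults = {"吧": "ba_2", "吗": "ma_1"}
manual_rules = {("吧", "sentence_end"): "ba_1"}  # a deliberately faulty manual rule

improved = improve_rules(manual_rules, defaults, corpus)
print(accuracy(manual_rules, defaults, corpus), accuracy(improved, defaults, corpus))
```

On this toy corpus the refinement corrects the faulty manual rule and adds one new rule. One sample stays misclassified, because a single context feature cannot separate the two usages that occur after a verb. A real system would presumably need richer context features than this sketch assumes.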
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal
2012, No. 12, pp. 73-76 (4 pages)
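Funding
National Natural Science Foundation of China (60970083)
Henan Province Outstanding Youth Fund for Science and Technology Innovation Talents (104100510026)
Natural Science Research Program of the Education Department of Henan Province (2011A520019)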
Keywords
Function word knowledge base
Chinese modal particles
Error-driven
Automatic rule improvement