A Study of the Probability Distribution and Grammatical Collocation Patterns of Multi-Category Words in Chinese Information Processing
Abstract: In part-of-speech tagging, multi-category words (words that belong to more than one part of speech) are the main factor limiting tagging accuracy in Chinese. This study takes as its test set 78 adjective-noun multi-category words that are labeled consistently across three dictionaries. Using a tagging method that combines rules and statistics, it pairs the statistically measured part-of-speech distribution of each multi-category word with grammatical collocation rules, and builds a rule base from the words' grammatical collocation patterns. Applying this rule base to correct the multi-category word tags in the State Language Commission's Modern Chinese General Balanced Corpus raises tagging accuracy by 14.57%.
Affiliation: School of Literature (文学院), Ludong University
Source: Modern Linguistics (《现代语言学》), 2021, No. 2, pp. 524-529 (6 pages)
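
The abstract describes correcting multi-category word tags by combining a per-word part-of-speech distribution with collocation rules. The following is a minimal sketch of that general idea, not the authors' implementation. The probabilities in `TAG_PRIOR`, the word lists, and the two collocation rules are hypothetical values invented for illustration; the paper's actual rule base and statistics are not reproduced here. Tags `a` and `n` stand for adjective and noun.

```python
from typing import Dict, List, Optional

# Hypothetical P(tag | word) for one adjective-noun multi-category word.
# These numbers are illustrative only and do not come from the paper.
TAG_PRIOR: Dict[str, Dict[str, float]] = {
    "困难": {"a": 0.6, "n": 0.4},
}

# Hypothetical collocation cues (assumptions made for this sketch).
DEGREE_ADVERBS = {"很", "非常", "十分"}  # degree adverb + word -> adjective
NOUN_CUES = {"有", "的"}                 # 有/的 + word -> noun


def rule_tag(tokens: List[str], i: int) -> Optional[str]:
    """Return a tag if a collocation rule matches at position i, else None."""
    prev_word = tokens[i - 1] if i > 0 else None
    if prev_word in DEGREE_ADVERBS:
        return "a"
    if prev_word in NOUN_CUES:
        return "n"
    return None


def retag(tokens: List[str], tags: List[str]) -> List[str]:
    """Correct tags of multi-category words: rules first, prior as fallback."""
    corrected = list(tags)
    for i, word in enumerate(tokens):
        prior = TAG_PRIOR.get(word)
        if prior is None:
            continue  # not a multi-category word: keep the original tag
        by_rule = rule_tag(tokens, i)
        if by_rule is not None:
            corrected[i] = by_rule
        else:
            # No rule matched: fall back to the most probable tag.
            corrected[i] = max(prior, key=prior.get)
    return corrected
```

In this sketch a matching collocation rule always overrides the statistical prior, and the prior is used only when no rule matches. That ordering is a design assumption; a real system could instead weight the two sources of evidence against each other.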