Abstract
To address ambiguous segmentation in Chinese word segmentation, this paper proposes a bigram Chinese word segmenter based on context analysis. The segmenter resolves ambiguity by combining dictionary matching with bigram segmentation, and uses a segmentation corrector to check the correctness of the segmentation result at the level of sentence grammar. Experiments show that the improved segmenter can recognize new words of various types and resolve various kinds of ambiguity. When applied to the text analysis module of a text information filtering system, it strengthens the system's language analysis capability and substantially improves its filtering accuracy.
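The abstract does not give the algorithm's details, so the following is only a minimal sketch of one common way to combine dictionary matching with bigram evidence, not the authors' implementation. It assumes forward and backward maximum matching as the dictionary step and, when the two disagree, picks the candidate with the higher summed bigram count. The names `max_match`, `bigram_score`, and `segment`, the toy lexicon, and the bigram counts are all hypothetical.

```python
from collections import Counter


def max_match(text, lexicon, max_len=4, reverse=False):
    """Greedy longest-match segmentation, scanning forward or backward.

    Single characters are always accepted so the scan cannot get stuck.
    """
    words = []
    if not reverse:
        i = 0
        while i < len(text):
            for size in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in lexicon:
                    words.append(piece)
                    i += size
                    break
    else:
        j = len(text)
        while j > 0:
            for size in range(min(max_len, j), 0, -1):
                piece = text[j - size:j]
                if size == 1 or piece in lexicon:
                    words.insert(0, piece)
                    j -= size
                    break
    return words


def bigram_score(words, bigram_counts):
    """Sum the counts of adjacent word pairs (unseen pairs count as 0)."""
    return sum(bigram_counts[(a, b)] for a, b in zip(words, words[1:]))


def segment(text, lexicon, bigram_counts):
    """Return the dictionary segmentation, using bigrams to break ties."""
    forward = max_match(text, lexicon)
    backward = max_match(text, lexicon, reverse=True)
    if forward == backward:
        return forward  # no ambiguity detected by the two scans
    # Ambiguous: prefer the candidate whose word pairs are better attested.
    return max((forward, backward), key=lambda ws: bigram_score(ws, bigram_counts))


# Hypothetical toy data for illustration only.
lexicon = {"研究", "研究生", "生命", "起源"}
bigram_counts = Counter({("研究", "生命"): 3, ("生命", "起源"): 5})
result = segment("研究生命起源", lexicon, bigram_counts)
```

With this toy data the forward scan yields 研究生 / 命 / 起源 and the backward scan yields 研究 / 生命 / 起源; the bigram counts favor the latter. A grammar-level correction step such as the paper's segmentation corrector is not modeled here.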
Source
《郑州轻工业学院学报(自然科学版)》
CAS
2010, Issue 3, pp. 66-70 (5 pages)
Journal of Zhengzhou University of Light Industry:Natural Science
Funding
Heilongjiang Province Graduate Innovative Research Project (YJSCX2006-38HLJ)
Keywords
Chinese word segmentation
text information filtering
disambiguation
segmentation correction