
Application of a Context-Analysis-Based Chinese Word Segmenter in a Text Information Filtering System
Abstract: To address ambiguous segmentation in Chinese word segmentation, this paper proposes a bigram Chinese word segmenter based on context analysis. The segmenter resolves ambiguity by combining dictionary matching with bigram segmentation, and it uses a segmentation corrector to check the correctness of the segmentation result from the standpoint of sentence grammar. Experiments show that the improved segmenter can recognize new words of various types and resolve various kinds of ambiguity. Applied to the text analysis module of a text information filtering system, it strengthens the system's language analysis capability and substantially improves its filtering precision.
Authors: 律佳, 廉立志
Source: Journal of Zhengzhou University of Light Industry: Natural Science (《郑州轻工业学院学报(自然科学版)》), CAS-indexed, 2010, No. 3, pp. 66-70 (5 pages)
Funding: Heilongjiang Province Graduate Innovation Research Project (YJSCX2006-38HLJ)
Keywords: Chinese word segmenter; text information filtering; disambiguation; segmentation corrector
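
The abstract names the combination of dictionary matching and bigram segmentation but gives no algorithmic detail. The following is only a minimal illustrative sketch of one way such a combination could look, not the authors' method: it assumes greedy forward maximum matching against a word list, and it replaces runs of out-of-dictionary characters with overlapping character bigrams. The function names, the `max_len` parameter, and the sample dictionary are all hypothetical, and the grammar-based segmentation corrector is not modeled.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation (an assumed matching strategy).

    Characters that start no dictionary word are emitted one at a time.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def bigrams(run):
    """Overlapping character bigrams of a run; a one-character run is kept as is."""
    if len(run) < 2:
        return [run]
    return [run[k:k + 2] for k in range(len(run) - 1)]


def hybrid_segment(text, dictionary, max_len=4):
    """Keep dictionary words; turn runs of unmatched characters into bigrams."""
    result = []
    run = ""  # consecutive characters not covered by the dictionary
    for tok in forward_max_match(text, dictionary, max_len):
        if len(tok) == 1 and tok not in dictionary:
            run += tok
        else:
            if run:
                result.extend(bigrams(run))
                run = ""
            result.append(tok)
    if run:
        result.extend(bigrams(run))
    return result
```

For example, with the hypothetical dictionary `{"文本", "信息", "过滤", "系统"}`, calling `hybrid_segment("文本信息过滤系统", dictionary)` keeps the four dictionary words, while an unknown two-character prefix such as "郑州" in "郑州文本" is emitted as a single bigram token. Falling back to bigrams is one common way to keep unknown words searchable without a dictionary entry; whether the paper uses this exact fallback is not stated in the abstract.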
