Abstract
This paper examines the word segmentation methods used in information processing. Current segmentation methods cannot recognize the new words that keep emerging on the web, so we design a new statistics-based word segmentation method. The method avoids the complex grammar rules of existing segmentation methods and needs no dictionary support. It handles the continual appearance of new words well, segments text quickly, and has significant theoretical and practical value.
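The abstract does not say which statistics the method computes. As a rough illustration only, the sketch below shows one common dictionary-free technique. It scores each pair of adjacent characters by pointwise mutual information (PMI), estimated from raw, unsegmented text, and places a word boundary wherever the score is at or below a threshold. PMI, the function names `train_counts`, `pmi` and `segment`, and the `threshold` parameter are assumptions made for this sketch. It is not the authors' method.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in unsegmented text.

    No dictionary is used: all statistics come from the raw corpus.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams, total_uni, total_bi):
    """Pointwise mutual information of two adjacent characters.

    Returns -inf for a pair never seen together, which always forces a split.
    """
    joint = bigrams[pair]
    if joint == 0:
        return float("-inf")
    p_xy = joint / total_bi
    p_x = unigrams[pair[0]] / total_uni
    p_y = unigrams[pair[1]] / total_uni
    return math.log2(p_xy / (p_x * p_y))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Split text wherever adjacent-character PMI is at or below a threshold.

    `threshold` is a hypothetical tuning parameter, not taken from the paper.
    """
    if not text:
        return []
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    words = [text[0]]
    for i in range(1, len(text)):
        score = pmi(text[i - 1:i + 1], unigrams, bigrams, total_uni, total_bi)
        if score > threshold:
            words[-1] += text[i]  # strongly associated pair: extend the current word
        else:
            words.append(text[i])  # weak or unseen pair: start a new word
    return words


if __name__ == "__main__":
    # A toy corpus; real use would need a large body of web text.
    uni, bi = train_counts(["新词出现", "新词很多", "出现新词"])
    print(segment("新词出现", uni, bi, threshold=2.0))  # → ['新词', '出现']
    print(segment("出现很多", uni, bi, threshold=2.0))  # → ['出现', '很多']
```

Because the counts are re-estimated from whatever text is supplied, a string that recurs often in new web text gains high internal PMI and is kept together without any dictionary entry. The trade-off is that the threshold must be tuned, and small corpora give noisy estimates.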
Source
Computer Engineering & Science
CSCD (Chinese Science Citation Database)
Peking University Core Journal
2010, No. 5, pp. 133-135 (3 pages)
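Funding
Science and Technology Development Project of Higher Education Institutions of Shanxi Province (20091150)
Yuncheng University Project (JC-2009009)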
Keywords
web
statistical word segmentation
dictionary
feature selection