Abstract
In the information age of rapid Internet growth, search engines are one of the most powerful means of obtaining useful information. The core of a Chinese search engine is the extraction of key information from Chinese text, and the main difficulty is automatic Chinese word segmentation. This paper focuses on an algorithm for automatic Chinese word segmentation. The algorithm segments text with a best-matching method based on an automatically built dictionary, and resolves the ambiguities that arise during segmentation with a statistical method based on an improved Markov N-gram language model, thereby improving precision.
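The abstract names two components: dictionary-based matching and an N-gram model for resolving ambiguity. It does not say which matching direction is used, how the dictionary is built, or how the Markov model is improved. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It uses forward and backward maximum matching to produce candidate segmentations and picks the one with the higher add-one-smoothed bigram score. The function names, the tiny hand-written lexicon, and the counts are all hypothetical.

```python
import math

def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation, scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation, scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.insert(0, piece)
                j -= size
                break
    return words

def bigram_log_prob(words, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a word sequence."""
    score, prev = 0.0, "<s>"
    for w in words:
        num = bigrams.get((prev, w), 0) + 1
        den = unigrams.get(prev, 0) + vocab_size
        score += math.log(num / den)
        prev = w
    return score

def segment(text, lexicon, unigrams, bigrams):
    """Return the candidate segmentation with the higher bigram score."""
    candidates = [forward_max_match(text, lexicon),
                  backward_max_match(text, lexicon)]
    vocab_size = len(lexicon) + 1  # +1 for the sentence-start symbol
    return max(candidates,
               key=lambda ws: bigram_log_prob(ws, unigrams, bigrams, vocab_size))

# Hypothetical toy data; a real system would derive these from a corpus.
LEXICON = {"研究", "研究生", "生命", "起源"}
UNIGRAMS = {"<s>": 10, "研究": 5, "生命": 4, "研究生": 2, "起源": 3}
BIGRAMS = {("<s>", "研究"): 3, ("研究", "生命"): 2, ("生命", "起源"): 2}

result = segment("研究生命起源", LEXICON, UNIGRAMS, BIGRAMS)
```

On this input the two matching directions disagree: forward matching greedily takes 研究生 and leaves 命 stranded, while backward matching yields 研究 / 生命 / 起源. The bigram score prefers the latter, which is the kind of ambiguity a statistical model is meant to settle.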
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2000, Issue 8, pp. 80-82, 84 (4 pages)
Computer Engineering and Applications
Keywords
search engine
Chinese automatic word segmentation
algorithm
Chinese character information processing
matching
Markov process