Abstract
Vertical search engines are indispensable tools for users who rely on network resources, and both their indexing engine and their query engine use word segmentation algorithms. The indexing engine uses segmentation to analyze text and build the inverted index table; the query engine uses it to split query sentences into keywords. To improve the retrieval efficiency of vertical search engines, the word segmentation technology they use therefore needs to be studied. This paper first reviews existing Chinese word segmentation algorithms and analyzes their shortcomings, then builds an improved segmentation algorithm on that basis. Finally, experiments demonstrate the effectiveness of the improved algorithm.
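The two roles of segmentation described in the abstract can be illustrated with a minimal sketch. It assumes forward maximum matching, a common dictionary-based technique, as a stand-in segmenter; it is not the improved algorithm proposed in the paper, whose details the abstract does not give. The names `fmm_segment`, `build_inverted_index`, `search`, `VOCAB`, and `DOCS` are hypothetical, as are the toy dictionary and documents.

```python
from collections import defaultdict


def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_inverted_index(docs, vocab):
    """Indexing side: map each segmented term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in fmm_segment(text, vocab):
            index[token].add(doc_id)
    return index


def search(query, index, vocab):
    """Query side: segment the query with the same algorithm, then intersect postings."""
    keywords = fmm_segment(query, vocab)
    postings = [index.get(k, set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


# Toy dictionary and documents (assumptions for illustration only).
VOCAB = {"垂直", "搜索", "引擎", "搜索引擎", "分词", "算法", "检索", "效率"}
DOCS = {1: "垂直搜索引擎", 2: "分词算法", 3: "搜索引擎检索效率"}
INDEX = build_inverted_index(DOCS, VOCAB)

print(fmm_segment("垂直搜索引擎", VOCAB))
print(search("搜索引擎", INDEX, VOCAB))
```

Using the same segmenter on both sides matters: if the indexing engine and the query engine split text differently, the query keywords will not match the terms stored in the index.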
Source
《网络安全技术与应用》 (Network Security Technology & Application)
2014, Issue 4, pp. 15-16, 18 (3 pages in total)
Keywords
vertical search engine
word segmentation algorithm
retrieval efficiency