Abstract
This paper proposes a search engine model based on natural language understanding. The model analyzes a user's natural-language query at three levels: keywords, question type, and question focus. The analysis comprises semantic analysis, feature vector extraction, and semantic matching. Following this approach, a feature base oriented to Web page content is built, and an algorithm for ranking the returned documents is proposed. The search engine is implemented on top of the Lucene full-text indexing toolkit. In query tests on feature words already collected in the feature base, the precision was about 86.7%. The experiments indicate that the model largely achieves semantic understanding of query phrases and has a significant effect on improving search engine precision.
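The abstract names feature vector extraction, document ranking, and a precision measurement, and the keywords list the vector space model, but none of the formulas are given here. The following is only a minimal sketch of generic vector-space ranking, not the paper's algorithm: it assumes queries and documents are already segmented into tokens, uses raw term frequencies as weights, and all names (`tf_vector`, `cosine`, `rank`, `precision`) are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Build a raw term-frequency vector (assumed weighting, not the paper's)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs):
    """Return (doc_id, score) pairs sorted by descending similarity to the query.

    `docs` maps a document id to its list of tokens.
    """
    q = tf_vector(query_tokens)
    scored = [(doc_id, cosine(q, tf_vector(tokens))) for doc_id, tokens in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision(returned, relevant):
    """Fraction of returned documents that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for doc_id in returned if doc_id in relevant)
    return hits / len(returned)


# Toy data, invented purely for illustration.
docs = {
    "d1": ["search", "engine", "semantic"],
    "d2": ["weather", "report"],
    "d3": ["search", "semantic", "semantic"],
}
ranking = rank(["semantic", "search"], docs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

In this sketch, precision is the share of returned documents judged relevant, which is the usual reading of a figure such as the reported 86.7%. A Lucene-based system would normally delegate scoring to the library's own similarity implementation instead of hand-written code like this.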
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2008, Issue 6, pp. 152-154 (3 pages)
Keywords
Natural language processing
Word segmentation
Semantic analysis
Vector space model