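Abstract
To address the low recall and poor relevance ranking of keyword-based vertical search engines, this paper proposes an ontology-based classification index model for vertical search engines. A classification system based on a domain ontology is designed, fine-grained text classification is implemented on top of it, and the resulting category information is written into the index, enriching the index with semantic information. Building on Lucene's original index structure, the logical and physical structures of the index are redesigned so that category information and keyword information are combined into a classification index. Finally, a retrieval algorithm for this index is proposed, and the effectiveness of the model is verified with examples.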
Source
《计算机工程与设计》 (Computer Engineering and Design)
CSCD
PKU Core Journals (北大核心)
2010, No. 23, pp. 4999-5003, 5011 (6 pages)
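Funding
National Natural Science Foundation of China (60972090)
Natural Science Foundation of Liaoning Province (20072142)
Dalian Municipal Government Fund for Outstanding IT Teachers (大信发2008-40-6)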
Keywords
ontology
text classification
vertical search engine
classification system
classification index
English abstract
To improve the low recall rate and poor relevance ranking of keyword-based vertical search engines, an ontology-based classification-indexing model is proposed. A classification system is designed based on domain ontology, and fine-grained text classification is implemented on top of it. The classification information is written into the index, that is, semantic information is added to the index. Based on the original Lucene index structure, the logical and physical structures of the index are redesigned so that category information and keyword information are reasonably integrated into a classification index. Finally, a search algorithm for this index is proposed, and examples are given to illustrate the rationality of the model.
Key words: ontology; text classification; vertical search engine; classification system; classification-indexing
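
The idea of a classification index summarized in the abstract, where category information is stored alongside keyword postings and used at query time, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `ClassifiedIndex` class, the category paths, and the term-frequency scoring are hypothetical, and it reproduces neither Lucene's API nor the logical and physical index structures designed in the paper.

```python
from collections import defaultdict


class ClassifiedIndex:
    """Toy inverted index whose documents also carry a category path.

    Hypothetical illustration only: not Lucene's API and not the
    paper's actual index layout or retrieval algorithm.
    """

    def __init__(self):
        # term -> {doc_id: term frequency}
        self.postings = defaultdict(dict)
        # doc_id -> category path, e.g. ("computer", "laptop")
        self.categories = {}

    def add(self, doc_id, text, category):
        """Index a document under the given category path."""
        self.categories[doc_id] = tuple(category)
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query, category=()):
        """Return ids of documents matching any query term,
        restricted to the subtree under `category`."""
        prefix = tuple(category)
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                # Keep only documents whose category path starts with the prefix.
                if self.categories[doc_id][:len(prefix)] == prefix:
                    scores[doc_id] += tf
        # Highest summed term frequency first; ties broken by doc id.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = ClassifiedIndex()
index.add("d1", "apple laptop review", ("computer", "laptop"))
index.add("d2", "apple pie recipe apple", ("food", "dessert"))
index.add("d3", "laptop battery tips", ("computer", "laptop"))

print(index.search("apple"))                 # keyword-only search
print(index.search("apple", ("computer",)))  # restricted to one category subtree
```

In this sketch the keyword-only query for "apple" matches documents from both the food and computer categories, while the category-restricted query keeps only the computer-related one, which is the kind of disambiguation that category information in the index is meant to support.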