Abstract
To address the problem of retrieving entities of a specified type in information retrieval, this paper proposes an entity search method based on a multi-perspective association model, built on top of a traditional search engine. The method combines techniques such as Named Entity Recognition (NER), text vectors, and association rules with tools such as Wikipedia and Stanford NER, and is evaluated on the TREC 2010 entity retrieval task. Experimental results on the large Web data collection provided for that task show that, compared with retrieval methods based on BM25 and a traditional Bayesian model, the method improves nDCG@R by an average of 11.49% and 18.09%, respectively.
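The abstract reports its results as nDCG@R. As a rough illustration of how a metric of this kind is commonly computed, the sketch below truncates the ranking at R, taken here to be the number of relevant items judged for a topic, and normalizes by the ideal ordering. The function names, the log2 discount, and the gain values are assumptions made for illustration; they are not taken from the paper or from the official evaluation scripts.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the item at 0-based rank i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_r(ranked_gains, judged_gains):
    """nDCG truncated at R, where R is the number of judged items with gain > 0.

    ranked_gains: gains of the returned results, in rank order (assumed input format).
    judged_gains: gains of every judged item for the topic (assumed input format).
    """
    relevant = [g for g in judged_gains if g > 0]
    r = len(relevant)
    if r == 0:
        return 0.0  # no relevant items: the score is defined as 0 in this sketch
    ideal = sorted(relevant, reverse=True)  # best possible ordering of the R relevant items
    return dcg(ranked_gains[:r]) / dcg(ideal)


# Hypothetical topic with two relevant entities; the system ranks one first and one third.
score = ndcg_at_r([1, 0, 1], [1, 1])
print(round(score, 4))
```

Because the cutoff is R rather than a fixed depth, the relevant item at rank 3 in this example contributes nothing to the score, which is why topics with few relevant entities are strict about early ranks.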
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2013, No. 1, pp. 71-75 (5 pages)
Funding
National High Technology Research and Development Program of China ("863" Program), Grant No. 2009AA01Z429
Keywords
text mining
association rule
entity retrieval
Named Entity Recognition (NER)
Term Frequency-Inverse Document Frequency (TF-IDF)
Wikipedia
search engine