Abstract
This paper describes our participation in the 863 information retrieval evaluation. Our system follows the language modeling approach to information retrieval: it applies named entity techniques when constructing query vectors, uses the link-analysis-based PageRank algorithm to compute the prior probability of each document, and applies relevance feedback in automatic runs. The paper also describes the system's software and hardware environment and the evaluation data, presents the different strategies used for automatic and manual queries, compares experimental results to identify which methods are more effective for Chinese information retrieval, and closes with the system's shortcomings and directions for future improvement.
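The combination of a language model with a PageRank-based document prior can be illustrated with a minimal sketch. It ranks documents by log P(d) + sum of log P(q_i | d), the usual query-likelihood form with a document prior. The abstract does not say which smoothing method the system uses, so Dirichlet smoothing is an assumption here, as are the function names `score` and `rank`, the parameter `mu`, and the idea that `priors` holds normalized PageRank values computed elsewhere. Named entity weighting and relevance feedback are not modeled.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, collection_tf, collection_len, prior, mu=2000.0):
    """Return log P(d) + sum_i log P(q_i | d) with Dirichlet smoothing (assumed)."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_score = math.log(prior)  # document prior, e.g. a normalized PageRank value
    for term in query_terms:
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        p_term = (tf[term] + mu * p_coll) / (doc_len + mu)
        log_score += math.log(p_term)
    return log_score


def rank(query_terms, docs, priors, mu=2000.0):
    """Rank documents (doc_id -> list of terms) by descending score."""
    collection_tf = Counter()
    for terms in docs.values():
        collection_tf.update(terms)
    collection_len = sum(collection_tf.values())
    scored = [
        (doc_id, score(query_terms, terms, collection_tf, collection_len,
                       priors[doc_id], mu))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data, purely illustrative; not taken from the evaluation.
    docs = {
        "a": ["information", "retrieval", "model"],
        "b": ["language", "model", "model"],
        "c": ["named", "entity"],
    }
    priors = {"a": 0.5, "b": 0.3, "c": 0.2}
    for doc_id, value in rank(["language", "model"], docs, priors, mu=10.0):
        print(doc_id, round(value, 3))
```

Working in log space keeps the product of small probabilities numerically stable, and the prior enters as a single additive term, so two documents with identical text are ordered by their prior alone.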
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2006, Issue B03, pp. 78-82 (5 pages)
Funding
National Natural Science Foundation of China (60372016)
Beijing Natural Science Foundation (4052027)
Keywords
information retrieval
language model
named entity recognition