Abstract
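The Text REtrieval Conference (TREC) is currently the most important international venue for academic exchange and system evaluation in the field of information retrieval. The conference provides participants with standard data collections, evaluation topics, and reference answers, so that all participants run and evaluate their systems against a common standard. The authors took part in the Web information retrieval task of TREC on behalf of the Chinese Academy of Sciences. In TREC 2002, the authors identified higher-performing content retrieval algorithms suited to different data collections and jointly considered the effect of text content, anchor text, document structure, and other factors on Web retrieval effectiveness, achieving good results. The method performed well on the different tasks of both conferences.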
The Text REtrieval Conference (TREC) is the most important forum for academic exchange and system evaluation in the information retrieval community. TREC provides standard data collections, topics, and relevance judgments to its participants so that they can conduct their retrieval research in a common manner. We took part in the Web Track of TREC in 2002. We built an effective information retrieval system that can handle large amounts of data while showing satisfactory performance on different test collections. In addition to the relevance score from a traditional IR system, we make use of relevance information from other sources such as anchor text and document structure. Our approach has shown good performance in both of the Web Track tasks.
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2003, Issue 26, pp. 37-39, 80 (4 pages in total)
Computer Engineering and Applications
Funding
National Key Basic Research Development Program of China (973 Program) (Grant Nos. G1998030413, G1998030510)
Institute of Computing Technology Frontier Youth Fund (Grant No. 20026180-24)