Threshold Optimization with a Small Number of Samples in Adaptive Information Filtering (in English)

Cited by: 6
Abstract: A major challenge in adaptive information filtering is extreme data sparsity, so it is important to learn the optimal threshold while filtering the incoming text stream. This paper presents a novel threshold optimization algorithm that learns quickly from a small number of positive samples. Most of the quantities the algorithm needs can be updated incrementally, so its memory and computational requirements are low. The algorithm is also effective, robust, and practical. Fudan University's adaptive text filtering system used this algorithm for the first time at the 10th Text REtrieval Conference (TREC10) and ranked third among all runs, with T10U and T10F scores of 0.215 and 0.414, respectively.
Source: Journal of Software (《软件学报》), 2003, No. 10, pp. 1697-1705 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
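Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program).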
Keywords: adaptive information filtering; vector space model; threshold optimization; detection rate; relevance feedback; adaptive systems; computer software; data processing; learning algorithms; optimization