
An Improved Vector Space Model Based on HTML Document Structure (cited 10 times)

Published English title: Vector Space Model Based on HTML Document Structure
Abstract: The tag fields of an HTML document differ in how their terms are distributed and in how well they represent the document's content. Based on these differences, we propose an improved vector space model (the PFTF model). Using the TREC 12 queries, we compare the retrieval performance of the classical vector space model with that of the PFTF model in two settings: retrieval over a single tag field, and retrieval that combines the results of multiple document representations. The experiments show that the PFTF model improves on the classical model in both settings.
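Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2005, No. 4, pp. 433-437 (5 pages)
Funding: National Natural Science Foundation of China
Keywords: HTML document structure, information retrieval, vector space model, anchor text, TREC, HTML document, document structure, vector model, distribution characteristics, retrieval performance, tags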
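
The abstract compares retrieval on a single tag field with retrieval that combines several document representations. The sketch below illustrates only the classical baseline side of that comparison: a TF-IDF vector space model scored per field, with the field scores merged by a weighted sum. It is not the PFTF model, whose weighting scheme the abstract does not specify. The field names (`title`, `body`), the weights, the log-scaled TF-IDF variant, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_idf(field_docs):
    """Inverse document frequency computed within one field's collection."""
    n = len(field_docs)
    df = Counter(term for doc in field_docs for term in set(doc))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Log-scaled term frequency times IDF; terms unseen in the field get weight 0."""
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score_fields(query, docs, weights):
    """Score each document as a weighted sum of per-field cosine scores.

    docs    : list of dicts mapping a field name to a token list
    weights : dict mapping a field name to its (hypothetical) combination weight
    """
    scores = [0.0] * len(docs)
    for field, weight in weights.items():
        field_docs = [doc.get(field, []) for doc in docs]
        idf = build_idf(field_docs)
        query_vec = tfidf_vector(query, idf)
        for i, tokens in enumerate(field_docs):
            scores[i] += weight * cosine(query_vec, tfidf_vector(tokens, idf))
    return scores


# Toy collection with two hypothetical fields per document.
docs = [
    {"title": ["vector", "space", "model"], "body": ["html", "document", "structure"]},
    {"title": ["cooking", "recipes"], "body": ["pasta", "sauce"]},
]
scores = score_fields(["vector", "model"], docs, {"title": 0.7, "body": 0.3})
```

Passing a single field in `weights` gives the single-field setting; passing several gives a simple linear combination of representations, which is one common way to merge evidence and may differ from the combination method used in the paper.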
