
Research of Information Retrieval based on DOM

Cited by: 1
Abstract: The vector space model (VSM) is an important model in information retrieval. The traditional VSM takes into account a feature term's frequency in the target document and its document frequency, but it ignores where in the text the term appears, which is important information. To address this problem, the paper represents each document as a Document Object Model (DOM) and, depending on the location where a feature term appears, attaches an additional coefficient to the term's weight. The coefficient reflects how well terms at different locations express the main idea of the document. The aim is to improve the ranking of the returned documents and to ease the user's retrieval work. A simulation experiment shows the advantage of this method over the traditional VSM in retrieval effectiveness.
Source: Netinfo Security (《信息网络安全》), 2014, Issue 5, pp. 82-86 (5 pages)
Keywords: information retrieval; location information; DOM; LVSM
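
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the coefficient table `LOCATION_COEFF`, the rule of taking the largest coefficient among the enclosing elements, the smoothed IDF, and all function names are assumptions made for illustration, since the record does not give the paper's formula or coefficient values. The sketch walks the element tree with the standard-library `html.parser`, accumulates location-weighted term frequencies, and ranks documents by cosine similarity.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical location coefficients; the paper's actual values are not given here.
LOCATION_COEFF = {"title": 3.0, "h1": 2.0, "h2": 1.5, "b": 1.2}
DEFAULT_COEFF = 1.0


class LocationTermCounter(HTMLParser):
    """Accumulates location-weighted term frequencies while walking the markup."""

    def __init__(self):
        super().__init__()
        self.stack = []               # tags that are currently open
        self.weighted_tf = Counter()  # term -> sum of location coefficients

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag so unclosed elements are tolerated.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Assumption: a term takes the largest coefficient among its ancestors.
        coeff = max((LOCATION_COEFF.get(t, DEFAULT_COEFF) for t in self.stack),
                    default=DEFAULT_COEFF)
        for term in re.findall(r"\w+", data.lower()):
            self.weighted_tf[term] += coeff


def weighted_tf(html):
    parser = LocationTermCounter()
    parser.feed(html)
    parser.close()
    return parser.weighted_tf


def lvsm_vectors(docs):
    """Return one {term: weight} vector per HTML document."""
    tfs = [weighted_tf(d) for d in docs]
    df = Counter(term for tf in tfs for term in tf)  # document frequency
    n = len(docs)
    # Smoothed IDF (an assumption) keeps every weight strictly positive.
    return [{t: w * math.log(1 + n / df[t]) for t, w in tf.items()} for tf in tfs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return document indices sorted by descending similarity to the query."""
    vectors = lvsm_vectors(docs)
    q = Counter(re.findall(r"\w+", query.lower()))
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "<html><head><title>retrieval</title></head><body><p>vector space model</p></body></html>",
    "<html><head><title>model</title></head><body><p>vector space retrieval</p></body></html>",
]
print(rank("retrieval", docs))  # the document with the query term in its title ranks first
```

Both sample documents contain the same four terms once each, so a plain TF-IDF model would score them identically for the query; only the location coefficient separates them.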

References (11)

  • 1. COOPER W S. Getting Beyond Boole[J]. Information Processing and Management, 1988, 24(03):225-243.
  • 2. SALTON G, WONG A, YANG C S. A Vector Space Model for Automatic Indexing[J]. Communications of the ACM, 1975, 18(11):613-620.
  • 3. MARON M E, KUHNS J L. On relevance, probabilistic indexing and information retrieval[J]. Journal of the ACM, 1960, 7(03):216-244.
  • 4. SALTON G, WONG A, YANG C S. On the Specification of Term Values in Automatic Indexing[J]. Journal of Documentation, 1973, 29(04):351-372.
  • 5. LUO Xin, XIA Delin, YAN Puliu. Feature selection based on word frequency differences and an improved TF-IDF formula[J]. 计算机应用 (Journal of Computer Applications), 2005, 25(9):2031-2033. Cited by: 55
  • 6. CHEN Zhimin, SHEN Jie, LIN Ying, ZHOU Feng. Automatic web page summarization based on topic partitioning[J]. 计算机应用 (Journal of Computer Applications), 2006, 26(3):641-644. Cited by: 8
  • 7. Google[EB/OL]. http://www.google.com, 2013-05-09.
  • 8. NorthernLight[EB/OL]. http://www.northernlight.com, 2013-05-09.
  • 9. Infoseek[EB/OL]. http://www.infoseek.com, 2013-05-09.
  • 10. SELBERG E, ETZIONI O. Multi-service search and comparison using the MetaCrawler[C]. 4th Int. WWW Conference, 1995:195-208.


