
A Petroleum Subject Collection Strategy Based on the Word Co-occurrence Model and the DOM
Abstract This paper proposes a word co-occurrence model based on the DOM tree. The model first uses the structural information of a document to build a DOM tree, then counts the co-occurrence of subject terms in the document according to the structural features of that tree, and finally uses the vector space model to collect and classify petroleum-subject web pages. It improves on the original word co-occurrence model by using positional information to optimize it. Experiments show that the strategy improves both collection and classification performance to some degree.
Authors Li Cunhe (李村合), Li Han (李晗)
Source Microcomputer Applications (《微计算机应用》), 2008, No. 2, pp. 28-31 (4 pages)
Keywords word co-occurrence model, DOM (document object model) tree, text categorization, focused web crawling, vector space model
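
The pipeline summarized in the abstract (DOM structure, position-aware co-occurrence counts, vector space comparison) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag weights in `TAG_WEIGHTS`, the choice of a single text node as the co-occurrence window, the English-only tokenizer (a Chinese page would need a word segmenter), and all function and class names are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations

# Hypothetical position weights: text under <title> or a heading counts more.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
DEFAULT_WEIGHT = 1.0
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockCollector(HTMLParser):
    """Tracks the open-tag path (a DOM-like view) and records weighted text nodes."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags from the root down to the current node
        self.blocks = []  # (weight, text) pairs, one per text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not {"script", "style"} & set(self.stack):
            weight = max((TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
                         default=DEFAULT_WEIGHT)
            self.blocks.append((weight, text))


def cooccurrence_vector(html, terms):
    """Weighted counts for single terms and for term pairs sharing a text node."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    vector = Counter()
    for weight, text in parser.blocks:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(t for t in terms if t in tokens)
        for term in present:
            vector[term] += weight
        for pair in combinations(present, 2):
            vector[pair] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

A crawler built along these lines would compare each fetched page's vector with a topic vector using `cosine` and keep the page when the score passes a threshold. The threshold and the topic vector would have to be tuned on labeled pages, which this sketch does not cover.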