
Preliminary Design of a Context-Graph-Based Focused Crawler
Abstract: This paper presents a preliminary design of a focused Web crawler based on context graphs. The crawler relies on a set of Naive Bayes (NB) classifiers. For design comparison, the text-learning models of the NB classifiers are described using both the vector space model (VSM) and a probabilistic model, the Bernoulli model. A simplified scheme is proposed for prioritizing the frontier queue: within each layer of the context graph, downloaded documents are ordered by the cosine similarity between their normalized document vectors and the query vector, so that this ordering can be compared with ranking the documents in each layer's queue by a likelihood-like score. Finally, the paper outlines an approach for automatically integrating crawl results into a predefined topic category directory.
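The within-layer ranking described in the abstract can be illustrated with a minimal Python sketch. It assumes a toy setup that is not taken from the paper: a single multi-class Bernoulli Naive Bayes model over context-graph layer labels (context-graph crawlers usually train one classifier per layer, so this is a simplification), a whitespace tokenizer, raw term-frequency document vectors, and hypothetical names such as `build_frontier` and `rank_by`. Pages are grouped by predicted layer, lower layers (closer to the target topic) come first, and each layer is ordered either by cosine similarity to the query or by the NB log score, mirroring the comparison the abstract describes.

```python
# Sketch under stated assumptions; not the authors' implementation.
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split (no stemming, no stop words).
    return text.lower().split()


class BernoulliNB:
    """Bernoulli (binary word-presence) Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = sorted({w for d in docs for w in tokenize(d)})
        n = Counter(labels)
        self.log_prior = {c: math.log(n[c] / len(labels)) for c in self.classes}
        df = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            df[c].update(set(tokenize(doc)))  # document frequency, not term frequency
        # P(word present | class), smoothed over the two outcomes present/absent.
        self.p = {c: {w: (df[c][w] + 1) / (n[c] + 2) for w in self.vocab}
                  for c in self.classes}
        return self

    def log_posteriors(self, doc):
        present = set(tokenize(doc))
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in self.vocab:
                # Unlike the multinomial model, absent words also contribute log(1 - p).
                pw = self.p[c][w]
                s += math.log(pw) if w in present else math.log(1 - pw)
            scores[c] = s
        return scores  # log P(c | doc) up to an additive constant

    def predict(self, doc):
        scores = self.log_posteriors(doc)
        best = max(scores, key=scores.get)
        return best, scores[best]


def unit_vector(text):
    # Raw term-frequency vector scaled to unit length (assumed weighting; TF-IDF is common too).
    tf = Counter(tokenize(text))
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {w: v / norm for w, v in tf.items()} if norm else {}


def cosine(u, v):
    # Both vectors are unit length, so the dot product equals the cosine similarity.
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def build_frontier(pages, classifier, query, rank_by="cosine"):
    """Group (url, text) pages by predicted layer, then order pages inside each layer.

    Layer 0 = target documents, layer k = k links away; lower layers are crawled first.
    rank_by="cosine" uses similarity to the query; anything else uses the NB log score.
    """
    q = unit_vector(query)
    layers = defaultdict(list)
    for url, text in pages:
        layer, log_score = classifier.predict(text)
        key = cosine(unit_vector(text), q) if rank_by == "cosine" else log_score
        layers[layer].append((key, url))
    # Descending score; ties broken by URL so the order is deterministic.
    return [(layer, [url for _, url in sorted(layers[layer], key=lambda t: (-t[0], t[1]))])
            for layer in sorted(layers)]


# Toy usage: one hypothetical training document per layer.
train = ["focused crawler context graph", "crawler web pages links", "cooking recipes pasta"]
nb = BernoulliNB().fit(train, [0, 1, 2])
pages = [
    ("http://b.example/ctx", "context graph crawler focused survey"),
    ("http://a.example/fc", "focused crawler context graph"),
    ("http://c.example/web", "web pages links crawler"),
    ("http://d.example/food", "pasta recipes cooking"),
]
for layer, urls in build_frontier(pages, nb, "focused crawler"):
    print(layer, urls)
```

With this toy data, the two pages predicted to be in layer 0 are reordered so that the one closer to the query `focused crawler` comes first. Passing `rank_by="likelihood"` instead ranks each layer by the Bernoulli NB score; here both layer-0 pages receive the same score, so the tie is broken by URL.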
Authors: Li Daosheng (李道生), Zhao Qiang (赵强)
Source: Computer Engineering (《计算机工程》), 2006, No. 12, pp. 208-209, 228 (3 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Keywords: focused crawling; machine learning; context graph

References

  • [1] Cho J. Efficient Crawling Through URL Ordering[C]. Proceedings of the 7th International WWW Conference, Brisbane, Australia, 1998-04.
  • [2] Pang Jianfeng, Bu Dongbo, Bai Shuo. Research and Implementation of a Text Automatic Classification System Based on the Vector Space Model[J]. Application Research of Computers, 2001, 18(9): 23-26. (In Chinese)
  • [3] Diligenti M. Focused Crawling Using Context Graphs[C]. Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, 2000-09.
  • [4] Salton G, McGill M J. Introduction to Modern Information Retrieval[M]. McGraw-Hill, 1983.
  • [5] Chakrabarti S. Data Mining for Hypertext: A Tutorial Survey[J]. ACM SIGKDD Explorations Newsletter, 2000, 1(1): 1-11.
  • [6] Chakrabarti S. Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases[C]. Proceedings of the 23rd VLDB Conference, 1997: 446-455.
  • [7] Ristad E S. A Natural Law of Succession[R]. Princeton University, TR CS-TR-495-95, 1995-07.
  • [8] Baldi P. Internet and Web Modeling: Probability Methods and Algorithms[M]. Wiley, 1999.
  • [9] Koller D, Sahami M. Hierarchically Classifying Documents Using Very Few Words[C]. Proceedings of the International Conference on Machine Learning, 1997.

