The Design of Web Page Automatic Categorization System (cited by: 2)
Abstract: This paper presents a Web page automatic categorization system. It describes the design of the system's main modules: preprocessing, batch training, feature selection, online testing, and refiling. The system uses supervised learning, with Naive Bayes as the categorization model and information gain as the feature selection method. Test results show that the system achieves good precision.
Source: Computing Technology and Automation (《计算技术与自动化》), 2002, No. 1, pp. 58-61 (4 pages)
Keywords: Web page categorization; Naive Bayes; information gain; supervised learning; automatic Web page categorization system; design; Internet; computer networks
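
The abstract names the two core techniques (information gain for feature selection, Naive Bayes for classification) but this record does not include the paper's implementation. The following is only a minimal sketch of how the two steps can fit together, under stated assumptions: pages are already preprocessed into token lists, information gain is computed over the binary event "term occurs in the page", and the classifier is a multinomial Naive Bayes with Laplace smoothing. All function names (`information_gain`, `select_features`, `train_naive_bayes`, `classify`) and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs in a page."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            conditional += size / len(docs) * entropy(part.values())
    return entropy(Counter(labels).values()) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties broken by term)."""
    vocabulary = {t for d in docs for t in d}
    ranked = sorted(vocabulary,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:k])


def train_naive_bayes(docs, labels, features):
    """Batch training: per-class log prior and Laplace-smoothed term log likelihoods."""
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in features)
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(term_counts[label].values())
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(features)))
            for t in features
        }
        model[label] = (math.log(n_docs / len(docs)), log_likelihood)
    return model


def classify(model, tokens):
    """Online test: return the class with the highest posterior log score."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens
                               if t in log_likelihood)
    return max(model, key=score)


# Toy training set (made up for illustration only).
docs = [
    ["football", "match", "goal", "team"],
    ["team", "coach", "goal", "season"],
    ["stock", "market", "price", "team"],
    ["market", "bank", "price", "rate"],
]
labels = ["sports", "sports", "finance", "finance"]

features = select_features(docs, labels, k=3)
model = train_naive_bayes(docs, labels, features)
print(sorted(features))
print(classify(model, ["goal", "team", "win"]))
```

In this sketch, terms outside the selected feature set are simply ignored at classification time, which is one common way to apply feature selection; the refiling module mentioned in the abstract is not modeled here.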
References (3)

  • 1 Jason Rennie. ifile: An Application of Machine Learning to E-Mail Filtering [J]. KDD-2000 Workshop on Text Mining, Boston, MA, USA, 2000.
  • 2 Andrew McCallum, Kamal Nigam, Jason Rennie, et al. Building Domain-Specific Search Engines with Machine Learning Techniques [J]. Proc. AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace, 1999.
  • 3 Y. Yang, J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization [J]. In International Conference on Machine Learning (ICML), 1997.
