
Web Page Classification Research Based on Ontology and EM

Times cited: 1
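Abstract

1. Introduction. The search engines now widely used on the Internet make it easier to find resources, and they employ various techniques to improve precision. However, they generally lack guidance capability: they cannot help users identify the domain in which the information they need lies, so the results returned are often completely irrelevant. There is therefore an urgent need for an intelligent, personalized search tool that can meet different users' needs to discover and accumulate information in different domains.

Extracting semantic information from the substantive content of Web pages and using it in search engines can enable intelligent retrieval and other personalized services. This paper focuses on the analysis of Web page classification information. Taking an ontology as its basis, it combines TFIDF word weighting and the Rocchio algorithm with EM to improve classifier accuracy. The results show that, when labeled samples are limited, this EM procedure improves accuracy by making use of unlabeled pages.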
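
The abstract combines TFIDF weighting and a Rocchio classifier with EM so that unlabeled pages can refine the classifier. The record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: pages are already tokenized into term lists, the ontology-based feature selection step is omitted, the Rocchio weights beta=1.0 and gamma=0.25 are illustrative values not taken from the paper, and the EM loop uses hard assignments (each unlabeled page takes its highest-scoring class), a simplification of soft-assignment EM. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """L2-normalised TFIDF vectors (term -> weight) for tokenised documents.

    Uses tf * log(N / df); a term found in every document gets weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def dot(u, v):
    """Inner product of sparse vectors (a cosine, since all vectors are normalised)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Component-wise mean of sparse vectors; {} for an empty list."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio_prototypes(vectors, labels, classes, beta, gamma):
    """Rocchio prototype per class: beta*mean(in class) - gamma*mean(out of class)."""
    protos = {}
    for c in classes:
        proto = defaultdict(float)
        for t, w in centroid([v for v, y in zip(vectors, labels) if y == c]).items():
            proto[t] += beta * w
        for t, w in centroid([v for v, y in zip(vectors, labels) if y != c]).items():
            proto[t] -= gamma * w
        norm = math.sqrt(sum(w * w for w in proto.values()))
        protos[c] = {t: w / norm for t, w in proto.items()} if norm else dict(proto)
    return protos


def classify(vec, protos):
    """Assign the class whose prototype has the highest cosine similarity."""
    return max(protos, key=lambda c: dot(vec, protos[c]))


def rocchio_em(labeled_docs, labels, unlabeled_docs, beta=1.0, gamma=0.25, max_iter=10):
    """Hard-assignment EM around a Rocchio classifier (illustrative parameters)."""
    vectors = tfidf_vectors(labeled_docs + unlabeled_docs)
    k = len(labeled_docs)
    lab_vecs, unl_vecs = vectors[:k], vectors[k:]
    classes = sorted(set(labels))
    # Initial classifier: trained on the labeled pages only.
    protos = rocchio_prototypes(lab_vecs, labels, classes, beta, gamma)
    guesses = None
    for _ in range(max_iter):
        # E-step (hard): label each unlabeled page with the current classifier.
        new_guesses = [classify(v, protos) for v in unl_vecs]
        if new_guesses == guesses:
            break  # assignments stopped changing
        guesses = new_guesses
        # M-step: rebuild the prototypes from labeled plus newly labeled pages.
        protos = rocchio_prototypes(lab_vecs + unl_vecs, list(labels) + guesses,
                                    classes, beta, gamma)
    return protos, guesses
```

For example, with one labeled page per class (`["ontology", "semantic", "web"]` as `"cs"`, `["football", "match", "goal"]` as `"sport"`) and three unlabeled pages, the loop labels the pages sharing "semantic", "ontology" or "web" as `"cs"` and the page sharing "goal" and "match" as `"sport"`, then stops once the assignments no longer change. Hard assignment keeps the sketch short; a soft variant would weight each unlabeled page by its class membership when rebuilding the prototypes.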
Source: Computer Science (计算机科学), CSCD-indexed, Peking University Core Journal; 2003, No. 11, pp. 112-115 (4 pages)
Keywords: Web page classification; TFIDF; EM; research; method; Ontology; VSM; Classifier; Feature vector; Document vector