
An Improved Feature Selection Method for Text Web Page Classification (cited by: 8)

A Novel Feature Selection Method for Web Pages Categorization
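Abstract (translated from the Chinese): Web page classification is one of the key technologies in web information retrieval research. This paper studies the feature selection methods used in classification. Based on an analysis and comparison of commonly used feature selection methods for text classification, it proposes a combined feature selection method. The method combines the existing χ² statistic method with the mutual information method. In classification experiments on a standard text web page dataset, the combined recall and precision improved markedly. The selection method has been applied in the "Network Compass" system.

Abstract (English, as published): Web page categorization is one of the key technologies for web page information retrieval. This paper proposes a novel feature selection method, named the Combined χ² method, which combines the χ² method with the Mutual Information method. Our experiments, based on real-world data collected from the Web, show that the Combined χ² method outperforms the Mutual Information method, the χ² method, and other existing feature selection methods based on χ² statistics. Finally, the research results of this paper have been applied in the Network Compass system, a large-scale hypertextual web search engine.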
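For orientation, the two base scores named in the abstract can be sketched as follows. This is a minimal illustration that assumes the standard term/category contingency-table definitions of the χ² statistic and of (pointwise) mutual information commonly used in text categorization. The function names and the argument names `a`, `b`, `c`, `d` are hypothetical, and the abstract does not state how the paper combines the two scores, so no combination rule is shown.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for a category (standard 2x2 form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        # The term never occurs in the category.
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))


# A term that occurs in every in-category document and nowhere else.
print(chi_square(20, 0, 0, 20))          # 40.0
print(mutual_information(20, 0, 0, 20))  # log(2), about 0.693
```

Both scores are zero when the term is statistically independent of the category, for example with counts (10, 10, 10, 10). They differ in how they treat rare terms, which is one common motivation for combining them.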
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 7, pp. 119-121 (3 pages)
Funding: Supported by the National Natural Science Foundation of China (90104002)
Keywords: web page categorization; feature selection; χ² statistic; mutual information; combined χ² feature selection
