Chinese Web page classification based on representative samples dynamical generation (基于代表样本动态生成的中文网页分类)
Cited by: 2
Abstract: This paper proposes a new classification algorithm for Chinese Web pages based on the dynamic generation of representative samples. The algorithm trains on the original training set and generates representative samples one at a time. It then uses the useful information in each pruned training sample to adjust the already generated representative samples repeatedly, which makes them more representative. Experiments with a Chinese Web page classifier built on this algorithm show that it effectively compresses the original training set and improves classification efficiency while maintaining classification accuracy, giving good overall classification performance.
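Authors: HUA Bei (华北), CAO Xianbin (曹先彬)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 10, pp. 2502-2504 (3 pages)
Funding: National Natural Science Foundation of China (60204009); National 973 Program (2004CB318109); Open Fund of the Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences (20040104)
Keywords: k-nearest neighbor; representative samples; adjustment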
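
The abstract describes the idea only at a high level, and this record does not give the actual procedure. The following is a minimal sketch of one generic way to generate representative samples one at a time and adjust them with the samples that get pruned. It is not the authors' algorithm. The distance threshold `radius`, the running-mean update, the count-weighted vote, and all names (`Representative`, `build_representatives`, `classify`) are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Representative:
    center: list      # running mean of the samples absorbed so far
    label: str
    count: int = 1    # number of training samples this representative stands for


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_representatives(samples, radius):
    """Scan (vector, label) pairs once and return a compressed prototype set.

    Assumption: a sample within `radius` of its nearest same-class
    representative is pruned, and its information is kept by shifting that
    representative toward it. Otherwise it becomes a new representative.
    """
    reps = []
    for vec, label in samples:
        same_class = [r for r in reps if r.label == label]
        nearest = min(same_class, key=lambda r: distance(r.center, vec), default=None)
        if nearest is not None and distance(nearest.center, vec) <= radius:
            # Prune the sample, but adjust the representative (incremental mean).
            nearest.count += 1
            n = nearest.count
            nearest.center = [c + (x - c) / n for c, x in zip(nearest.center, vec)]
        else:
            reps.append(Representative(list(vec), label))
    return reps


def classify(reps, vec, k=1):
    """k-nearest-neighbor vote over representatives, weighted by their counts."""
    nearest = sorted(reps, key=lambda r: distance(r.center, vec))[:k]
    votes = {}
    for r in nearest:
        votes[r.label] = votes.get(r.label, 0) + r.count
    return max(votes, key=votes.get)


# Toy 2-D feature vectors standing in for page term vectors (hypothetical data).
samples = [
    ([0.0, 0.0], "sports"),
    ([0.2, 0.0], "sports"),
    ([5.0, 5.0], "finance"),
    ([5.0, 5.2], "finance"),
    ([0.0, 0.2], "sports"),
]
reps = build_representatives(samples, radius=1.0)
print(len(reps), classify(reps, [0.3, 0.1]))
```

In this toy run five training samples compress to two representatives, so classification compares a new page against two vectors instead of five. That is the kind of efficiency gain the abstract refers to, under the assumptions stated above.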