
Classification Model Based on the Mean Update (一种基于均值更新的分类模型)

Cited by: 1
Abstract: The minimum distance and nearest neighbor classifiers are among the simplest, fastest, and most effective classification methods, but they are sensitive to noise and perform poorly when the training samples are few or lie far from the class centers. To address this problem, a classification model based on mean update (MU) is proposed, which improves the classification of test data by continually enlarging the training set and updating the class mean centers. On this basis, an MU-based minimum distance (MU-MD) classification model is further proposed: the mean of each class is recomputed from the MU classification results, and all test samples are then re-partitioned with the minimum distance method to determine their final class assignments. This can partially correct misclassifications made during the MU process and further improve the classification performance.
Source: Computer Systems & Applications (《计算机系统应用》), 2012, No. 8, pp. 123-126, 135 (5 pages)
Keywords: minimum distance classification; mean update; training samples; test samples
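The two-phase procedure described in the abstract can be sketched as follows. This is a minimal illustration of the idea only, not the authors' implementation: the batch size, absorption order, and all function names (`mu_md_classify`, `mean`, `dist`) are assumptions introduced here for clarity.

```python
import math

def mean(points):
    """Component-wise mean of a list of feature vectors."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mu_md_classify(train, test, batch=1):
    """Hypothetical sketch of the MU / MU-MD idea from the abstract.

    train: dict mapping class label -> list of initial training vectors
    test:  list of vectors to classify
    Returns a list of predicted labels for `test`.
    """
    # --- MU phase: repeatedly absorb the closest test samples into the
    # training pools and update the class means after each absorption.
    pools = {c: list(v) for c, v in train.items()}
    remaining = list(range(len(test)))
    labels = [None] * len(test)
    while remaining:
        means = {c: mean(v) for c, v in pools.items()}
        # Rank unassigned samples by distance to their nearest class mean.
        scored = sorted(
            remaining,
            key=lambda i: min(dist(test[i], m) for m in means.values()),
        )
        # Absorb the most confident samples, then recompute the means.
        for i in scored[:batch]:
            c = min(means, key=lambda c: dist(test[i], means[c]))
            labels[i] = c
            pools[c].append(test[i])
            remaining.remove(i)
    # --- MD phase: recompute the class means from MU's final result,
    # then re-partition ALL test samples by minimum distance. This is
    # what lets MU-MD partially correct MU's earlier misclassifications.
    means = {c: mean(v) for c, v in pools.items()}
    return [min(means, key=lambda c: dist(x, means[c])) for x in test]
```

For example, with a single training sample per class and two well-separated clusters, the MU phase grows each pool one sample at a time, and the final MD pass reassigns every test sample against the updated means.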
