Feature selection for co-training (cited by: 2)

Abstract: Co-training is a semi-supervised learning method that employs two complementary learners, which label unlabeled data for each other and jointly predict test samples. Previous studies show that redundant information can help improve the ratio of the prediction accuracy of semi-supervised learning methods to that of supervised learning methods. In practice, however, redundant information often hurts the performance of learning machines. This paper investigates which redundant features affect semi-supervised learning methods such as co-training, and how to remove redundant as well as irrelevant features. FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training through feature selection. Experimental results on artificial and real-world data sets show that FESCOT helps remove the irrelevant and redundant features that hurt the performance of co-training.
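The abstract describes co-training only at a high level and does not state how FESCOT actually selects features, so the Python sketch below is an illustration under stated assumptions, not the authors' method. It pairs a generic correlation filter (a swapped-in technique, not FESCOT) that drops features weakly correlated with the label as irrelevant and features strongly correlated with an already kept feature as redundant, with a two-view co-training loop in the spirit of Blum and Mitchell. All function names (`select_features`, `co_train`, `co_predict`, and the helpers), the thresholds `relevance_min=0.1` and `redundancy_max=0.9`, the nearest-centroid learner, the distance-gap confidence score, and the "more confident view decides" combination rule are hypothetical choices made for illustration.

```python
# Hypothetical sketch (not the FESCOT procedure): a generic correlation filter
# for irrelevant/redundant features, plus two-view co-training.
import math
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(X, y, relevance_min=0.1, redundancy_max=0.9):
    """Hypothetical correlation filter: drop features weakly correlated with
    the label (irrelevant) or strongly correlated with an already kept,
    more relevant feature (redundant). Returns the kept column indices."""
    cols = list(zip(*X))
    relevance = [abs(pearson(c, y)) for c in cols]
    kept = []
    for j in sorted(range(len(cols)), key=lambda k: -relevance[k]):
        if relevance[j] < relevance_min:
            continue  # irrelevant: too little linear association with the label
        if any(abs(pearson(cols[j], cols[k])) > redundancy_max for k in kept):
            continue  # redundant: nearly duplicates a feature already kept
        kept.append(j)
    return sorted(kept)


def distance(p, q):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_centroids(rows, labels):
    """Nearest-centroid learner: one mean vector per class."""
    groups = {}
    for r, c in zip(rows, labels):
        groups.setdefault(c, []).append(r)
    return {c: [fmean(col) for col in zip(*g)] for c, g in groups.items()}


def predict(model, row):
    """Return (label, confidence); confidence is the distance gap to the runner-up class."""
    ranked = sorted((distance(row, mu), c) for c, mu in model.items())
    gap = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], gap


def project(row, view):
    """Keep only the features that belong to one view."""
    return [row[j] for j in view]


def co_train(X_lab, y_lab, X_unl, views, rounds=5, per_round=1):
    """Each view's learner labels its most confident unlabeled examples and
    adds them to the shared labeled set, so it also trains the other learner."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            model = fit_centroids([project(r, view) for r in X_lab], y_lab)
            scored = [(predict(model, project(r, view)), i) for i, r in enumerate(pool)]
            scored.sort(key=lambda t: -t[0][1])  # most confident first
            picked = scored[:per_round]
            for (label, _), i in picked:
                X_lab.append(pool[i])
                y_lab.append(label)
            taken = {i for _, i in picked}
            pool = [r for i, r in enumerate(pool) if i not in taken]
    return [fit_centroids([project(r, v) for r in X_lab], y_lab) for v in views]


def co_predict(models, views, row):
    """Joint prediction: the more confident of the two view learners decides."""
    votes = [predict(m, project(row, v)) for m, v in zip(models, views)]
    return max(votes, key=lambda t: t[1])[0]


if __name__ == "__main__":
    # Made-up toy data: column 1 nearly duplicates column 0, column 3 is noise.
    X = [[0, 0, 0, 1], [0, 0.2, 1, 0], [1, 1, 0, 0],
         [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_features(X, y))  # [0, 2]

    # Made-up two-view toy problem: each coordinate alone separates the classes.
    views = [[0], [1]]
    models = co_train([[0, 0], [10, 10]], [0, 1],
                      [[1, 2], [9, 8], [2, 1], [8, 9]], views, rounds=2)
    print(co_predict(models, views, [3, 4]), co_predict(models, views, [7, 6]))  # 0 1
```

Because both learners share one labeled pool, the learner on one view is retrained on examples that the learner on the other view labeled confidently, which is the "label the unlabeled data for each other" step the abstract describes. The toy data are invented for illustration and say nothing about the paper's experimental results.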

Authors: Li Guozheng (李国正), Liu Tianyu (刘天羽)

Affiliations: School of Computer Engineering and Science; School of Electronic

Source: Journal of Shanghai University (English Edition) [CAS], 2008, No. 1, pp. 47-51 (5 pages)

Funding: Project supported by the National Natural Science Foundation of China (Grant No. 20503015).

Keywords: feature selection; semi-supervised learning; co-training

CLC number: TP30 [Automation and Computer Technology: Computer Systems Architecture]

References (11)

1. NIGAM K, MCCALLUM A K, THRUN S, MITCHELL T. Text classification from labeled and unlabeled documents using EM. Machine Learning, 2000, (2-3).
2. LI G Z, YANG J, LIU G P, XUE L. Feature selection for multi-class problems using support vector machines. Proceedings of the Pacific Rim International Conference on Artificial Intelligence, 2004.
3. SEEGER M. Learning with labeled and unlabeled data. http://www.dai.ed.ac.uk/seeger/papers.html, 2006.
4. ZHU X. Semi-supervised learning with graphs. 2005.
5. CHAWLA N V, KARAKOULAS G. Learning from labeled and unlabeled data: an empirical study across techniques and domains. Journal of Artificial Intelligence Research, 2005.
6. BLUM A, MITCHELL T. Combining labeled and unlabeled data with co-training. Proceedings of the Annual Conference on Computational Learning Theory, 1998.
7. GOLDMAN S, ZHOU Y. Enhancing supervised learning with unlabeled data. Proceedings of the International Conference on Machine Learning, 2000.
8. ZHOU Z H, LI M. Semi-supervised regression with co-training. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2005.
9. LIU H, YU L. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 2005.
10. JOACHIMS T. Transductive inference for text classification using support vector machines. Proceedings of the International Conference on Machine Learning, 1999.

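Co-cited literature (3)

1. Liu Shiyue, Li Heng, Zhang Li, Yao Tianshun. Application of the co-training machine learning method to Chinese chunk recognition. Journal of Chinese Information Processing, 2005, 19(3): 73-79. Cited by: 8.
2. Zhang Bofeng, Bai Bing, Su Jinshu. Semi-supervised text classification based on a self-training EM algorithm. Journal of National University of Defense Technology, 2007, 29(6): 65-69. Cited by: 17.
3. Deng Chao, Guo Maozu. Semi-supervised clustering algorithm based on Tri-Training and data editing. Journal of Software, 2008, 19(3): 663-673. Cited by: 30.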

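Citing literature (2)

1. Lu Jialei, Zhu Shihua, Ding Xiangqian, Huang Yuehua. Optimization analysis of tobacco raw material data based on co-training. Computer and Modernization, 2010(2): 176-179.
2. Xu Feiyu, Xu Rongcong. Co-training algorithm based on density-sensitive distance. Computer Applications and Software, 2011, 28(9): 229-231.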
