Application of Fuzzy Clustering Algorithm on Feature Selection

Cited by: 1
Abstract: A high-dimensional feature selection method based on a fuzzy clustering algorithm is proposed. First, features unrelated to the sample categories are filtered out using the Bhattacharyya distance. Then, within a recursive feature elimination process, a fuzzy iterative self-organizing data analysis technique (ISODATA) clustering method is applied, and the weighted distance between each sample and its cluster center serves as the separability index for generating candidate feature subsets. Finally, the area under the receiver operating characteristic curve (AUC) and the accuracy rate of each candidate subset, in both classification and clustering tests, serve as the objective function for determining the optimal feature subset. The method is applied to select feature genes from five gene expression profile datasets. The results show that the selected features perform well in both classification and clustering, demonstrating that the proposed method is effective for selecting informative features from high-dimensional datasets.
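The abstract gives no formulas for the filtering or evaluation steps. As a rough illustration only, the sketch below assumes two classes and models each feature within each class as a univariate Gaussian, which gives the Bhattacharyya distance a closed form; it also computes AUC with the standard pairwise (Mann-Whitney) statistic. The function names `bhattacharyya_1d`, `filter_features`, and `auc` and the `threshold` parameter are hypothetical, not taken from the paper. The fuzzy ISODATA weighted-distance step is not sketched, because the abstract does not define the weighting.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_1d(a, b):
    """Bhattacharyya distance between univariate Gaussians fitted to samples a and b.

    Assumption (not from the paper): each feature is roughly normal within each class.
    """
    m1, m2 = mean(a), mean(b)
    # Floor the variances so a constant feature does not cause division by zero.
    v1, v2 = max(pvariance(a), 1e-12), max(pvariance(b), 1e-12)
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def filter_features(X, y, threshold):
    """Return indices of features whose two-class Bhattacharyya distance exceeds threshold.

    X is a list of samples (each a list of floats); y holds labels 0 or 1.
    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    keep = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        if bhattacharyya_1d(a, b) > threshold:
            keep.append(j)
    return keep


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; features 1 and 2 do not.
    X = [[0.10, 5.0, 1.0], [0.20, 4.0, 1.1], [0.15, 6.0, 0.9],
         [2.00, 5.0, 1.0], [2.10, 4.0, 1.1], [1.90, 6.0, 0.9]]
    y = [0, 0, 0, 1, 1, 1]
    print(filter_features(X, y, threshold=1.0))  # [0]
    print(auc([0.9, 0.3, 0.6, 0.4], [1, 1, 0, 0]))  # 0.5
```

The mean term vanishes when the class means coincide and the variance term vanishes when the class variances coincide, so features 1 and 2 in the toy data score near zero and are dropped while feature 0 survives. How the paper combines AUC and accuracy into a single objective is not stated in the abstract, so no combination rule is shown here.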
Source: Journal of Nanjing University of Aeronautics & Astronautics, 2012, No. 6, pp. 881-887 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
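Funding: National Natural Science Foundation of China (10172043, 61173068); Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20093218110024); Jiangsu Province International Cooperation Project (BZ2010060); Key Project of the Jiangsu Provincial Bureau of Technical Supervision (KJ122714); Key Natural Science Research Project of the Anhui Provincial Department of Education (KJ2010A226).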
Keywords: feature selection; fuzzy iterative self-organizing data analysis technique (ISODATA); hierarchical clustering; support vector machine (SVM); K-nearest neighbor