Abstract
This paper presents a method for recognizing tumor subtypes from gene expression profiles by selecting feature genes and classifying them with a multi-category support vector machine (MSVM). For subtype recognition of small round blue cell tumors (SRBCT), the ratio of between-groups to within-groups sums of squares (BSS/WSS) is used as the criterion for ranking genes by their importance to classification. The 152 genes chosen by this criterion form the candidate feature set, and subsets of it are used to build several MSVM models. Based on classification error rate, a feature set of the top 25 genes ranked by BSS/WSS is selected; the MSVM trained on it achieves 100% accuracy on both the training set and the blind test set. A redundancy analysis based on correlation distance then removes redundant genes from this subset, leaving 15 feature genes as the final subset. The MSVM built on these 15 genes retains 100% prediction accuracy on the test set. Comparison with results in the related literature demonstrates the effectiveness and feasibility of the method.
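The two selection steps named in the abstract, BSS/WSS ranking and correlation-based redundancy removal, can be illustrated with a minimal pure-Python sketch. The BSS/WSS ratio follows its standard definition. The rest is assumption rather than the paper's procedure: the function names, the toy expression values, the greedy filtering order, the use of 1 - |Pearson r| as the correlation distance, and the 0.1 threshold are all hypothetical. The MSVM training step is omitted.

```python
from math import sqrt

def bss_wss(values, labels):
    """BSS/WSS ratio for one gene; values[i] is its expression in sample i."""
    overall = sum(values) / len(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    bss = wss = 0.0
    for members in groups.values():
        mean_k = sum(members) / len(members)
        bss += len(members) * (mean_k - overall) ** 2   # between-groups part
        wss += sum((v - mean_k) ** 2 for v in members)  # within-groups part
    return bss / wss if wss > 0 else float("inf")

def rank_genes(expr, labels):
    """Return gene names sorted by descending BSS/WSS ratio."""
    return sorted(expr, key=lambda g: bss_wss(expr[g], labels), reverse=True)

def correlation_distance(a, b):
    """Assumed form of the distance: 1 - |Pearson correlation|."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # a constant profile is treated as uncorrelated
    return 1.0 - abs(cov / sqrt(va * vb))

def drop_redundant(ranked, expr, threshold=0.1):
    """Greedy filter (an assumption): keep a gene only if it is at least
    `threshold` away from every higher-ranked gene already kept."""
    kept = []
    for g in ranked:
        if all(correlation_distance(expr[g], expr[h]) >= threshold for h in kept):
            kept.append(g)
    return kept

# Toy data: 6 samples from 3 hypothetical subtypes, 3 hypothetical genes.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "g1": [1.0, 1.2, 3.0, 3.1, 5.0, 5.2],  # separates the subtypes well
    "g2": [1.1, 1.0, 3.2, 3.0, 5.1, 5.0],  # near-copy of g1, so redundant with it
    "g3": [1.0, 3.0, 2.0, 1.5, 2.5, 2.0],  # weakly informative
}
ranked = rank_genes(expr, labels)
selected = drop_redundant(ranked, expr)
print(ranked, selected)
```

In a full pipeline the ranked list would first be truncated to the subset with the lowest classification error (25 genes in the abstract) before the redundancy filter is applied.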
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2007, Issue 3, pp. 223-226 (4 pages)
Keywords
Multi-category Support Vector Machine (MSVM)
gene expression profiles
feature selection