Abstract
Objective To study automatic gene selection methods suited to genome-wide association studies. Methods We propose the L20 model, which formulates gene selection as a ridge regression optimization problem constrained by the stochastic complexity of the gene models, and we derive a simple, fast solution to this optimization problem. Results Five binary disease classification problems, derived from one clinical diabetes data set and four publicly available cancer microarray data sets, were used to verify the effectiveness of the L20 algorithm. The algorithm is not restricted by the numbers of samples or features, and on sparse modeling problems its output is stable and its predictions are accurate. It is also computationally efficient, because the model parameters are determined during the feature selection process itself. Conclusion The experimental results show that the L20 method outperforms the feature selection algorithms commonly used in genome-wide association studies, making it a promising solution for tumor biomarker identification.
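The abstract gives no formula for the L20 objective or its fast solver, so the following is only a generic sketch of the idea it names: choosing features and ridge regression weights in a single search. It assumes, as a hypothetical stand-in, an objective of ridge loss plus a fixed cost `lam0` per selected feature, minimised by greedy forward selection on centred data with no intercept. The names `greedy_select`, `ridge_objective`, `solve_linear` and `lam0` are illustrative, and `lam0` is not the paper's stochastic-complexity term.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ridge_objective(X: Sequence[Sequence[float]], y: Sequence[float],
                    cols: List[int], lam: float) -> Tuple[float, List[float]]:
    """Fit ridge regression on the columns in `cols`.

    Returns (RSS + lam * ||w||^2, w). No intercept: data assumed centred.
    """
    if not cols:
        return float(sum(v * v for v in y)), []
    k = len(cols)
    # Normal equations: (Xs^T Xs + lam * I) w = Xs^T y; lam > 0 keeps them solvable.
    a = [[sum(row[cols[i]] * row[cols[j]] for row in X) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[cols[i]] * t for row, t in zip(X, y)) for i in range(k)]
    w = solve_linear(a, b)
    rss = sum((t - sum(wj * row[c] for wj, c in zip(w, cols))) ** 2
              for row, t in zip(X, y))
    return rss + lam * sum(wj * wj for wj in w), w


def greedy_select(X, y, lam: float = 1.0,
                  lam0: float = 1.0) -> Tuple[List[int], List[float]]:
    """Greedy forward selection minimising ridge loss + lam0 * (#features).

    lam0 is a hypothetical per-feature complexity cost, NOT the L20 paper's
    stochastic-complexity penalty. Weights are returned in selection order.
    """
    selected: List[int] = []
    best, weights = ridge_objective(X, y, selected, lam)
    n_features = len(X[0])
    while True:
        candidates = []
        for j in range(n_features):
            if j in selected:
                continue
            obj, w = ridge_objective(X, y, selected + [j], lam)
            candidates.append((obj + lam0 * (len(selected) + 1), j, w))
        if not candidates:
            break
        obj, j, w = min(candidates)
        if obj >= best:  # adding any feature no longer lowers the objective
            break
        selected, best, weights = selected + [j], obj, w
    return selected, weights


if __name__ == "__main__":
    # Four mutually orthogonal +/-1 features; y depends only on features 0 and 2.
    x0 = [1, -1, 1, -1, 1, -1, 1, -1]
    x1 = [1, 1, -1, -1, 1, 1, -1, -1]
    x2 = [1, 1, 1, 1, -1, -1, -1, -1]
    x3 = [1, -1, -1, 1, 1, -1, -1, 1]
    X = [list(r) for r in zip(x0, x1, x2, x3)]
    y = [2 * a - 3 * c for a, c in zip(x0, x2)]
    sel, w = greedy_select(X, y, lam=1.0, lam0=1.0)
    print(sel, [round(v, 3) for v in w])  # [2, 0] [-2.667, 1.778]
```

On this toy data the search adds feature 2, then feature 0, and stops, because a third feature would only add cost. The returned weights are the shrunken ridge estimates (-24/9 and 16/9) rather than the true coefficients -3 and 2, and they come out of the same search that picks the features.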
Source
Space Medicine & Medical Engineering (《航天医学与医学工程》)
2010, Issue 4, pp. 274-278 (5 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), Grant No. 2006AA01Z411
Keywords
gene selection
feature selection
genome-wide association
biomarker identification