Abstract
Breast cancer gene expression data are high-dimensional, small-sample and class-imbalanced, so they cannot be used directly for classification and prognosis prediction. To address this, a prognostic model is established that combines Lasso (Least Absolute Shrinkage and Selection Operator)-based recursive feature elimination with a support vector machine (SVM). First, a differential analysis is performed on the feature genes, and genes with no significant difference are removed. Second, the feature-gene data are double-sampled to mitigate the poor algorithm sensitivity caused by the minority class. In addition, an improved Lasso-based recursive feature elimination algorithm is applied. It reduces the error caused by changes in Lasso's tunable parameter and achieves stable selection and gradual reduction of the feature genes. Finally, an SVM uses the 37 feature genes retained after feature extraction to classify breast cancer prognosis. Compared with other models, the proposed model achieves higher accuracy and specificity and gives more accurate prognosis predictions.
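The four steps above can be sketched end to end in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all assumptions, not details from the paper:

- a Welch t-statistic filter with a fixed threshold t_min;
- "double sampling" read as undersampling the majority class and oversampling the minority class to the mean class size;
- averaging absolute Lasso coefficients over a fixed grid of penalties (alphas) to damp sensitivity to the tuning parameter;
- a linear Pegasos SVM;
- every function name (welch_t, differential_filter, double_sample, lasso_cd, lasso_rfe, linear_svm, predict, make_data).

The data are synthetic. n_keep is 5 rather than the 37 genes reported above, only to keep the demo small.

```python
# Hypothetical sketch of the pipeline; thresholds, names and data are assumptions.
import numpy as np


def welch_t(X, y):
    """Per-gene Welch t-statistic between class 1 and class 0."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)


def differential_filter(X, y, t_min=3.0):
    """Step 1: keep genes whose |t| exceeds the (assumed) threshold t_min."""
    return np.flatnonzero(np.abs(welch_t(X, y)) > t_min)


def double_sample(X, y, rng):
    """Step 2: resample both classes to the mean class size.

    A class larger than the target is undersampled without replacement;
    a smaller one is oversampled with replacement.
    """
    groups = [np.flatnonzero(y == c) for c in (0, 1)]
    target = sum(len(g) for g in groups) // 2
    keep = np.concatenate(
        [rng.choice(g, size=target, replace=len(g) < target) for g in groups])
    return X[keep], y[keep]


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + alpha * ||w||_1 (X, y centred)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n + 1e-12
    r = y.astype(float)                  # residual y - Xw with w = 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]          # remove gene j's contribution
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]          # add the updated contribution back
    return w


def lasso_rfe(X, y, n_keep, alphas=(0.01, 0.05, 0.1), drop_frac=0.2):
    """Step 3: score genes by |w| averaged over several penalties, drop the
    weakest drop_frac of them, refit, and repeat until n_keep remain."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    yc = y - y.mean()
    active = np.arange(X.shape[1])
    while len(active) > n_keep:
        score = np.mean([np.abs(lasso_cd(Xs[:, active], yc, a)) for a in alphas], axis=0)
        n_drop = max(1, min(int(drop_frac * len(active)), len(active) - n_keep))
        active = np.sort(active[np.argsort(score)[n_drop:]])
    return active


def linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Step 4: Pegasos sub-gradient linear SVM; labels {0, 1} -> {-1, +1}."""
    rng = np.random.default_rng(seed)
    s = 2 * y - 1
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias column (also regularised, a simplification)
    w, t = np.zeros(Xb.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            hinge_active = s[i] * (Xb[i] @ w) < 1   # margin uses w before the update
            w *= 1 - eta * lam
            if hinge_active:
                w += eta * s[i] * Xb[i]
    return w


def predict(w, X):
    return (np.hstack([X, np.ones((len(X), 1))]) @ w > 0).astype(int)


def make_data(rng, n0, n1, p=200, n_informative=5, shift=2.0):
    """Synthetic stand-in for expression data: the first genes are shifted in class 1."""
    X = rng.normal(size=(n0 + n1, p))
    y = np.array([0] * n0 + [1] * n1)
    X[y == 1, :n_informative] += shift
    return X, y


rng = np.random.default_rng(42)
X_train, y_train = make_data(rng, n0=40, n1=20)   # imbalanced: 40 vs 20 samples
X_test, y_test = make_data(rng, n0=40, n1=20)

candidates = differential_filter(X_train, y_train)              # gene indices
X_bal, y_bal = double_sample(X_train[:, candidates], y_train, rng)
keep = lasso_rfe(X_bal, y_bal, n_keep=5)                         # positions within candidates
selected = candidates[keep]                                      # gene indices
w = linear_svm(X_bal[:, keep], y_bal)
accuracy = float((predict(w, X_test[:, selected]) == y_test).mean())
print("selected genes:", selected, "test accuracy:", round(accuracy, 3))
```

Averaging the Lasso scores over several penalties, rather than fitting at a single alpha, is one simple way to make the gene ranking less dependent on that tuning parameter. Cross-validating alpha or stability selection would be reasonable alternatives. The paper's exact variant is not described here.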
Authors
LIU Jia-xin
WANG Hong-wei
WANG Jia
(Xinjiang University, School of Electrical Engineering, Urumqi, Xinjiang 830000, China; Dalian Medical University, School of Basic Medicine, Dalian, Liaoning 110041, China; Amy Hanxin Vaccine (Dalian) Co., Ltd., Dalian, Liaoning 116100, China)
Source
Computer Simulation (计算机仿真)
Peking University Core Journal (北大核心)
2022, Issue 12, pp. 330-335 (6 pages)
Funding
National Natural Science Foundation of China (61863034).
Keywords
Lasso
Breast cancer
Gene expression data
Prognosis prediction
Feature elimination