Abstract
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors in China and worldwide. Analysis of tumor gene expression profile data is an active research topic and is important for the early diagnosis and treatment of cancer. Gene expression profile data are high-dimensional and have few samples. They show severe collinearity among variables and a nonlinear relationship between the class (response) variable and the predictor variables. To address these problems, a new technique was adopted: partial least squares (PLS) regression based on spline transformation (SPLINE-PLS). First, redundant information in the gene expression profile data was removed with a filter method. Next, a cubic B-spline transformation mapped the nonlinear gene expression data into a new linear space, reconstructing the data matrix. Finally, PLS was applied to the reconstructed matrix to build a model relating the class variable to the predictor variables. Analysis of an HCC gene expression data set showed that the classification model is robust to data reconstruction and effectively overcomes overfitting and collinearity among variables in high-dimensional, small-sample gene expression data. It also achieves high fitting and classification accuracy.
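The three stages above (filter, cubic B-spline expansion, PLS) can be sketched in a minimal form, assuming NumPy only. The abstract does not specify the filter criterion, knot placement, PLS variant or decision rule, so these are assumptions. The sketch uses a Welch two-sample t-statistic ranking as the filter. Each retained gene is expanded on its own clamped cubic B-spline basis with uniformly spaced interior knots. Single-response PLS1 (NIPALS) is then regressed on 0/1 class labels and thresholded at 0.5. The names `t_filter`, `bspline_basis`, `pls1_fit` and `SplinePLS` and all parameter values (20 genes, one interior knot, two components) are hypothetical. The demo data are synthetic, not the HCC data set.

```python
import numpy as np


def t_filter(X, y, k):
    """Indices of the k genes with the largest |Welch t| between classes 1 and 0 (assumed filter)."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b)) + 1e-12
    t = (a.mean(0) - b.mean(0)) / se
    return np.argsort(-np.abs(t))[:k]


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function of `degree` at points x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.stack([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], axis=1).astype(float)
    for d in range(1, degree + 1):
        nb = len(knots) - d - 1
        new = np.zeros((len(x), nb))
        for i in range(nb):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are defined as 0.
            if left > 0:
                new[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = new
    return B  # shape (len(x), len(knots) - degree - 1)


def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS; returns (x_mean, y_mean, coef) with y_hat = (X - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # latent score
        tt = t @ t
        p, c = E.T @ t / tt, f @ t / tt      # X loading, y loading
        E, f = E - np.outer(t, p), f - c * t # deflate X and y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)


class SplinePLS:
    """Filter -> per-gene cubic B-spline expansion -> PLS1 on 0/1 class labels (hypothetical sketch)."""

    def __init__(self, n_genes=20, n_interior=1, degree=3, n_comp=2):
        self.n_genes, self.n_interior = n_genes, n_interior
        self.degree, self.n_comp = degree, n_comp

    def _expand(self, Xs):
        blocks = []
        for j in range(Xs.shape[1]):
            lo, hi = self.lo_[j], self.hi_[j]
            inner = np.linspace(lo, hi, self.n_interior + 2)[1:-1]  # uniform interior knots (assumed)
            knots = np.r_[[lo] * (self.degree + 1), inner, [hi] * (self.degree + 1)]
            # Clip into [lo, hi) so training maxima and out-of-range test values still get a basis.
            x = np.clip(Xs[:, j], lo, hi - 1e-9 * (hi - lo))
            blocks.append(bspline_basis(x, knots, self.degree))
        return np.hstack(blocks)

    def fit(self, X, y):
        self.genes_ = t_filter(X, y, self.n_genes)
        Xs = X[:, self.genes_]
        self.lo_, self.hi_ = Xs.min(0), Xs.max(0)
        self.pls_ = pls1_fit(self._expand(Xs), y.astype(float), self.n_comp)
        return self

    def predict(self, X):
        x_mean, y_mean, coef = self.pls_
        y_hat = (self._expand(X[:, self.genes_]) - x_mean) @ coef + y_mean
        return (y_hat > 0.5).astype(int)


# Synthetic demo (made-up data): 60 samples x 500 genes, genes 0-9 shifted upward in class 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 500))
X[y == 1, :10] += 2.0
model = SplinePLS().fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```

Within each gene, the cubic B-spline basis functions sum to 1. As a result, the expanded matrix is exactly collinear across genes. PLS tolerates this because it regresses on a few latent scores and only inverts the small `n_comp` x `n_comp` matrix `P.T @ W`, never the full cross-product of the predictors. That property is what the abstract relies on to handle collinearity.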
Source
Journal of Biology (《生物学杂志》), 2011, No. 6, pp. 58-61 (4 pages)
Indexed in: CAS, CSCD
Funding
Beijing Natural Science Foundation (Grant No. 4092021)
Science and Technology Program of the Beijing Municipal Education Commission (JC002011200903)
Keywords
gene expression profile
spline transformation
PLS
filtering method
overfitting