Abstract
Analyzing gene expression profile data to advance tumor diagnosis and treatment has become an active research topic in biomedicine. This paper proposes a method that combines entropy-based information processing with principal component analysis (PCA). Entropy information is first used for a coarse selection over the ultra-high-dimensional gene expression profile data, yielding a subset of feature genes. Because the genes in this subset are still correlated, PCA is then applied to remove the remaining redundancy. Finally, the resulting non-redundant, mutually orthogonal gene features are evaluated in experiments on real data. The results show that the method effectively removes irrelevant and redundant information from tumor samples while retaining as much tumor classification information as possible. Compared with other tumor classification methods, it achieves noticeably higher accuracy, which supports the effectiveness and feasibility of the method.
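The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which entropy criterion is used, so the sketch assumes information gain after a median split of each gene; the function names (`entropy`, `information_gain`, `select_genes`, `pca`), the values of `k` and `n_components`, and the synthetic data are all hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y):
    """Reduction in class entropy after splitting gene x at its median.

    The median split is an assumed criterion; the abstract does not
    specify how expression values are discretized.
    """
    high = x > np.median(x)
    gain = entropy(y)
    for mask in (high, ~high):
        if mask.any():
            gain -= mask.mean() * entropy(y[mask])
    return gain

def select_genes(X, y, k):
    """Coarse selection: indices of the k genes with the largest gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]

def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Synthetic stand-in for an expression matrix (samples x genes).
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500
y = np.repeat([0, 1], n_samples // 2)   # two hypothetical classes
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] += 4.0 * y[:, None]            # genes 0-4 carry class signal

selected = select_genes(X, y, k=20)               # stage 1: entropy filter
Z = pca(X[:, selected], n_components=3)           # stage 2: decorrelate
```

The columns of `Z` are uncorrelated by construction, which corresponds to the "non-redundant, mutually orthogonal" features mentioned in the abstract; they would then be passed to whichever classifier is used for evaluation.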
Source
《生物学杂志》
CAS
CSCD
2014, No. 6, pp. 15-18 (4 pages)
Journal of Biology
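Funding
National Natural Science Foundation of China (60772121)
Anhui Provincial Natural Science Foundation (1208085MF93)
Anhui University "211 Project" Academic Innovation Team Fund (KJTD007A)
Anhui University 2013 Undergraduate Research Training Program (KYX12013032)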