Abstract
To study the intrinsic relationships between features, an unsupervised feature engineering algorithm based on a sparse autoencoder, BioSAE, was proposed to encode a given dataset. It was hypothesized that the newly constructed features encoded by the sparse autoencoder could train better classification models and might serve as better disease biomarkers. A comprehensive evaluation was carried out using 3494 methylation samples from 6 cancer types in TCGA: the encoded features were first obtained through the sparse autoencoder and were then analyzed and compared with the original methylation features. The experimental results show that, in most of the modeling experiments conducted in this study, the BioSAE-encoded features outperform the original methylation features. Applying the algorithm to datasets from other domains, such as image data, yielded similar improvements.
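The encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' BioSAE implementation: the single hidden layer, the sigmoid activation, the L1 penalty on hidden activations, all hyperparameters, and the function names `train_sparse_autoencoder` and `encode` are assumptions made for illustration, and random values in [0, 1] stand in for methylation data.

```python
import numpy as np


def train_sparse_autoencoder(X, n_hidden=8, l1=1e-3, lr=0.5, epochs=300, seed=0):
    """Fit a one-hidden-layer autoencoder with an L1 sparsity penalty.

    Illustrative only: architecture and hyperparameters are assumptions,
    not the settings used by BioSAE.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Forward pass: sigmoid hidden code, linear reconstruction.
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        X_hat = H @ W2 + b2
        err = X_hat - X
        # Loss = mean squared reconstruction error + L1 penalty on the code.
        losses.append(float(np.mean(err ** 2) + l1 * np.mean(np.abs(H))))
        # Backward pass (gradients of the loss above).
        d_out = 2.0 * err / err.size
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        dH = d_out @ W2.T + l1 * np.sign(H) / H.size
        dZ = dH * H * (1.0 - H)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        # Plain gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Map original features to the learned hidden representation."""
    W1, b1, _, _ = params
    return 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))


# Synthetic stand-in for methylation values (assumption: values in [0, 1]).
X = np.random.default_rng(42).random((200, 20))
params, losses = train_sparse_autoencoder(X)
codes = encode(X, params)  # encoded features, shape (200, 8)
```

In a comparison like the one the abstract reports, `codes` and `X` would each be passed to the same classifier and the resulting models evaluated side by side; that step is omitted here.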
Authors
ZHOU Feng-feng (周丰丰)
ZHANG Yi-chi (张亦弛)
ZHOU Feng-feng; ZHANG Yi-chi (College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China)
Source
Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报(工学版)》)
Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心)
2022, No. 7, pp. 1645-1656 (12 pages)
Funding
National Natural Science Foundation of China project (U19A2061)
Education Department of Jilin Province fund project (JJKH20180145KJ)