Abstract
SNP data play an important role in the diagnosis and treatment of human genetic diseases, but raw SNP data are highly redundant. It is therefore necessary to select the most informative SNPs and thereby reduce the dimensionality of the data. Common clustering algorithms, when applied to informative SNP selection, do not consider the similarity between a single SNP and a subset of SNPs. To address this, the paper adopts a new similarity measure, proposes an improved clustering algorithm, K-MIGS, and applies it to SNP selection. K-MIGS overcomes the inability of traditional K-means to uncover strong correlations between an SNP locus and an SNP subset, and experiments on clinical data provided by hospitals show that it reconstructs the non-informative SNP subset to a higher degree. Finally, a support vector machine (SVM), a decision tree and a neural network are used to run classification experiments on the constructed SNP subsets. Compared with K-means, feature-weighted K-means, ReliefF and MCMR, K-MIGS improves classification accuracy and F1 by 10% and 15%, respectively, which indicates that K-MIGS performs better in informative SNP selection.
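The abstract reports its gains in terms of classification accuracy and F1 but does not restate how these are computed. As a reminder of the standard binary definitions only, here is a minimal sketch. The function name `accuracy_and_f1`, the label encoding (1 for the positive class) and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` is the label treated as the positive class
    (an assumption made here for illustration).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, made up purely to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

F1 is included alongside accuracy because accuracy alone can look good on imbalanced case/control data, while F1 also penalizes missed positives and false alarms.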
Authors
邢斌
周从华
张付全
张婷
蒋跃明
XING Bin; ZHOU Conghua; ZHANG Fuquan; ZHANG Ting; JIANG Yueming (School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013; Wuxi Mental Health Center, Wuxi 214151; Wuxi MCH Hospital, Wuxi 214002; Wuxi No.5 People's Hospital, Wuxi 214073)
Source
《计算机与数字工程》
2021, No. 10, pp. 1983-1987, 2008 (6 pages in total)
Computer & Digital Engineering
Funding
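Supported by the Jiangsu Provincial Key Research and Development Program (Social Development) (Grant Nos. BE2016630, BE2017628)
and the Scientific Research Project of the Wuxi Municipal Health and Family Planning Commission (Grant No. z201603).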