Clustering Algorithm for Information SNP Selection

Abstract: SNP data plays an important role in the diagnosis and treatment of human genetic diseases, but raw SNP data is highly redundant, so the most informative SNPs must be selected to reduce its dimensionality. Common clustering algorithms, when applied to informative SNP selection, do not consider the similarity between a single SNP and a subset of SNPs. To address this, the paper adopts a new similarity measure and proposes an improved clustering algorithm, K-MIGS, which it applies to SNP selection. K-MIGS addresses the inability of traditional K-means to uncover strong correlations between an SNP locus and an SNP subset, and experiments on hospital-provided clinical data show that K-MIGS achieves a higher reconstruction degree for the non-informative SNP subset. Finally, support vector machine, decision tree, and neural network classifiers are applied to the constructed SNP subsets. Compared with K-means, feature-weighted K-means, ReliefF, and MCMR, K-MIGS improves classification accuracy and F1 score by 10% and 15%, respectively, indicating that it is more effective for informative SNP selection.

Authors: XING Bin; ZHOU Conghua; ZHANG Fuquan; ZHANG Ting; JIANG Yueming

Affiliations: School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013; Wuxi Mental Health Center, Wuxi 214151; Wuxi MCH Hospital, Wuxi 214002; Wuxi No.5 People's Hospital, Wuxi 214073

Source: Computer & Digital Engineering, 2021, No. 10, pp. 1983-1987, 2008 (6 pages)

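Funding: Supported by the Jiangsu Provincial Key Research and Development Program (Social Development) (Nos. BE2016630, BE2017628) and the Wuxi Municipal Health and Family Planning Commission Research Project (No. z201603).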

Keywords: single nucleotide polymorphism; SNP selection; similarity measure; K-means

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

