Classification accuracy of machine learning algorithms for Chinese local cattle breeds using genomic markers
Abstract Accurate breed classification underpins the conservation and utilization of farm animal genetic resources. Traditional classification relies mainly on body conformation and appearance traits, but because these indicators are hard to quantify, it struggles to separate highly similar breeds. Machine learning algorithms show distinct advantages for breed classification from genomic information. To identify the classification approach best suited to Chinese cattle breeds, this study used genomic SNP data from 213 cattle of seven Chinese local breeds and compared how three SNP selection methods (F_ST value ranking, mRMR, and Relief-F) and three machine learning algorithms (Random Forest, RF; Support Vector Machine, SVM; Naive Bayes, NB) affect classification accuracy. The results showed that: 1) with more than 1500 SNPs selected by F_ST, or more than 1000 SNPs selected by mRMR, the SVM classifier achieved a classification accuracy above 99.47%; 2) SVM was the most effective classifier, followed by NB, while F_ST and mRMR were the best SNP selection methods, followed by Relief-F; 3) breed misclassification occurred mostly between breeds with high similarity. The study shows that machine learning classification models combined with genomic data are an effective way to identify local cattle breeds, and it provides a technical basis for the rapid and accurate classification of cattle breeds in China.
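As an illustration of the F_ST ranking step named in the abstract, the following is a minimal sketch in plain Python. The abstract does not say which F_ST estimator the authors used, so this sketch assumes a simple heterozygosity-based form, F_ST = (H_T - H_S) / H_T, with breeds weighted by sample size. The function names `fst_per_snp` and `top_k_snps`, the 0/1/2 genotype coding, and the toy data are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def fst_per_snp(genotypes: Sequence[Sequence[int]],
                breeds: Sequence[str]) -> List[float]:
    """Return one F_ST value per SNP.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of
    individual i at SNP j; breeds[i] is that individual's breed label.
    Assumed estimator: (H_T - H_S) / H_T, sample-size weighted.
    """
    n_total = len(breeds)
    n_snps = len(genotypes[0])

    # Group individual indices by breed label.
    groups: Dict[str, List[int]] = {}
    for idx, breed in enumerate(breeds):
        groups.setdefault(breed, []).append(idx)

    scores: List[float] = []
    for j in range(n_snps):
        # Expected heterozygosity in the pooled sample (H_T).
        p_total = sum(genotypes[i][j] for i in range(n_total)) / (2 * n_total)
        h_total = 2 * p_total * (1 - p_total)
        if h_total == 0:
            # Monomorphic SNP: carries no breed information.
            scores.append(0.0)
            continue
        # Weighted mean of within-breed expected heterozygosity (H_S).
        h_within = 0.0
        for members in groups.values():
            p = sum(genotypes[i][j] for i in members) / (2 * len(members))
            h_within += (len(members) / n_total) * 2 * p * (1 - p)
        scores.append((h_total - h_within) / h_total)
    return scores


def top_k_snps(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k SNPs with the highest F_ST, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy data: 4 animals, 2 breeds, 3 SNPs (hypothetical values).
    toy_genotypes = [[2, 1, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0]]
    toy_breeds = ["A", "A", "B", "B"]
    toy_scores = fst_per_snp(toy_genotypes, toy_breeds)
    print(toy_scores, top_k_snps(toy_scores, 2))
```

In a real evaluation, the ranking would normally be recomputed on the training animals of each cross-validation fold before fitting a classifier such as an SVM, so that held-out animals do not influence which SNPs are selected. Whether the authors did this is not stated in the abstract.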
Authors Hui Liang; Xue Wang; Jingfang Si; Yi Zhang (College of Animal Science and Technology, China Agricultural University, Beijing 100193, China)
Source Hereditas (Beijing), 2024, Issue 7, pp. 530-539 (10 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
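Funding Supported by the National Key Research and Development Program of China during the 14th Five-Year Plan (No. 2021YFD1200904) and by the China Agriculture Research System of the Ministry of Finance and the Ministry of Agriculture and Rural Affairs (No. CARS-36).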
Keywords machine learning; breed classification; feature selection; support vector machine; F_ST