Abstract
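As third-generation genetic markers, single nucleotide polymorphisms (SNPs) are abundant, widely distributed, and genetically stable, and they underpin research on disease-gene association and drug design. Such research mostly relies on computational methods, so encoding SNPs appropriately to improve algorithm performance is a key step; yet few studies so far address the SNP encoding problem specifically. Building on commonly used SNP representations, and taking into account the characteristics of disease susceptibility research and the correlations among SNPs, several new encoding methods are proposed. Extensive experiments show that the encoding method has a considerable effect on the performance of disease susceptibility analysis algorithms, and that encoding methods based on distribution information achieve better results: they describe SNP sequences better, retain as much as possible of the rich information carried by the original biological sequence, and are better suited to disease susceptibility research.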
Because single nucleotide polymorphisms (SNPs) have characteristics such as high abundance and a low mutation rate, they are well suited to disease association studies. Many of these studies rely on computational methods, so encoding SNPs in a way that improves the performance of disease association analysis algorithms is a critical step. However, few studies have been dedicated to this issue. Therefore, building on common SNP encoding methods and on the associations between SNPs, we propose several new encoding methods. The experimental results show that the encoding method has a considerable impact on algorithm performance and that the methods described here outperform the others. That is, the proposed encoding methods describe SNP sequences better, retain more of the information in the original biological sequence, and are more suitable for disease susceptibility research.
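To make the notion of SNP encoding concrete, the sketch below shows two widely used baseline representations of a biallelic SNP: an additive code (number of minor alleles, 0/1/2) and a one-hot code over the three genotype classes. These are illustrations of common SNP representations only; they are not the new methods proposed in the paper, whose details the abstract does not give. The function names and the sample genotypes are hypothetical.

```python
from collections import Counter


def additive_encode(genotypes):
    """Encode each two-letter genotype as its minor-allele count (0, 1 or 2).

    Assumes a biallelic SNP given as strings such as "AG" (hypothetical input format).
    """
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) < 2:
        # Monomorphic site: no minor allele is observed, so every code is 0.
        return [0 for _ in genotypes]
    # Minor allele = least frequent allele; ties are broken alphabetically.
    minor = min(allele_counts, key=lambda a: (allele_counts[a], a))
    return [g.count(minor) for g in genotypes]


def one_hot_encode(genotypes):
    """Encode each genotype as a 3-element indicator over the additive codes 0, 1, 2."""
    return [[1 if code == k else 0 for k in range(3)]
            for code in additive_encode(genotypes)]


# Made-up genotypes of five individuals at one SNP.
samples = ["AA", "AG", "GG", "AA", "AG"]
additive = additive_encode(samples)
one_hot = one_hot_encode(samples)
print(additive)
print(one_hot)
```

The additive code assumes that each extra copy of the minor allele has the same effect, whereas the one-hot code makes no such assumption at the cost of three features per SNP. Neither uses the correlations between SNPs that the abstract mentions.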
Source
《计算机科学》
CSCD
Peking University Core Journal
2016, Issue S1, pp. 219-221, 235 (4 pages)
Computer Science
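Funding
Scientific Research Program of the Shaanxi Provincial Department of Education (15JK2187)
Xijing University Research Fund (XJ140115)
Basic Research Fund of the Engineering University of the Chinese People's Armed Police Force (WJY201518)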
Keywords
Single nucleotide polymorphism
Encoding
Disease susceptibility