Abstract
Automatic and accurate identification of DNA methylation sites is important for studying the mechanisms of gene regulation, transcription, and expression, and for developing targeted cancer drugs. However, nucleotide composition features based on frequency statistics and pseudo nucleotide composition features based on physicochemical properties do not capture the pattern information of DNA methylation sites well, and the predictors built on them have limited accuracy. This paper therefore extracts information from a DNA sequence from three different views: nucleotide frequency statistics, position statistics, and spatial structure attributes, and fuses them into a new statistical feature vector. The approach is validated on the same benchmark dataset using an SVM classifier and a rigorous Jackknife test. The results show that, compared with iDNA-methyl, the best existing predictor, the resulting predictor improves Acc, Mcc, and AUC by 11.85%, 24%, and 11.3%, respectively. This indicates that, for DNA methylation site prediction, the frequency statistics, position statistics, and spatial structure attribute information of a nucleotide sequence are complementary, and that the feature vector obtained by fusing these three views better reflects the pattern characteristics of DNA methylation sites and improves prediction accuracy.
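The multi-view fusion and Jackknife protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the exact features, so `kmer_frequencies` (normalized k-mer counts) and `position_one_hot` (one-hot encoding per position) are assumed stand-ins for the frequency and position views, the spatial structure view is omitted because its definition is not given here, and all function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Frequency view (assumed form): normalized counts of every k-mer over ACGT."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-ACGT symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def position_one_hot(seq):
    """Position view (assumed form): one-hot encoding of the base at each position."""
    features = []
    for base in seq:
        features.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return features


def fused_features(seq, k=2):
    """Fuse the views by concatenation into a single feature vector."""
    seq = seq.upper()
    return kmer_frequencies(seq, k) + position_one_hot(seq)


def jackknife_splits(n):
    """Jackknife (leave-one-out) protocol: each sample is held out exactly once."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, held_out
```

Under these assumptions, each fixed-length sequence window maps to one vector, and the vectors could then be passed to any SVM implementation (for example scikit-learn's `SVC`), training on the `train` indices and predicting the single held-out sample in each Jackknife round.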
Authors
SUN Jiawei (孙佳伟)
ZHANG Ming (张明)
WANG Changbao (王长宝)
XU Weiyan (徐维艳)
CHENG Ke (程科)
DUAN Xianhua (段先华)
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China; School of Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China
Source
Journal of Jiangsu University of Science and Technology: Natural Science Edition (《江苏科技大学学报(自然科学版)》)
CAS
2019, Issue 2, pp. 62-68 (7 pages)
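Funding
National Natural Science Foundation of China (61572242, 61373062)
Natural Science Foundation of Jiangsu Province (BK20141403, BK20130472)
Jiangsu Province Science and Technology Support Program (BE2014692)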