Abstract
This paper studies a new power-spectrum-based method for extracting features from protein sequences, using hierarchical clustering and entropy evaluation. The work has three parts. First, amino acid sequences are given a numerical representation based on the classical HP model. Second, the discrete Fourier transform is used to obtain the characteristic spectrum of each protein sequence, from which a 12-dimensional feature vector is constructed. Third, hierarchical clustering is applied to obtain the hierarchical structure of the protein sequences. The method extends power-spectrum-based feature extraction from DNA sequences to protein sequences. Hierarchical structures were compared on three datasets: 19 animal mitochondrial dehydrogenase subunit 1 and subunit 4 sequences, and 11 β-globin sequences. The results indicate that the new method extracts the hierarchical structure information of the data system better than the power-spectrum-based DNA sequence analysis method. The method is therefore of biological significance for determining the structure and function of unknown genes.
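The three steps named in the abstract (HP encoding, DFT power spectrum, hierarchical clustering) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not specify which residues count as hydrophobic, how the 12-dimensional vector is built, or which linkage and distance are used. The hydrophobic set, the 12 equal-width frequency bands, and single-linkage clustering with Euclidean distance below are all assumptions chosen for illustration. The entropy evaluation step is not covered.

```python
import cmath
import math

# Assumption: one commonly used hydrophobic (H) residue set; the paper's
# exact HP assignment is not stated in the abstract.
HYDROPHOBIC = set("ACFGILMPVWY")


def hp_encode(seq):
    """Map each residue to 1 (hydrophobic) or 0 (polar)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq.upper()]


def power_spectrum(x):
    """Power spectrum |X[k]|^2 of the DFT, computed directly in O(N^2)."""
    n = len(x)
    spectrum = []
    for k in range(n):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(xk) ** 2)
    return spectrum


def feature_vector(seq, dims=12):
    """Hypothetical feature: mean power in `dims` equal-width frequency
    bands over k = 1..N//2, normalized to sum to 1."""
    ps = power_spectrum(hp_encode(seq))
    half = ps[1 : len(ps) // 2 + 1]  # skip the DC term, keep one half
    if len(half) < dims:
        raise ValueError("sequence too short for the requested dimensions")
    bands = []
    for b in range(dims):
        lo = b * len(half) // dims
        hi = (b + 1) * len(half) // dims
        bands.append(sum(half[lo:hi]) / (hi - lo))
    total = sum(bands)
    return [v / total for v in bands] if total > 0 else bands


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def single_linkage(vectors):
    """Agglomerative single-linkage clustering (an assumed choice).
    Returns the merge history as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Toy sequences made up for illustration; not data from the paper.
toy = [
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLI",
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLI",
    "GSHMDDEEKKRRLLVVAAIIFFWWPPCCGG",
]
features = [feature_vector(s) for s in toy]
history = single_linkage(features)
```

The merge history plays the role of the hierarchical structure: reading it in order gives the sequence of joins from the closest pair upward. A library routine would normally replace the direct DFT and the quadratic clustering loop for real data.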
Authors
LIANG Qihao
LI Yang
TANG Xuqing
School of Science, Jiangnan University, Wuxi 214122, China
Source
Journal of Food Science and Biotechnology (《食品与生物技术学报》)
Indexed in: CAS, CSCD, Peking University Core Journals
2018, Issue 11, pp. 1160-1165 (6 pages)
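Funding
National Natural Science Foundation of China (11371174)
Research and Innovation Program for Graduate Students of Ordinary Universities in Jiangsu Province (KYLX15_1188)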
Keywords
DNA sequence
power spectrum
hierarchical clustering
protein sequence
entropy