Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty of experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, the support vector machine, to train a predictive model with not only the amino acid composition but also the composition of k-spaced amino acid pairs. After the best model was selected using ten-fold cross-validation, it significantly outperformed existing tools on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
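The two feature types named in the abstract, amino acid composition and the composition of k-spaced amino acid pairs, can be illustrated with a minimal sketch. The function names, the `max_k` parameter, and the normalization by the number of pair positions are assumptions made for illustration; the abstract does not specify the window length, the range of k, or the exact normalization used in SuccSite.

```python
from itertools import product

# The 20 standard amino acids; other symbols (e.g. padding) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(window):
    """Return the fraction of each standard amino acid in a sequence window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / total for a in AMINO_ACIDS]


def cksaap(window, max_k=2):
    """Composition of k-spaced amino acid pairs for k = 0..max_k.

    For each k, counts residue pairs separated by exactly k positions and
    divides by the number of pair positions in the window (an assumed
    normalization). Returns 400 values per k.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(max_k + 1):
        counts = dict.fromkeys(pairs, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        denom = max(n_positions, 1)
        features.extend(counts[p] / denom for p in pairs)
    return features
```

For a window centred on a candidate lysine, `aa_composition(window) + cksaap(window)` gives one feature vector that could be passed to any SVM implementation.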
Subcellular localization is an important feature of proteins and is closely correlated with their function. In this work, we developed a new coding method that uses the location-predictive molecular function terms of a protein as input for the prediction of subcellular localization. Combined with the amino acid pair composition of the sequence, this coding system proved efficient for a support vector machine (SVM) and performed satisfactorily when tested on the RH dataset. The model is also robust against N-terminal uncertainties in sequences.
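One way to picture the combined coding described above is a binary indicator vector over a fixed vocabulary of molecular function terms, concatenated with the amino acid pair composition of the full sequence. This is only a sketch under that assumption: `encode_protein`, `term_vocabulary`, and the placeholder term names are hypothetical, and the abstract does not state how the location-predictive terms were selected or encoded.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(sequence):
    """Return the 400 adjacent amino acid pair frequencies of a sequence."""
    counts = dict.fromkeys(PAIRS, 0)
    for a, b in zip(sequence, sequence[1:]):
        if a + b in counts:  # skip pairs with non-standard residues
            counts[a + b] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[p] / total for p in PAIRS]


def encode_protein(sequence, protein_terms, term_vocabulary):
    """Concatenate term indicators with pair composition (assumed layout).

    term_vocabulary is an ordered list of function terms (hypothetical);
    protein_terms is the set of terms annotated to this protein.
    """
    term_bits = [1.0 if t in protein_terms else 0.0 for t in term_vocabulary]
    return term_bits + pair_composition(sequence)


# Usage with placeholder term names:
vector = encode_protein("MKKA", {"term_B"}, ["term_A", "term_B"])
```

Because the pair composition is computed over the whole sequence, losing or gaining a few N-terminal residues changes only a handful of pair counts, which is consistent with the robustness the abstract reports, although the abstract does not attribute the robustness to this property.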
Funding: This research was financially supported by the Warshel Institute for Computational Biology, School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China.