The chloroplast is a subcellular organelle found in green plants and algae, and it is the main site of photosynthesis. Proteins that localize within the chloroplast carry out the photosynthetic process at the molecular level. The chloroplast can be further divided into several compartments, and proteins in different compartments are involved in different steps of photosynthesis. Because the molecular function of a protein is highly correlated with its exact cellular localization, pinpointing the subchloroplast location of a chloroplast protein is an important step towards understanding its role in photosynthesis. Experimental determination of protein subchloroplast location is costly and time-consuming. Computational approaches have therefore been developed to predict protein subchloroplast locations from primary sequences. Over the last decades, more than a dozen studies have tried to predict protein subchloroplast locations with machine learning methods, introducing a variety of sequence features and machine learning algorithms. In this review, we collect comprehensive information on all existing studies of protein subchloroplast location prediction. We compare these studies in terms of benchmarking datasets, sequence features, machine learning algorithms, predictive performance, and implementation availability. We summarize the progress and current status of this research topic and discuss the most likely directions for future work. We hope this review not only lists all existing work but also serves readers as a useful resource for quickly grasping the big picture of this research topic. We also hope it can be a starting point for future methodological studies on the prediction of protein subchloroplast locations.
Chloroplasts are organelles found in plant cells that conduct photosynthesis. The subchloroplast locations of proteins are correlated with their functions. With the large amount of protein data now available, it is highly desirable to develop a computational method to predict the subchloroplast locations of chloroplast proteins. In this study, we propose a novel method to predict the subchloroplast locations of proteins using tripeptide compositions. The method first uses the binomial distribution to optimize the feature set; a support vector machine then performs the prediction. The proposed method was tested on a reliable and rigorous dataset of 259 chloroplast proteins with sequence identity ≤ 25%. In jackknife cross-validation, 92.21% of envelope proteins, 93.20% of thylakoid membrane proteins, 52.63% of thylakoid lumen proteins and 85.00% of stroma proteins were correctly identified. The overall accuracy reaches 88.03%, which is higher than that of other models. Based on this method, a predictor called ChloPred has been built and is freely available at http://cobi.uestc.edu.cn/people/hlin/tools/ChloPred/. The predictor will provide important information for theoretical and experimental research on chloroplast proteins.
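To make the feature representation named in this abstract concrete, the following is a minimal Python sketch of a tripeptide composition vector, together with a binomial tail probability of the kind that could be used to rank tripeptides. It is not the ChloPred implementation. The abstract does not give the formulas, so the overlapping-window counting, the skipping of non-standard residues, and the scoring function are assumptions, and the names `tripeptide_counts`, `tripeptide_composition` and `binomial_tail` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_counts(sequence):
    """Count overlapping tripeptides built only from the 20 standard residues."""
    seq = sequence.upper()
    valid = set(AMINO_ACIDS)
    counts = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        # Assumption: windows containing non-standard symbols (e.g. X) are skipped.
        if all(c in valid for c in tri):
            counts[tri] += 1
    return counts


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies."""
    counts = tripeptide_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / total for t in TRIPEPTIDES]


def binomial_tail(n_in_class, n_total, class_fraction):
    """P(X >= n_in_class) for X ~ Binomial(n_total, class_fraction).

    Assumed use: n_total is how often a tripeptide occurs in the whole dataset,
    n_in_class how often it occurs in one location class, and class_fraction the
    share of all tripeptides that belong to that class. A small tail probability
    suggests the tripeptide is over-represented in that class.
    """
    return sum(
        comb(n_total, m) * class_fraction ** m * (1 - class_fraction) ** (n_total - m)
        for m in range(n_in_class, n_total + 1)
    )
```

Under these assumptions, tripeptides could be ranked by their smallest tail probability across the location classes, and the top-ranked subset passed to a support vector machine. The subset size would need to be tuned by cross-validation.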
Funding: This work was supported by the National Key R&D Program of China (2018YFC0910405), the National Natural Science Foundation of China (NSFC, Grant No. 61872268), and the Open Project Funding of the CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences (CASNDST201705).