Funding: Supported by the National Natural Science Foundation of China (62061034, 62171241), the Key Technology Research Program of Inner Mongolia Autonomous Region (2021GG0398), and the Science and Technology Leading Talent Team in Inner Mongolia Autonomous Region (2022LJRC0009).
Abstract: With the increasing emergence of time-series single-cell RNA sequencing (scRNA-seq) data, inferring developmental trajectories by connecting transcriptomically similar cell states (i.e., cell types or clusters) has become a major challenge. Most existing computational methods are designed for individual cells and do not take the available time-series information into account. We present IDTI (Increment of Diversity for Trajectory Inference), which combines time-series information with the minimum increment of diversity method to infer the cell state trajectory of time-series scRNA-seq data. We apply IDTI to simulated data and to three real datasets covering the development of diverse tissues, and compare it with six other commonly used trajectory inference methods in terms of topology similarity and branching accuracy. The results show that IDTI accurately constructs the cell state trajectory without requiring starting cells to be specified. Performance tests further demonstrate that IDTI offers high accuracy and strong robustness.
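The minimum increment of diversity idea mentioned in this abstract can be illustrated with a small sketch. It assumes the commonly used definitions D(X) = N log N - Σ n_i log n_i and ID(X, Y) = D(X + Y) - D(X) - D(Y), and it links each later-time cell state to the earlier-time state with the smallest ID. The function names, the toy count profiles, and the linking rule are assumptions made for illustration; this is not the published IDTI implementation.

```python
import math
from typing import Dict, Sequence


def diversity(counts: Sequence[float]) -> float:
    """D(X) = N*log(N) - sum_i n_i*log(n_i), treating 0*log(0) as 0."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x: Sequence[float], y: Sequence[float]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); a smaller value means more similar."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of features")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def link_states(
    earlier: Dict[str, Sequence[float]],
    later: Dict[str, Sequence[float]],
) -> Dict[str, str]:
    """Map each later-time state to the earlier-time state with minimum ID."""
    return {
        child: min(
            earlier,
            key=lambda parent: increment_of_diversity(earlier[parent], profile),
        )
        for child, profile in later.items()
    }


# Hypothetical aggregated expression counts per cell state (4 toy genes).
day1 = {"early_A": [40, 35, 10, 5], "early_B": [5, 10, 40, 45]}
day2 = {"late_A": [30, 40, 15, 5], "late_B": [2, 8, 45, 50]}

edges = link_states(day1, day2)
for child, parent in edges.items():
    print(f"{parent} -> {child}")
```

Restricting candidate parents to the preceding time point is one simple way to use the time-series information; a real method may weigh time and similarity differently.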
Funding: Supported by the National Natural Science Foundation of China (30660044).
Abstract: Based on the concept of pseudo amino acid composition (PseAAC), protein structural classes are predicted using an approach that combines the increment of diversity with a support vector machine (ID-SVM), in which the dipeptide composition of proteins is used as the source of diversity. A jackknife test shows that the total prediction accuracy is 96.6%, higher than that given by other approaches. In addition, the specificity (Sp) and the Matthews correlation coefficient (MCC) are calculated for each protein structural class: the Sp is more than 88% and the MCC is higher than 92%, which implies that the ID-SVM model is credible for predicting protein structural class. The results indicate that (1) the choice of the source of diversity is reasonable, (2) the predictive performance of ID-SVM is excellent, and (3) the amino acid sequences of proteins contain information about protein structural classes.
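Two ingredients named in this abstract are easy to sketch: the dipeptide composition used as the source of diversity, and the per-class Matthews correlation coefficient. The example below is a minimal illustration under stated assumptions: the 20-letter alphabet ordering, the function names, and the class labels in the usage lines are hypothetical, and the SVM stage of ID-SVM is not reproduced.

```python
import math
from itertools import product
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed (assumed) order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_counts(sequence: str) -> List[int]:
    """Count overlapping dipeptides; pairs with non-standard residues are skipped."""
    counts = [0] * len(DIPEPTIDES)
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
    return counts


def mcc_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Matthews correlation coefficient for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Usage with a toy sequence and hypothetical class labels.
features = dipeptide_counts("ACAC")
truth = ["alpha", "beta", "alpha", "beta"]
preds = ["alpha", "beta", "alpha", "beta"]
print(sum(features), mcc_one_vs_rest(truth, preds, "alpha"))
```

The 400 counts produced by `dipeptide_counts` are the kind of vector on which an increment of diversity can be computed against pooled counts for each class.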
Abstract: In this paper, we first combine tetra-peptide structural words with contact number for protein secondary structure prediction. We use the increment of diversity combined with quadratic discriminant analysis to predict the structure of the central residue of a sequence fragment. The method uses tetra-peptide structural words and long-range contact number as information sources. The Q3 accuracy is over 83% on 194 proteins, and the accuracies of the predicted secondary structures for the 20 amino acid residues range from 81% to 88%. Moreover, we introduce the residue long-range contact, which directly indicates the separation of contacting residues in terms of their positions in the sequence, and examine the negative influence of long-range residue interactions on secondary structure prediction. The method is also compared with existing prediction methods, and the results show that it is more effective for protein secondary structure prediction.
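For readers unfamiliar with the reported metrics, the sketch below shows how a Q3 accuracy and a per-amino-acid accuracy are typically computed from observed and predicted three-state strings (H for helix, E for strand, C for coil). The function names and the toy strings are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict


def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def per_residue_accuracy(sequence: str, observed: str, predicted: str) -> Dict[str, float]:
    """Prediction accuracy (in percent) broken down by amino acid type."""
    if not (len(sequence) == len(observed) == len(predicted)):
        raise ValueError("inputs must be of equal length")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for residue, o, p in zip(sequence, observed, predicted):
        totals[residue] += 1
        hits[residue] += int(o == p)
    return {res: 100.0 * hits[res] / totals[res] for res in totals}


# Toy example with hypothetical states.
overall = q3_accuracy("HHHEECCC", "HHHEECCH")
by_residue = per_residue_accuracy("AAGG", "HHEE", "HEEE")
print(overall, by_residue)
```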