Abstract: The identification of hepatitis C virus (HCV)-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we propose a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM is used to characterize each protein, and two-dimensional principal component analysis (2DPCA) is then applied to extract features from the PSSM. Finally, rotation forest (RF) is used for classification. The results of various ablation experiments show that, on independent datasets, the accuracy and area under the curve (AUC) of RF-PSSM can reach 93.74% and 94.29%, respectively, outperforming almost all state-of-the-art methods. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.
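The feature-extraction step described in this abstract can be illustrated with a minimal 2DPCA sketch. It is not the authors' implementation: it assumes NumPy is available, uses random toy matrices in place of real PSSMs, and assumes every PSSM has already been brought to the same number of rows (the fixed length of 50 is hypothetical). The names `fit_2dpca` and `transform_2dpca` are made up for illustration, and the rotation forest classifier is not shown.

```python
import numpy as np

def fit_2dpca(matrices, n_components):
    """Return the leading eigenvectors of the 2DPCA covariance matrix."""
    stack = np.asarray(matrices, dtype=float)   # shape (N, L, 20)
    centered = stack - stack.mean(axis=0)       # subtract the mean matrix
    # G = (1/N) * sum_k (A_k - mean)^T (A_k - mean), shape (20, 20)
    G = np.einsum("kij,kil->jl", centered, centered) / stack.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]                    # shape (20, n_components)

def transform_2dpca(matrix, axes):
    """Project one L x 20 matrix onto the axes and flatten the result."""
    return (np.asarray(matrix, dtype=float) @ axes).ravel()

# Toy data: 8 random "PSSMs" of shape 50 x 20 (assumed, not real profiles).
rng = np.random.default_rng(0)
toy_pssms = rng.normal(size=(8, 50, 20))
axes = fit_2dpca(toy_pssms, n_components=5)
features = transform_2dpca(toy_pssms[0], axes)
```

Unlike ordinary PCA, 2DPCA works on the matrices directly, so the covariance matrix here is only 20 x 20 regardless of protein length.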
Abstract: The growing number of sequences stored in genomic databases has made sequential analysis infeasible. Parallel computing has therefore been brought to bioinformatics through parallel algorithms for aligning and analyzing sequences, mainly improving the running time of these algorithms. In many situations, the parallel strategy also helps reduce the computational complexity of large problems. This work presents results from an implementation of a parallel score-estimating technique for the score matrix calculation stage, which is the first stage of progressive multiple sequence alignment. The performance and quality of the parallel score estimation are compared with those of a dynamic programming approach that is also implemented in parallel. The comparison shows a significant reduction in running time. Moreover, the quality of the final alignment obtained with the new strategy is analyzed and compared with that of the dynamic programming approach.
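The abstract does not say how scores are estimated, so the following is only a sketch of the general idea: fill the pairwise score matrix with a cheap estimate instead of a full dynamic programming alignment, and farm the independent pairs out to a worker pool. The k-mer Jaccard similarity used here is a swapped-in stand-in, not the paper's technique, and `kmer_set`, `estimate_score`, and `score_matrix` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def kmer_set(seq, k=3):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def estimate_score(pair):
    """Jaccard similarity of k-mer sets: a cheap stand-in for an alignment score."""
    a, b = pair
    ka, kb = kmer_set(a), kmer_set(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def score_matrix(seqs, workers=4):
    """Symmetric matrix of estimated pairwise scores, computed by a worker pool."""
    n = len(seqs)
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(estimate_score,
                               [(seqs[i], seqs[j]) for i, j in pairs]))
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), s in zip(pairs, scores):
        matrix[i][j] = matrix[j][i] = s
    return matrix

seqs = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG"]
matrix = score_matrix(seqs)
```

Each pair is independent, which is what makes this stage easy to parallelize. In CPython, threads will not speed up this CPU-bound work because of the global interpreter lock; `ProcessPoolExecutor` offers the same `map` interface and would be the more realistic choice for large inputs.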
Funding: Supported by the National Science Fund for Distinguished Young Scholars of China (70625005).
Abstract: This paper studies a class of multiple attribute decision making (MADM) problems in which the attribute values are intuitionistic fuzzy numbers and the attribute weights are completely unknown. A score function is first used to calculate the score of each attribute value, yielding a score matrix, which is then transformed into a normalized score matrix. Based on the normalized score matrix, an entropy-based procedure is proposed to derive the attribute weights. The additive weighted averaging operator is then used to fuse the normalized scores into an overall score for each alternative, from which the ranking of all alternatives is obtained. The paper concludes by extending these results to interval-valued intuitionistic fuzzy set theory, and an illustrative example is provided.
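The pipeline in this abstract (score function, normalized score matrix, entropy-based weights, additive weighted averaging) can be sketched as follows. The abstract does not give the formulas, so the specific choices here are assumptions: the score of an intuitionistic fuzzy number (mu, nu) is taken as mu - nu, normalization is the shift (s + 1) / 2 onto [0, 1], and the weights follow the standard entropy weight method. All function names and the toy decision matrix are hypothetical.

```python
import math

def score(ifn):
    """Score of an intuitionistic fuzzy number (mu, nu); assumed form mu - nu."""
    mu, nu = ifn
    return mu - nu

def entropy_weights(R):
    """Standard entropy weight method on a non-negative m x n matrix (m >= 2)."""
    m, n = len(R), len(R[0])
    divergence = []
    for j in range(n):
        col = [R[i][j] for i in range(m)]
        total = sum(col)
        probs = [c / total for c in col] if total > 0 else [1.0 / m] * m
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergence.append(max(0.0, 1.0 - entropy))
    s = sum(divergence)
    # Fall back to equal weights when no attribute discriminates at all.
    return [d / s for d in divergence] if s > 1e-12 else [1.0 / n] * n

def rank_alternatives(decision):
    """Return (weights, overall scores, alternative indices from best to worst)."""
    R = [[(score(x) + 1) / 2 for x in row] for row in decision]  # assumed normalization
    w = entropy_weights(R)
    overall = [sum(wj * r for wj, r in zip(w, row)) for row in R]
    ranking = sorted(range(len(overall)), key=lambda i: overall[i], reverse=True)
    return w, overall, ranking

# Toy problem: 3 alternatives (rows) rated on 2 attributes (columns).
decision = [
    [(0.7, 0.2), (0.5, 0.4)],
    [(0.4, 0.5), (0.6, 0.3)],
    [(0.2, 0.7), (0.5, 0.3)],
]
weights, overall, ranking = rank_alternatives(decision)
```

In this toy problem the first attribute separates the alternatives far more than the second, so the entropy procedure gives it most of the weight and it largely determines the ranking.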
Abstract: Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Currently available motif-finding tools are based on matching a given motif against the query sequence. AMF introduces a new algorithm that identifies occurrences of patterns containing all kinds of mutations, such as insertions, deletions, and mismatches. The algorithm is based mainly on computing the Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. Much of the effort in bioinformatics is directed at identifying these motifs in the sequences of newly discovered genes. The proposed tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. The tool is intended to serve the scientific community working in chemical and structural biology and is freely available to all users at http://www.sastra.edu/scbt/amf/.
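The abstract does not specify how the Alignment Score Matrix is filled, so the sketch below uses a well-known stand-in rather than AMF's own algorithm: Sellers-style approximate matching, a dynamic programming comparison of a motif against a full-length sequence that tolerates insertions, deletions, and mismatches. The function name and the unit edit costs are assumptions.

```python
def approximate_matches(motif, sequence, max_errors):
    """Return (end_position, errors) for each 1-based end position in `sequence`
    where `motif` matches some substring with at most `max_errors` edits."""
    m = len(motif)
    # prev[i]: fewest edits aligning motif[:i] to a substring ending at the
    # previous sequence position. Before any base is read, that costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, base in enumerate(sequence, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == base else 1
            curr[i] = min(prev[i - 1] + cost,  # match or mismatch
                          prev[i] + 1,         # extra base in the sequence
                          curr[i - 1] + 1)     # motif base missing from the sequence
        if curr[m] <= max_errors:
            hits.append((j, curr[m]))
        prev = curr
    return hits

# The exact motif ends at position 6; "ACT" ending at position 11 lacks the G.
hits = approximate_matches("ACGT", "GGACGTGGACTGG", max_errors=1)
```

Only two columns of the matrix are kept at a time, so memory grows with the motif length rather than the sequence length.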
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60704047).
Abstract: The number and arrangement of subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to how it interacts with other molecules in a biological system. With the explosion of protein sequences generated in the post-genomic age, it is vital to develop an automated method to meet this challenge. To explore this problem, we adopted an approach in which each protein sample is represented by the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor is advantageous in that it combines evolutionary information with sequence-correlated information. However, incorporating all these effects into one descriptor may cause a 'high dimension disaster'. To overcome this problem, Chou and Shen adopted a fusion approach. Here, a completely different approach is introduced: principal component analysis (PCA), a linear dimensionality reduction algorithm, is used to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. The jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
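A minimal sketch of the two ideas in this abstract, assuming NumPy and random toy matrices in place of real PSSMs. The descriptor follows the commonly cited form of Pse-PSSM (20 column means plus, for each lag, 20 mean squared differences between rows that lag apart); the usual pre-normalization of the PSSM is omitted, and the paper's exact settings are not given in the abstract. The names `pse_pssm` and `pca_reduce`, the maximum lag of 4, and the choice of 3 components are illustrative assumptions.

```python
import numpy as np

def pse_pssm(pssm, max_lag):
    """Fixed-length descriptor of an L x 20 PSSM: 20 + 20 * max_lag values.
    Requires max_lag < L."""
    E = np.asarray(pssm, dtype=float)
    parts = [E.mean(axis=0)]                  # evolutionary information
    for lag in range(1, max_lag + 1):
        diff = E[:-lag] - E[lag:]             # rows that are `lag` apart
        parts.append((diff ** 2).mean(axis=0))  # sequence-correlated information
    return np.concatenate(parts)

def pca_reduce(X, n_components):
    """Project the rows of X onto their leading principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy data: proteins of different lengths all map to 100-dimensional descriptors.
rng = np.random.default_rng(0)
lengths = [60, 75, 90, 120, 80, 100]
descriptors = np.array([pse_pssm(rng.normal(size=(L, 20)), max_lag=4)
                        for L in lengths])
reduced = pca_reduce(descriptors, n_components=3)
```

The descriptor length grows by 20 with every additional lag, which is the dimensionality growth the abstract refers to and the reason a reduction step such as PCA becomes attractive.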