Identifying hepatitis C virus (HCV)-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we propose a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM is used to characterize each protein, two-dimensional principal component analysis (2DPCA) is then applied to extract features from the PSSM, and finally a rotation forest (RF) performs the classification. Ablation experiments show that, on independent datasets, the accuracy and area under the curve (AUC) of RF-PSSM reach 93.74% and 94.29%, respectively, outperforming almost all state-of-the-art approaches. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.
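The 2DPCA feature-extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every PSSM has already been brought to a common shape (here a hypothetical 30 rows by 20 amino-acid columns), it uses random numbers in place of real PSSMs, and the function names `fit_2dpca` and `transform_2dpca` are invented for illustration.

```python
import numpy as np

def fit_2dpca(matrices, n_components):
    """Return a (cols x n_components) projection from same-shaped matrices."""
    stack = np.asarray(matrices, dtype=float)        # shape (N, rows, cols)
    centered = stack - stack.mean(axis=0)            # subtract the mean matrix
    # Image covariance: average of C_n^T C_n over all samples, shape (cols, cols).
    cov = np.einsum("nij,nik->jk", centered, centered) / stack.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest ones
    return eigvecs[:, top]

def transform_2dpca(matrix, projection):
    """Project one matrix onto the learned axes; result is rows x n_components."""
    return np.asarray(matrix, dtype=float) @ projection

# Stand-in data (assumption): 50 fake "PSSMs", each 30 x 20.
rng = np.random.default_rng(0)
pssms = rng.normal(size=(50, 30, 20))
projection = fit_2dpca(pssms, n_components=5)
features = np.stack([transform_2dpca(p, projection) for p in pssms])
```

Each protein is reduced here from a 30 x 20 matrix to a 30 x 5 feature matrix, which could then be flattened and passed to a classifier. How the original study handles proteins of different lengths is not stated in the abstract.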
The growing number of sequences stored in genomic databases has made sequential analysis infeasible. Parallel computing has therefore been brought to bioinformatics through parallel algorithms for aligning and analyzing sequences, mainly improving the running time of these algorithms. In many situations, a parallel strategy helps reduce the computational complexity of large problems. This work presents results from an implementation of a parallel score-estimation technique for the score matrix calculation stage, which is the first stage of progressive multiple sequence alignment. The performance and quality of the parallel score estimation are compared with those of a dynamic programming approach that is also implemented in parallel. The comparison shows a significant reduction in running time. Moreover, the quality of the final alignment obtained with the new strategy is analyzed and compared with that of the dynamic programming approach.
How to quickly and accurately detect new topics in massive online data has become a main problem of public opinion monitoring in cyberspace. This paper presents a new event detection method for current new event detection systems, based on a sorted subtopic matching algorithm, and constructs the entire design framework. In this paper, the subtopics contained in old topics (or news stories) are sorted in descending order of their importance to the topic (or news story) to form a sorted subtopic sequence. During subtopic matching, a subtopic scoring matrix is used to determine whether a new story reports a new event. Experimental results show that the sorted subtopic matching model improves the accuracy and effectiveness of the new event detection system in cyberspace.
This paper studies the class of multiple attribute decision making (MADM) problems in which the attribute values are intuitionistic fuzzy numbers and the information about attribute weights is completely unknown. A score function is first used to calculate the score of each attribute value and to construct a score matrix, which is then transformed into a normalized score matrix. Based on the normalized score matrix, an entropy-based procedure is proposed to derive the attribute weights. The additive weighted averaging operator is then used to fuse the normalized scores into an overall score for each alternative, from which the ranking of all the given alternatives is obtained. The paper concludes by extending these results to interval-valued intuitionistic fuzzy set theory, and an illustrative example is provided.
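The pipeline described above (score matrix, normalization, entropy-based weights, weighted averaging) can be sketched as follows. The abstract does not give the formulas, so the choices here are assumptions: the score of an intuitionistic fuzzy number (mu, nu) is taken as mu - nu, scores are mapped to [0, 1] with (s + 1) / 2, and the weights follow the standard Shannon-entropy weighting scheme. The function name `rank_alternatives` and the sample data are hypothetical.

```python
import numpy as np

def rank_alternatives(mu, nu):
    """mu, nu: (alternatives x attributes) membership / non-membership degrees."""
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    scores = mu - nu                       # assumed score function, in [-1, 1]
    normalized = (scores + 1.0) / 2.0      # assumed normalization to [0, 1]
    # Share of each alternative within an attribute column
    # (assumes every column has at least one positive normalized score).
    p = normalized / normalized.sum(axis=0)
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))   # treat 0 * log 0 as 0
    entropy = -(p * log_p).sum(axis=0) / np.log(mu.shape[0])
    weights = (1.0 - entropy) / (1.0 - entropy).sum()       # low entropy, high weight
    overall = normalized @ weights         # additive weighted averaging
    ranking = np.argsort(-overall)         # best alternative first
    return weights, overall, ranking

# Hypothetical data: 3 alternatives, 2 attributes.
mu = [[0.5, 0.9], [0.5, 0.2], [0.5, 0.6]]
nu = [[0.3, 0.05], [0.3, 0.7], [0.3, 0.3]]
weights, overall, ranking = rank_alternatives(mu, nu)
```

In this toy data the first attribute takes the same value for every alternative, so it carries no discriminating information and receives a weight of essentially zero; the ranking is then driven by the second attribute.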
Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Currently available motif-finding tools are based on matching a given motif in the query sequence. AMF implements a new algorithm that identifies occurrences of patterns containing all kinds of mutations, such as insertions, deletions, and mismatches. The algorithm is mainly based on computing an Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. Much of the effort in bioinformatics is directed at identifying such motifs in the sequences of newly discovered genes. The proposed tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. The tool is intended to serve the scientific community working in chemical and structural biology, and is freely available to all users at http://www.sastra.edu/scbt/amf/.
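One generic way to find motif occurrences that tolerate insertions, deletions, and mismatches is a semi-global alignment score matrix, sketched below. This is not AMF's actual ASM computation, which the abstract does not specify; the scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for illustration.

```python
def motif_end_scores(motif, sequence, match=1, mismatch=-1, gap=-1):
    """Best score of aligning the whole motif to a substring ending at each position.

    Returns a list of length len(sequence) + 1; entry j is the best score for an
    occurrence ending just after the j-th character (1-based) of the sequence.
    """
    prev = [0] * (len(sequence) + 1)            # row 0: a match may start anywhere
    for i, m in enumerate(motif, start=1):
        curr = [i * gap] + [0] * len(sequence)  # column 0: motif prefix against nothing
        for j, s in enumerate(sequence, start=1):
            diagonal = prev[j - 1] + (match if m == s else mismatch)
            skip_motif_char = prev[j] + gap     # motif character missing in sequence
            skip_seq_char = curr[j - 1] + gap   # extra character in sequence
            curr[j] = max(diagonal, skip_motif_char, skip_seq_char)
        prev = curr
    return prev

def find_hits(motif, sequence, min_score):
    """End positions (1-based) whose alignment score reaches min_score."""
    scores = motif_end_scores(motif, sequence)
    return [(j, sc) for j, sc in enumerate(scores) if j > 0 and sc >= min_score]

exact = find_hits("ACGT", "TTACGTTT", min_score=4)      # exact occurrence
mutated = max(motif_end_scores("ACGT", "TTAGTTT"))      # occurrence with one deletion
```

Lowering `min_score` relative to the motif length admits occurrences with more mutations, at the cost of more spurious hits.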
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multi-domain proteins but also for experimental structure determination. A novel method for domain boundary prediction is presented that combines a support vector machine with the domain guess by size algorithm. Since the evolutionary information of multiple domains can be detected by the position-specific score matrix, the support vector machine is trained and tested using the position-specific score matrix values generated by PSI-BLAST. Candidate domain boundaries are selected from the output of the support vector machine and are then passed to the domain guess by size algorithm to give the final domain boundary prediction. The experimental results show that the combined method outperforms both the support vector machine and domain guess by size used individually.
The number and arrangement of the subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to how it interacts with other molecules in a biological system. With the explosion of protein sequences generated in the post-genomic age, it is vital to develop an automated method to meet this challenge. To explore this problem, we adopted an approach in which a protein sample is represented by the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor is advantageous in that it can combine evolutionary information with sequence-correlated information. However, incorporating all these effects into one descriptor may cause the 'high dimension disaster'. To overcome this problem, Chou and Shen adopted a fusion approach. Here a completely different approach, the linear dimensionality reduction algorithm principal component analysis (PCA), is introduced to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. Jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
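The dimensionality-reduction step can be sketched with a plain PCA computed from a singular value decomposition. This is a generic illustration rather than the study's code: the descriptor matrix is random stand-in data, its size (30 samples, 200 dimensions) is hypothetical, and the function name `pca_reduce` is invented here.

```python
import numpy as np

def pca_reduce(descriptors, n_components):
    """Project row-vector descriptors onto their top principal components."""
    X = np.asarray(descriptors, dtype=float)
    centered = X - X.mean(axis=0)                    # PCA requires centered data
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return centered @ components.T, components, explained_variance

# Stand-in data (assumption): 30 proteins with 200-dimensional descriptors.
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(30, 200))
reduced, components, explained_variance = pca_reduce(descriptors, n_components=10)
```

The reduced vectors, 10 values per protein in this sketch, would replace the original high-dimensional descriptors as classifier input.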
Funding (new event detection paper): Planning Project of the National Language Committee in the "12th Five-Year Plan" (No. YB125-49); the Foundation for Key Program of the Ministry of Education, China (No. 212167); and the Fundamental Research Funds for the Central Universities (No. SWJTU12CX096).
Funding (MADM paper): National Science Fund for Distinguished Young Scholars of China (No. 70625005).
Funding (domain boundary prediction paper): National Natural Science Foundation of China (No. 60435020).
Funding (quaternary structure prediction paper): National Natural Science Foundation of China (Grant No. 60704047).