Membrane proteins are an important kind of proteins embedded in the membranes of cells and play crucial roles in living organisms, such as ion channels,transporters, receptors. Because it is difficult to determinate t...Membrane proteins are an important kind of proteins embedded in the membranes of cells and play crucial roles in living organisms, such as ion channels,transporters, receptors. Because it is difficult to determinate the membrane protein's structure by wet-lab experiments,accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called Mem Brain, whose input is the amino acid sequence. Mem Brain consists of specialized modules for predicting transmembrane helices, residue–residue contacts and relative accessible surface area of a-helical membrane proteins. Mem Brain achieves aprediction accuracy of 97.9% of ATMH, 87.1% of AP,3.2 ± 3.0 of N-score, 3.1 ± 2.8 of C-score. Mem BrainContact obtains 62%/64.1% prediction accuracy on training and independent dataset on top L/5 contact prediction,respectively. And Mem Brain-Rasa achieves Pearson correlation coefficient of 0.733 and its mean absolute error of13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins.Mem Brain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/Mem Brain/.展开更多
Recently, deep neural networks(DNNs) significantly outperform Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage mak...Recently, deep neural networks(DNNs) significantly outperform Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to directly deploy on low-power embedded devices. To alleviate this issue,structure sparseness and low precision fixed-point quantization have been applied widely. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage.A fast implementation of binary matrix multiplication is introduced. On modern central processing unit(CPU)and graphics processing unit(GPU) architectures, a 5–7 times speedup compared with full precision floatingpoint matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks can deliver 3–4 times speedup over the normal full precision deep models. With the knowledge distillation from the normal floating-point models, the binary DNNs or binary convolutional neural networks(CNNs) can restrict the word error rate(WER) degradation to within 15.0%,compared to the normal full precision floating-point DNNs or CNNs, respectively. Particularly for the binary CNN with binarization only on the convolutional layers, the WER degradation is very small and is almost negligible with the proposed approach.展开更多
Background:The type Ⅲ secreted effectors(T3SEs)are one of the indispensable proteins in the growth and reproduction of Gram-negative bacteria.In particular,the pathogenesis of Gram-negative bacteria depends on the ty...Background:The type Ⅲ secreted effectors(T3SEs)are one of the indispensable proteins in the growth and reproduction of Gram-negative bacteria.In particular,the pathogenesis of Gram-negative bacteria depends on the type Ⅲ secreted effectors,and by injecting T3SEs into a host cell,the host cell's immunity can be destroyed.The high diversity of T3SE sequences and the lack of defined secretion signals make it difficult to identify and predict.Moreover,the related study of the pathological system associated with T3SE remains a hot topic in bioinformatics.Some computational tools have been developed to meet the growing demand for the recognition of T3SEs and the studies of type Ⅲ secretion systems(T3SS).Although these tools can help biological experiments in certain procedures,there is still room for improvement,even for the current best model,as the existing methods adopt handdesigned feature and traditional machine learning methods.Methods:In this study,we propose a powerful predictor based on deep learning methods,called WEDeepT3.Our work consists mainly of three key steps.First,we train word embedding vectors for protein sequences in a large-scale amino acid sequence database.Second,we combine the word vectors with traditional features extracted from protein sequences,like PSSM,to construct a more comprehensive feature representation.Finally,we construct a deep neural network model in the prediction of type Ⅲ secreted effectors.Results:The feature representation of WEDeepT3 consists of both word embedding and position-specific features.Working together with convolutional neural networks,the new model achieves superior performance to the state-ofthe-art methods,demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models.Conclusion:WEDeepT3 exploits both semantic information of Ar-mer fragments and evolutional information of protein sequences to accurately difYerentiate between T3SEs and non-T3SEs.WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.展开更多
基金supported by the National Natural Science Foundation of China(Nos.61671288,91530321,61603161)Science and Technology Commission of Shanghai Municipality(Nos.16JC1404300,17JC1403500,16ZR1448700)
文摘Membrane proteins are an important kind of proteins embedded in the membranes of cells and play crucial roles in living organisms, such as ion channels,transporters, receptors. Because it is difficult to determinate the membrane protein's structure by wet-lab experiments,accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called Mem Brain, whose input is the amino acid sequence. Mem Brain consists of specialized modules for predicting transmembrane helices, residue–residue contacts and relative accessible surface area of a-helical membrane proteins. Mem Brain achieves aprediction accuracy of 97.9% of ATMH, 87.1% of AP,3.2 ± 3.0 of N-score, 3.1 ± 2.8 of C-score. Mem BrainContact obtains 62%/64.1% prediction accuracy on training and independent dataset on top L/5 contact prediction,respectively. And Mem Brain-Rasa achieves Pearson correlation coefficient of 0.733 and its mean absolute error of13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins.Mem Brain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/Mem Brain/.
基金the National Natural Science Foundation of China (Nos. 61603252 and U1736202)experiments have been carried out on the Pi supercomputer at Shanghai Jiao Tong University.
文摘Recently, deep neural networks(DNNs) significantly outperform Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to directly deploy on low-power embedded devices. To alleviate this issue,structure sparseness and low precision fixed-point quantization have been applied widely. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage.A fast implementation of binary matrix multiplication is introduced. On modern central processing unit(CPU)and graphics processing unit(GPU) architectures, a 5–7 times speedup compared with full precision floatingpoint matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks can deliver 3–4 times speedup over the normal full precision deep models. With the knowledge distillation from the normal floating-point models, the binary DNNs or binary convolutional neural networks(CNNs) can restrict the word error rate(WER) degradation to within 15.0%,compared to the normal full precision floating-point DNNs or CNNs, respectively. Particularly for the binary CNN with binarization only on the convolutional layers, the WER degradation is very small and is almost negligible with the proposed approach.
基金supported by the National Natural Science Foundation of China(No.61972251).
文摘Background:The type Ⅲ secreted effectors(T3SEs)are one of the indispensable proteins in the growth and reproduction of Gram-negative bacteria.In particular,the pathogenesis of Gram-negative bacteria depends on the type Ⅲ secreted effectors,and by injecting T3SEs into a host cell,the host cell's immunity can be destroyed.The high diversity of T3SE sequences and the lack of defined secretion signals make it difficult to identify and predict.Moreover,the related study of the pathological system associated with T3SE remains a hot topic in bioinformatics.Some computational tools have been developed to meet the growing demand for the recognition of T3SEs and the studies of type Ⅲ secretion systems(T3SS).Although these tools can help biological experiments in certain procedures,there is still room for improvement,even for the current best model,as the existing methods adopt handdesigned feature and traditional machine learning methods.Methods:In this study,we propose a powerful predictor based on deep learning methods,called WEDeepT3.Our work consists mainly of three key steps.First,we train word embedding vectors for protein sequences in a large-scale amino acid sequence database.Second,we combine the word vectors with traditional features extracted from protein sequences,like PSSM,to construct a more comprehensive feature representation.Finally,we construct a deep neural network model in the prediction of type Ⅲ secreted effectors.Results:The feature representation of WEDeepT3 consists of both word embedding and position-specific features.Working together with convolutional neural networks,the new model achieves superior performance to the state-ofthe-art methods,demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models.Conclusion:WEDeepT3 exploits both semantic information of Ar-mer fragments and evolutional information of protein sequences to accurately difYerentiate between T3SEs and non-T3SEs.WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.