Abstract: The quality of low-frequency electromagnetic data is degraded by spike and trend noise. Failure to remove the spikes and trends reduces the credibility of data interpretation. Based on analyses of the causes and characteristics of these noises, this paper presents the results of a preset statistics stacking method (PSSM) and a piecewise linear fitting method (PLFM) for removing spikes and trends, respectively. The magnitudes of the spikes are either higher or lower than the normal values, which distorts the useful signal. Comparisons of spike removal among the averaging, statistics, and PSSM methods indicate that only the PSSM removes the spikes successfully. The spectra of the linear and nonlinear trends, on the other hand, lie mainly in the low-frequency band and can change the calculated resistivity significantly. No influence of the trends is observed when the frequency is higher than a certain threshold value. The PLFM effectively removes both linear and nonlinear trends, with errors of around 1% in the power spectrum. The proposed methods offer an effective way to remove spike and trend noise from low-frequency electromagnetic data and establish a research basis for de-noising low-frequency noise.
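The abstract does not give the details of the piecewise linear fitting method, so the following is only a minimal sketch of the general idea, assuming fixed-length segments and an independent least-squares line per segment. The function names and the `segment_length` parameter are hypothetical and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # A single-sample segment has no slope; treat its value as the trend.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def piecewise_linear_detrend(samples, segment_length):
    """Subtract an independent least-squares line from each fixed-length segment.

    Assumption: segments are contiguous and non-overlapping; the paper may
    choose breakpoints differently.
    """
    if segment_length < 2:
        raise ValueError("segment_length must be at least 2")
    detrended = []
    for start in range(0, len(samples), segment_length):
        segment = samples[start:start + segment_length]
        xs = list(range(len(segment)))
        slope, intercept = fit_line(xs, segment)
        detrended.extend(y - (slope * x + intercept) for x, y in zip(xs, segment))
    return detrended
```

With short enough segments, a slowly varying nonlinear trend is approximated by a chain of straight lines, which is why a piecewise fit can handle both linear and nonlinear drift. The segment length trades trend-tracking ability against the risk of removing low-frequency signal.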
Funding: Supported by the National Natural Science Foundation of China (No. 61972251).
Abstract: Background: Type Ⅲ secreted effectors (T3SEs) are among the proteins indispensable to the growth and reproduction of Gram-negative bacteria. In particular, the pathogenesis of Gram-negative bacteria depends on T3SEs: by injecting them into a host cell, the bacteria can destroy the host cell's immunity. The high diversity of T3SE sequences and the lack of defined secretion signals make T3SEs difficult to identify and predict. Moreover, the study of the pathological system associated with T3SEs remains a hot topic in bioinformatics. Some computational tools have been developed to meet the growing demand for the recognition of T3SEs and the study of type Ⅲ secretion systems (T3SS). Although these tools can help biological experiments in certain procedures, there is still room for improvement, even for the current best model, because the existing methods rely on hand-designed features and traditional machine learning methods. Methods: In this study, we propose a powerful predictor based on deep learning methods, called WEDeepT3. Our work consists mainly of three key steps. First, we train word embedding vectors for protein sequences on a large-scale amino acid sequence database. Second, we combine the word vectors with traditional features extracted from protein sequences, such as the position-specific scoring matrix (PSSM), to construct a more comprehensive feature representation. Finally, we construct a deep neural network model for the prediction of type Ⅲ secreted effectors. Results: The feature representation of WEDeepT3 consists of both word embedding and position-specific features. Working together with convolutional neural networks, the new model achieves superior performance to the state-of-the-art methods, demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models. Conclusion: WEDeepT3 exploits both the semantic information of k-mer fragments and the evolutionary information of protein sequences to accurately differentiate between T3SEs and non-T3SEs. WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.
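The first two steps described in the Methods (word vectors for sequence fragments, combined with position-specific features) can be illustrated with a small sketch. This is not the WEDeepT3 implementation: the overlapping k-mer scheme with `k=3`, the toy embedding table, the zero vector for unseen k-mers, and the per-position concatenation are all assumptions made for illustration, and the function names are hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def combine_features(sequence, embeddings, pssm_rows, k=3, embedding_dim=2):
    """Concatenate each k-mer's word vector with the PSSM row at its start position.

    Assumptions: `embeddings` maps k-mer strings to vectors of length
    `embedding_dim` (trained elsewhere), and `pssm_rows` holds one row of
    position-specific scores per residue. Unseen k-mers get a zero vector.
    """
    unknown = [0.0] * embedding_dim
    combined = []
    for position, token in enumerate(kmer_tokens(sequence, k)):
        word_vector = embeddings.get(token, unknown)
        combined.append(list(word_vector) + list(pssm_rows[position]))
    return combined
```

The resulting list of per-position vectors is the kind of two-dimensional input (positions by features) that a convolutional network can consume; real PSSM rows typically carry 20 scores per residue rather than the shortened rows a toy example would use.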