Proteins are essential for many biological functions.For example,folding amino acid chains reveals their functionalities by maintaining tissue structure,physiology,and homeostasis.Note that quantifiable protein charac...Proteins are essential for many biological functions.For example,folding amino acid chains reveals their functionalities by maintaining tissue structure,physiology,and homeostasis.Note that quantifiable protein characteristics are vital for improving therapies and precision medicine.The automatic inference of a protein’s properties from its amino acid sequence is called“basic structure”.Nevertheless,it remains a critical unsolved challenge in bioinformatics,although with recent technological advances and the investigation of protein sequence data.Inferring protein function from amino acid sequences is crucial in biology.This study considers using raw sequencing to explain biological facts using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation.The power of two representations was used to identify each amino acid,and a coding technique was established for each sequence family.Subsequently,the encoded protein numerical sequences are transformed into an image using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families.A deep Convolutional Neural Network(CNN)classifies the resulting images and developed non-normalized and normalized encoding techniques.Initially,the dataset was split 70/30 for training and testing.Correspondingly,the dataset was utilized for 70%training,15%validation,and 15%testing.The suggested methods are evaluated using accuracy,precision,and recall.The non-normalized method had 70%accuracy,72%precision,and 71%recall.68%accuracy,67%precision,and 67%recall after validation.Meanwhile,the normalized approach without validation had 92.4%accuracy,94.3%precision,and 91.1%recall.Validation showed 90%accuracy,91.2%precision,and 89.7%recall.Note that both algorithms outperform the rest.The paper presents that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architecture.They offered the best inference performance as the proposed approach improves categorization and prediction.Several instances show successful multi-class prediction in molecular biology’s massive data.展开更多
The global growth of the Internet and the rapid expansion of social networks such as Facebook make multilingual sentiment analysis of social media content very necessary. This paper performs the first sentiment analys...The global growth of the Internet and the rapid expansion of social networks such as Facebook make multilingual sentiment analysis of social media content very necessary. This paper performs the first sentiment analysis on code-mixed Bambara-French Facebook comments. We develop four Long Short-term Memory(LSTM)-based models and two Convolutional Neural Network(CNN)-based models, and use these six models, Na?ve Bayes, and Support Vector Machines(SVM) to conduct experiments on a constituted dataset. Social media text written in Bambara is scarce. To mitigate this weakness, this paper uses dictionaries of character and word indexes to produce character and word embedding in place of pre-trained word vectors. We investigate the effect of comment length on the models and perform a comparison among them. The best performing model is a one-layer CNN deep learning model with an accuracy of 83.23 %.展开更多
文摘Proteins are essential for many biological functions.For example,folding amino acid chains reveals their functionalities by maintaining tissue structure,physiology,and homeostasis.Note that quantifiable protein characteristics are vital for improving therapies and precision medicine.The automatic inference of a protein’s properties from its amino acid sequence is called“basic structure”.Nevertheless,it remains a critical unsolved challenge in bioinformatics,although with recent technological advances and the investigation of protein sequence data.Inferring protein function from amino acid sequences is crucial in biology.This study considers using raw sequencing to explain biological facts using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation.The power of two representations was used to identify each amino acid,and a coding technique was established for each sequence family.Subsequently,the encoded protein numerical sequences are transformed into an image using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families.A deep Convolutional Neural Network(CNN)classifies the resulting images and developed non-normalized and normalized encoding techniques.Initially,the dataset was split 70/30 for training and testing.Correspondingly,the dataset was utilized for 70%training,15%validation,and 15%testing.The suggested methods are evaluated using accuracy,precision,and recall.The non-normalized method had 70%accuracy,72%precision,and 71%recall.68%accuracy,67%precision,and 67%recall after validation.Meanwhile,the normalized approach without validation had 92.4%accuracy,94.3%precision,and 91.1%recall.Validation showed 90%accuracy,91.2%precision,and 89.7%recall.Note that both algorithms outperform the rest.The paper presents that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architecture.They offered the best inference performance as the proposed approach improves categorization and prediction.Several instances show successful multi-class prediction in molecular biology’s massive data.
基金Supported by the National Natural Science Foundation of China(61272451,61572380,61772383 and 61702379)the Major State Basic Research Development Program of China(2014CB340600)
文摘The global growth of the Internet and the rapid expansion of social networks such as Facebook make multilingual sentiment analysis of social media content very necessary. This paper performs the first sentiment analysis on code-mixed Bambara-French Facebook comments. We develop four Long Short-term Memory(LSTM)-based models and two Convolutional Neural Network(CNN)-based models, and use these six models, Na?ve Bayes, and Support Vector Machines(SVM) to conduct experiments on a constituted dataset. Social media text written in Bambara is scarce. To mitigate this weakness, this paper uses dictionaries of character and word indexes to produce character and word embedding in place of pre-trained word vectors. We investigate the effect of comment length on the models and perform a comparison among them. The best performing model is a one-layer CNN deep learning model with an accuracy of 83.23 %.