Complex proteins are needed for many biological activities. The folding of amino acid chains reveals their properties and functions, and proteins support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative identification of proteins and their functions. Despite technical advances and the exploration of protein sequence data, the “basic structure” problem of bioinformatics (the automatic deduction of a protein’s properties from its amino acid sequence) remains unsolved. Inferring protein function from amino acid sequences is the main challenge in biological data. This study analyzes whether raw sequence data can characterize biological facts. A massive corpus of protein sequences and the related protein families of the Globin-like superfamily is used to generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. Bispectral analysis converts the encoded numerical protein sequences into images for better discrimination between protein sequences and families. Training and validation used 70% of the dataset, and the remaining 30% was used for testing. The paper examines the performance of multistage deep learning models in differentiating between sixteen protein families after each encoded sequence is represented by a higher-order spectral image (bispectrum). Cascading minimized false positive and false negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes that were almost accurately separated in the first stage and reduced the overlap between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% ± 22.8% accuracy, 63.3% ± 17.1% precision, and a 63.2% ± 19.4% F1-score. The two-stage technique yielded 92.2% ± 4.9% accuracy, 92.7% ± 7.0% precision, and a 92.3% ± 5.0% F1-score. This work provides balanced, reliable, and precise predictions for all families on all measures, and the new model proved resilient to variation between families while keeping high scores.
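The step from an encoded sequence to a bispectrum image can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the integer code table in `CODE`, the single-segment direct estimate, the mean removal, and all function names are hypothetical, since the abstract does not give the actual encoding or estimator. The example sequence is arbitrary.

```python
import cmath

# Hypothetical code table: position of each residue in a 20-letter alphabet.
# The abstract does not specify the authors' actual mapping.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(ALPHABET)}


def encode(seq):
    """Map a protein sequence to a list of numbers (illustrative mapping only)."""
    return [float(CODE[aa]) for aa in seq]


def dft(x):
    """Plain O(n^2) discrete Fourier transform, adequate for short sequences."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def bispectrum_magnitude(x):
    """Direct estimate |B(f1, f2)| = |X(f1) X(f2) conj(X(f1 + f2))| on an n x n grid."""
    n = len(x)
    mean = sum(x) / n
    spectrum = dft([v - mean for v in x])  # remove the mean before transforming
    return [
        [
            abs(spectrum[f1] * spectrum[f2] * spectrum[(f1 + f2) % n].conjugate())
            for f2 in range(n)
        ]
        for f1 in range(n)
    ]


# Arbitrary short example sequence; the result is a 15 x 15 grid of magnitudes
# that could be rendered as a grayscale image.
image = bispectrum_magnitude(encode("MVLSPADKTNVKAAW"))
```

The grid is symmetric in its two frequency indices, so in practice only part of it carries independent information; a classifier input would typically be a rescaled version of this grid.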
Proteins are essential for many biological functions. For example, the folding of amino acid chains reveals their functionalities in maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein’s properties from its amino acid sequence is called the “basic structure” problem. It remains a critical unsolved challenge in bioinformatics despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequence data to explain biological facts, drawing on a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. The encoded numerical protein sequences are then transformed into images using bispectral analysis to identify the characteristics essential for discriminating between protein sequences and their families. Non-normalized and normalized encoding techniques were developed, and a deep Convolutional Neural Network (CNN) classifies the resulting images. The dataset was first split 70/30 for training and testing; it was then split into 70% training, 15% validation, and 15% testing. The methods are evaluated using accuracy, precision, and recall. The non-normalized method achieved 70% accuracy, 72% precision, and 71% recall without validation, and 68% accuracy, 67% precision, and 67% recall with validation. The normalized approach achieved 92.4% accuracy, 94.3% precision, and 91.1% recall without validation, and 90% accuracy, 91.2% precision, and 89.7% recall with validation. Both algorithms outperform the other methods considered. The paper reports that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architectures. The proposed approach offered the best inference performance, improving categorization and prediction. Several instances show successful multi-class prediction on the massive data of molecular biology.
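For readers unfamiliar with how accuracy, precision, and recall are obtained in a multi-class setting, the sketch below computes them from a confusion matrix. It assumes macro averaging (an unweighted mean over classes), which the abstract does not state; the function name and the toy counts are invented for illustration and are not results from the paper.

```python
def macro_metrics(confusion):
    """Return (accuracy, macro precision, macro recall).

    confusion[i][j] counts samples of true class i predicted as class j.
    Macro averaging is an assumption; the abstract does not name the scheme.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        true_positive = confusion[k][k]
        precisions.append(true_positive / predicted_k if predicted_k else 0.0)
        recalls.append(true_positive / actual_k if actual_k else 0.0)
    return correct / total, sum(precisions) / n, sum(recalls) / n


# Toy 3-class counts, invented for illustration only.
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
accuracy, precision, recall = macro_metrics(toy)
```

With many families of unequal size, macro averages weight every family equally, whereas accuracy is dominated by the larger families, so the three numbers can diverge noticeably.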
Liver cancer is the second leading cause of cancer death worldwide. Early tumor detection may help identify suitable treatment and increase the survival rate. Medical imaging is a non-invasive tool that can help uncover abnormalities in human organs. Magnetic Resonance Imaging (MRI), in particular, uses magnetic fields and radio waves to differentiate the tissues of internal human organs. However, the interpretation of medical images depends on the subjective expertise of a radiologist and an oncologist. Thus, an automated computer-based diagnosis system can help specialists reduce incorrect diagnoses. This paper proposes a hybrid automated system to compare the performance of 3D and 2D features in classifying magnetic resonance liver tumor images. Two models are proposed: the first employs 3D features and the second exploits 2D features. The first system uses 3D texture attributes, 3D shape features, and 3D graphical deep descriptors together with an ensemble classifier to differentiate between four 3D tumor categories. The proposed method is also applied to 2D slices for comparison. For 3D liver tumors, the proposed approach attained 100% accuracy in discriminating between all tumor types, along with 100% Area Under the Curve (AUC), 100% sensitivity, 100% specificity, and 100% precision. Performance is lower in 2D classification, where the maximum accuracy reached 96.4% for two classes and 92.1% for four classes. The strong performance of the proposed system can be attributed to the use of various types of feature selection methods, in addition to the ReliefF feature selection technique, to choose the most relevant features associated with the different classes. The novelty of this work lies in building a highly accurate system under specific circumstances without any image processing or human input, and in comparing the performance of 2D and 3D classification. In the future, the presented work can be extended to a much larger dataset. It could then serve as a reliable, efficient Computer Aided Diagnosis (CAD) system in hospitals in rural areas.
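The idea behind Relief-style feature selection can be illustrated with a short sketch. Note that this is the original two-class Relief scheme (one nearest hit and one nearest miss per sample), a simplified relative of ReliefF, which averages over several neighbours and handles more than two classes. The function name, the Manhattan distance on range-scaled features, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def relief_weights(X, y):
    """Basic two-class Relief feature weights (not full ReliefF).

    Each feature gains weight when it differs between a sample and its
    nearest neighbour of the other class, and loses weight when it differs
    between the sample and its nearest neighbour of the same class.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    spans = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(a, b, f) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            weights[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return weights


# Toy data invented for illustration: feature 0 tracks the class, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

Features are then kept in order of `ranked` until a chosen budget is reached; on the toy data the class-tracking feature receives the larger weight.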