Funding: Supported by the National Natural Science Foundation of China (No. 31870022) and the Chongqing Science Funds for Distinguished Young Scientists (No. cstc2020jcyjjqX0005).
Abstract: Genome mining led to the discovery of two new globin-like enzymes, TriB from Fusarium poae and TutaA from Schizophyllum commune, which are involved in the synthesis of two linear terpenes, tricinonoic acid (1) and 2-butenedioic acid (3). Both in vivo heterologous biosynthesis and in vitro biochemical assays showed that these two enzymes catalyzed the C=C double-bond cleavage of a cyclic sesquiterpene precursor, (-)-germacrene D (7), and of a linear diterpene backbone, schizostain (2), respectively. Our work presents an unusual mechanism for the formation of linear terpenes in fungi and expands the known functional repertoire of globin-like enzymes in terpene biosynthesis.
Abstract: Complex proteins are needed for many biological activities. How amino acid chains fold determines their properties and functions, which support healthy tissue structure, physiology, and homeostasis. Precision medicine and targeted treatments require quantitative identification of proteins and their functions. Despite technical advances and extensive exploration of protein sequence data, the “basic structure” problem of bioinformatics remains unsolved. This problem is the automatic deduction of a protein’s properties from its amino acid sequence. Inferring protein function from amino acid sequences is the main challenge in biological data analysis. This study analyzes whether raw sequences can characterize biological facts. A large corpus of protein sequences from the related protein families of the Globin-like superfamily is used to generate a robust vector representation. An encoding technique was devised for each sequence in each family, using two representations to identify each amino acid precisely. Bispectral analysis then converts the encoded numerical protein sequences into images, improving discrimination between protein sequences and between families. Training and validation used 70% of the dataset, while 30% was used for testing. This paper examines how well multistage deep learning models differentiate between sixteen protein families after each encoded sequence is represented as a higher-order spectral image (the bispectrum). Cascading the stages minimized false-positive and false-negative cases in all phases. The first stage focused on two classes (of six and ten groups, respectively). The subsequent stages then worked within the groups that the first stage had separated almost accurately. This reduced the overlap between families seen in single-stage deep learning classification. The single-stage technique achieved 64.2% ± 22.8% accuracy, 63.3% ± 17.1% precision, and a 63.2% ± 19.4% F1-score. By contrast, the two-stage technique yielded 92.2% ± 4.9% accuracy, 92.7% ± 7.0% precision, and a 92.3% ± 5.0% F1-score. The two-stage approach thus provides balanced, reliable, and precise predictions for all families across all metrics. The resulting model is robust to variation between families and achieves high scores.
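The bispectral imaging step can be illustrated with a minimal sketch. This is not the authors' pipeline. The hypothetical `encode` helper maps each amino acid to an integer index, standing in for the paper's two-representation encoding, which the abstract does not specify. The hypothetical `bispectrum_image` function, its 64-point FFT size, and its segment length are likewise assumptions. The sketch uses the standard direct estimator B(f1, f2) = X(f1) X(f2) X*(f1 + f2), averaged over segments. It returns the magnitude as a 2-D array of the kind a CNN could take as an image.

```python
import numpy as np

# Hypothetical encoding (assumption): each of the 20 standard amino acids is
# mapped to an integer 1..20. The paper's two-representation scheme is not
# specified in the abstract and is not reproduced here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> np.ndarray:
    """Map a protein sequence to a numeric signal, skipping non-standard residues."""
    return np.array(
        [AA_INDEX[aa] for aa in sequence.upper() if aa in AA_INDEX], dtype=float
    )


def bispectrum_image(signal: np.ndarray, nfft: int = 64, segment: int = 64) -> np.ndarray:
    """Direct (FFT-based) bispectrum estimate, averaged over non-overlapping segments.

    B(f1, f2) = mean over segments of X(f1) * X(f2) * conj(X(f1 + f2)),
    with f1 + f2 wrapped modulo nfft. Returns |B| as an nfft x nfft array.
    """
    x = signal - signal.mean()  # remove the mean (DC component)
    if len(x) < segment:
        x = np.pad(x, (0, segment - len(x)))  # zero-pad short sequences
    n_seg = len(x) // segment  # samples beyond the last full segment are dropped
    f = np.arange(nfft)
    f12 = (f[:, None] + f[None, :]) % nfft  # index grid for f1 + f2
    acc = np.zeros((nfft, nfft), dtype=complex)
    for k in range(n_seg):
        X = np.fft.fft(x[k * segment:(k + 1) * segment], n=nfft)
        acc += X[:, None] * X[None, :] * np.conj(X[f12])
    return np.abs(acc / n_seg)
```

Consider `bispectrum_image(encode("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"))`. Applied to this N-terminal stretch of human hemoglobin α, it returns a 64 × 64 array. Because X(f1) X(f2) is symmetric in f1 and f2, the image is symmetric about its diagonal.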
Abstract: Proteins are essential for many biological functions. For example, the folding of amino acid chains underlies the protein functions that maintain tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein’s properties from its amino acid sequence is called the “basic structure” problem. It remains a critical unsolved challenge in bioinformatics, despite recent technological advances and extensive investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study uses raw sequences to explain biological facts, drawing on a large corpus of protein sequences from the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and an encoding technique was established for each sequence family. The encoded numerical protein sequences were then transformed into images using bispectral analysis. This captures the essential characteristics for discriminating between protein sequences and their families. A deep Convolutional Neural Network (CNN) classified the resulting images, and both non-normalized and normalized encoding techniques were developed. In the first setup, the dataset was split 70/30 for training and testing. In the second, it was split into 70% training, 15% validation, and 15% testing. The proposed methods were evaluated using accuracy, precision, and recall. Without validation, the non-normalized method achieved 70% accuracy, 72% precision, and 71% recall. With validation, it achieved 68% accuracy, 67% precision, and 67% recall. The normalized approach achieved 92.4% accuracy, 94.3% precision, and 91.1% recall without validation. With validation, it achieved 90% accuracy, 91.2% precision, and 89.7% recall. Both approaches outperformed the other methods compared. The results show that bispectrum-based nonlinear analysis with deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architectures. The proposed approach offered the best inference performance, improving categorization and prediction. Several instances demonstrate successful multi-class prediction on large-scale molecular biology data.
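The evaluation protocol described here can be sketched briefly: a 70/15/15 split followed by accuracy, precision, and recall. This is an illustration under stated assumptions, not the authors' code. The helper names `split_indices` and `per_class_metrics` are hypothetical. The split is a plain random shuffle, because the abstract does not say whether it was stratified. Summarizing per-family scores as mean ± standard deviation is one plausible reading of the ± figures reported for the two-stage model, not a confirmed procedure.

```python
import numpy as np


def split_indices(n: int, fractions=(0.70, 0.15, 0.15), seed: int = 0):
    """Randomly split indices 0..n-1 into train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def per_class_metrics(y_true, y_pred, n_classes: int):
    """Overall accuracy plus per-class precision, recall and F1 from a confusion matrix."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true family, columns: predicted family
    tp = np.diag(cm).astype(float)  # correctly classified samples per family
    predicted = cm.sum(axis=0)  # how often each family was predicted
    support = cm.sum(axis=1)  # how many samples truly belong to each family
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1
```

For the 70/30 setup, `split_indices(n, fractions=(0.70, 0.0, 0.30))` returns an empty validation array. Per-family arrays such as `precision` or `f1` can be summarized with `.mean()` and `.std()` to obtain mean ± standard deviation figures.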
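Abstract: Krüppel-like factors (KLFs) are a group of zinc-finger proteins closely associated with the transcriptional regulation of eukaryotic genes. The highly conserved carboxyl terminus of KLFs contains three tandem Cys2His2-type zinc fingers, which bind DNA sequences such as the GC box and the CACCC box. CACCC boxes occur in the globin genes specifically expressed in erythrocytes and in many erythroid regulatory factors. Studies have shown that several KLFs regulate globin gene expression and erythroid differentiation by binding CACCC boxes. For example, KLF1 binds the β-globin promoter and the locus control region (LCR), promoting β-globin expression, the switch from γ- to β-globin gene expression, and erythroid differentiation. KLF2, KLF11, and KLF13 each promote expression of the ε- and γ-globin genes, and KLF4 promotes expression of the α- and γ-globin genes. KLF3 and KLF8, in contrast, repress expression of the ε- and γ-globin genes. This review summarizes research progress on the regulation of globin gene expression and erythroid differentiation by KLFs.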
Abstract: HMG proteins are abundant chromosomal non-histone proteins. It has been suggested that HMG proteins may play an important role in the structure and function of chromatin. The present study examined the binding of HMG proteins (HMG1/2 and HMG14/17) to the core DNA sequence of DNase I hypersensitive site 2 (HS2core DNA sequence, -10681 to -10970 bp). This site lies in the locus control region (LCR) of the human β-like globin gene cluster. Binding was assessed using both in vitro nucleosome reconstitution and gel mobility shift assays. Here we show that HMG1/2 can bind to the naked HS2core DNA sequence, whereas HMG14/17 cannot. Using in vitro nucleosome reconstitution, we demonstrate that HMG14/17 can bind to the HS2core DNA sequence once it is assembled into nucleosomes with the core histone octamer transferred from chicken erythrocytes. In contrast, HMG1/2 cannot bind to the nucleosomes reconstituted in vitro with the HS2core DNA sequence. These results indicate that the binding patterns between HMG proteins and