Background Pork quality can directly affect customers' purchasing tendency, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow because of high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. Results We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc×(Landrace×Yorkshire) pigs and developed a genomic prediction reference panel for meat quality traits, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). Prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using different marker density panels derived from the WGS data, accuracy differed substantially among meat quality traits, ranging from 0.08 to 0.47. MultiBLUP outperformed GBLUP, with accuracy increases ranging from 17.39% to 75%. We optimized marker density and found that medium- and high-density marker panels are beneficial for estimating the heritability of meat quality traits. Moreover, we imputed genotypes from a 50K chip to the WGS level in the same population and found that the average concordance rate exceeded 95%, with r² = 0.81. Conclusions Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
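The accuracy metric, the percentage gains, and the imputation concordance above reduce to simple arithmetic. The following is a minimal pure-Python illustration, not the authors' pipeline. The helper names `pearson`, `relative_gain`, and `concordance_rate` are hypothetical. Genotypes are assumed to be coded 0/1/2, and the example accuracies are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation, e.g. between adjusted phenotypes and GEBVs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_gain(acc_new, acc_base):
    """Percentage increase in accuracy of one model over a baseline (e.g. GBLUP)."""
    return 100.0 * (acc_new - acc_base) / acc_base

def concordance_rate(imputed, true):
    """Share of genotype calls (coded 0/1/2) where the imputed call equals the true call."""
    if len(imputed) != len(true) or not true:
        raise ValueError("need two non-empty genotype vectors of equal length")
    return sum(a == b for a, b in zip(imputed, true)) / len(true)

# Illustrative values only (not from the study).
print(round(relative_gain(0.27, 0.23), 2))  # 17.39
```

Under the same assumptions, an imputation r² can be obtained as `pearson(imputed_dosages, true_genotypes) ** 2`.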
Background Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for both stages, but the optimal combination of these methods remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment, with respect to different reference population sizes and different numbers of the most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results All combinations performed well, with identification accuracies over 95% in all scenarios. However, no single combination performed best and robustly across all scenarios. We proposed integrating the three breed-informative SNP detection methods (named DFI) and integrating three of the machine learning methods, KNN, SVM, and RF (named KSR). The combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies obtained with SNP chip data were only slightly lower than those obtained with sequence data in most cases. Conclusions The combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases; however, the differences were generally small. Given the cost of genotyping, chip data are also a good option for breed identification.
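The following is a minimal sketch of the two stages. It assumes genotypes coded 0/1/2. It takes Delta for a breed as the absolute difference between that breed's allele frequency and the mean frequency of the other breeds; this is one common multi-breed form, and the study's exact definition is not given here. The abstract does not say how DFI and KSR combine their components. `majority_vote` is therefore only a hypothetical illustration of merging KNN, SVM, and RF predictions, and all function names are illustrative.

```python
from collections import Counter

def allele_freq(genotypes):
    """Alternate-allele frequency from genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))

def delta_scores(geno_by_breed, target):
    """Per-SNP Delta for `target`: |p(target) - mean p(other breeds)|.

    geno_by_breed maps breed name -> list of individuals, each a list of 0/1/2 codes.
    """
    others = [b for b in geno_by_breed if b != target]
    n_snps = len(geno_by_breed[target][0])
    scores = []
    for j in range(n_snps):
        p_t = allele_freq([ind[j] for ind in geno_by_breed[target]])
        p_o = sum(allele_freq([ind[j] for ind in geno_by_breed[b]]) for b in others) / len(others)
        scores.append(abs(p_t - p_o))
    return scores

def top_snps(scores, k):
    """Indices of the k highest-scoring (most breed-informative) SNPs."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def majority_vote(predictions):
    """Plurality vote over classifier outputs; ties go to the earliest-listed label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)

geno = {"A": [[2, 0], [2, 0]], "B": [[0, 0], [0, 1]], "C": [[0, 1], [0, 0]]}
print(delta_scores(geno, "A"))                      # [1.0, 0.25]
print(majority_vote(["Angus", "Jersey", "Angus"]))  # Angus
```

The SNPs ranked highest by `top_snps` would then serve as features for the assignment classifiers.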
Coding sequences (CDS) are commonly used for transient gene expression, yeast two-hybrid screening, verification of protein interactions, and prokaryotic gene expression studies. CDS are most commonly obtained from complementary DNA (cDNA) generated by reverse transcription of messenger RNA (mRNA) extracted from plant tissues. However, some CDS are difficult to acquire this way because they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges call for alternative CDS cloning technologies. In this study, we found that genomic, intron-containing gene coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter, so in principle a sufficient amount of mRNA can be extracted from N. benthamiana leaves to facilitate cloning of the target gene CDS. Our data demonstrate that N. benthamiana can serve as an effective host for cloning the CDS of plant genes.
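When gDNA is used as the template, the clone recovered from N. benthamiana can be checked against the CDS expected after intron removal. The following is a minimal sketch that assumes exon coordinates are already known from an annotation (0-based, end-exclusive, coding strand). `splice` and `looks_like_cds` are hypothetical helpers and are not part of the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(gdna, exons):
    """Join exon segments of a gene's genomic sequence into the expected CDS.

    exons: (start, end) pairs, 0-based and end-exclusive, on the coding strand.
    """
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons overlap")
    return "".join(gdna[s:e] for s, e in ordered).upper()

def looks_like_cds(seq):
    """Sanity check: ATG start, whole number of codons, stop codon at the end."""
    return seq.startswith("ATG") and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS

gdna = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTAA"  # exon 1 + intron (GT...AG) + exon 2
cds = splice(gdna, [(0, 6), (18, 24)])
print(cds, looks_like_cds(cds))              # ATGAAACCCTAA True
```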
Introduction: Omicron is a highly divergent variant of concern (VOC) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It carries a large number of mutations in its spike protein; hence, it is more transmissible in the community through immune evasion mechanisms. Because of mutations within the S gene, most Omicron variants show S gene target failure (SGTF) with some commercially available PCR kits. Such diagnostic features can be used as markers to screen for Omicron. However, whole genome sequencing (WGS) is the only gold-standard approach to confirm novel microorganisms at the genetic level, as similar mutations can also be found in other variants circulating at low frequencies worldwide. This retrospective study aimed to assess the sensitivity of RT-PCR detection of S gene target failure, compared with whole genome sequencing, for detecting Omicron variants. Methods: We analysed retrospective data from SARS-CoV-2-positive RT-PCR samples tested for SGTF with the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher), combined with sequencing, to study the emerging pattern of SARS-CoV-2 variants during the third wave at the tertiary care centre in Surat. Results: From 1 December 2021 to the end of February 2022, a total of 321,803 diagnostic RT-PCR tests for SARS-CoV-2 were performed at our tertiary care centre, of which 20,566 were positive, giving an average cumulative positivity of 6.39% over the three months. In December 2021, samples showing SGTF (70/129) were suggestive of Omicron infection and were identified as Omicron (B.1.1.529 lineage) on sequencing.
In January, we analysed a subset of samples (n = 618) with SGTF (24%) and without SGTF (76%) with Ct values Conclusions: During the COVID-19 pandemic, diagnosing infection and identifying the pathogen by sequencing took more than 15 days. In contrast, the molecular assay, using the SGTF phenomenon, provided identification within 5 hours. This strategy helps scientists and health policymakers quickly isolate cases and identify clusters, ultimately reducing transmission of the pathogen in the community.
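The TaqPath COVID-19 Combo Kit detects three targets: ORF1ab, N, and S. SGTF describes a sample in which ORF1ab and N amplify but S does not. The following is a minimal sketch of that screening rule and of the positivity arithmetic above. The Ct cut-off of 37 is an assumption and should be checked against the kit's instructions for use. The function names are illustrative.

```python
CT_CUTOFF = 37.0  # assumed positivity cut-off; confirm against the kit's instructions for use

def detected(ct):
    """A target counts as detected if it has a Ct value at or below the cut-off."""
    return ct is not None and ct <= CT_CUTOFF

def is_sgtf(orf1ab_ct, n_ct, s_ct):
    """S gene target failure: ORF1ab and N detected while S is not."""
    return detected(orf1ab_ct) and detected(n_ct) and not detected(s_ct)

def positivity_pct(positives, tests):
    """Cumulative test positivity as a percentage."""
    return 100.0 * positives / tests

print(is_sgtf(22.4, 23.1, None))                # True: candidate Omicron, confirm by WGS
print(is_sgtf(22.4, 23.1, 24.0))                # False
print(round(positivity_pct(20566, 321803), 2))  # 6.39
```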
In this study, we determined the complete nucleotide sequence and deduced amino acid sequence of a primary isolate of rabies virus (SH06) obtained from the brain of a rabid dog. The genome is 11 924 nucleotides long. Genome comparison showed that the nucleotide-level homology between SH06 and the full-length genomes of reference vaccine strains ranged from 82.2% (PV strain) to 86.9% (CTN strain). A phylogenetic analysis based on full-length genomes was performed with sequences available from GenBank; it indicated that SH06 shares the highest homology with the rabies street virus BD06 and the CTN vaccine strain, which originated in China.
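Nucleotide homology values like these are usually percent identities computed over a pairwise alignment. The following is a minimal sketch that assumes the two sequences are already aligned, with gaps written as '-'. Conventions differ on how gap columns are counted. Here, columns where both sequences have a gap are skipped, and single-sided gaps count as mismatches. The function name is illustrative.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite a base
    counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns

print(percent_identity("ACGTAC-T", "ACGAACGT"))  # 75.0
```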
Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-sized laboratories, resulting in a tidal wave of genomic information. The number of sequenced bacterial genomes has not only brought excitement to the field of genomics but also raised expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interest is expressed in a particular bacterial genome, its sequencing is likely to stop at the draft stage, and the painstaking tasks of completing and annotating it will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. It also discusses the factors that may be driving the increase in draft deposits and the consequent loss of relevant biological information.
Citrus leaf blotch virus (CLBV) is a member of the genus Citrivirus in the family Betaflexiviridae. CLBV has been reported to infect kiwi, citrus, and sweet cherry in China. In this study, 15 of 289 citrus samples from six regions of China tested positive for CLBV. Complete genomes were obtained for four CLBV isolates from Reikou in Sichuan (CLBV-LH), Yura Wase in Zhejiang (CLBV-YL), Bingtangcheng in Hunan (CLBV-BT), and Fengjie 72-1 in Chongqing (CLBV-FJ). Each isolate has a monopartite genome of 8 747 nucleotides, excluding the poly(A) tail, and encodes three open reading frames (ORFs). Multiple alignment of the genomes showed that the four isolates shared 98.9 to 99.8% identity with each other and 96.8 to 98.1% identity with the citrus reference isolates in GenBank. A phylogenetic tree based on the genome sequences of available CLBV isolates showed that the four isolates clustered together, suggesting that CLBV isolates from citrus in China do not show obvious variation. This is the first report of the complete nucleotide sequences of CLBV isolates infecting citrus in China.
The complete nucleotide sequence of an isolate of Citrus vein enation virus (CVEV-XZG) from China has been determined for the first time. The genome consists of 5 983 nucleotides, encodes five open reading frames (ORFs), and has a genomic organization similar to that of Pea enation mosaic virus (PEMV). Compared with isolate CVEV VE-1, the nucleotide and deduced amino acid sequence identities of the five ORFs ranged from 97.1 to 99.0% and from 97.4 to 100.0%, respectively; compared with isolate PEMV-1, the corresponding values ranged from 45.2 to 51.6% and from 31.1 to 45.2%. Phylogenetic analysis based on the complete genome sequence showed that CVEV-XZG is closely related to Pea enation mosaic virus. The results support the view that CVEV may be a new member of the genus Enamovirus. The full sequence of CVEV-XZG presented here may serve as a basis for future studies of CVEV in China.
Objective Knowledge of enterovirus genome sequences is very important in epidemiological investigations to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence viral pathogens in many clinical situations because of its long reads, portability, real-time access to sequence data, and very low initial costs. However, information on MinION sequencing of enterovirus genomes is lacking. Methods In this proof-of-concept study, using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination, and high-throughput sequencing ability of MinION and compared its performance with that of Sanger sequencing. Results Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12% to 97.33% for 10 genetically related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for the 10 CA16 strains in a single run. Conclusion MinION is suitable for whole genome sequencing of enteroviruses, with sufficient accuracy and fine discrimination, and has potential as a fast, reliable, and convenient method for routine use.
Viruses of thermophiles are of great interest because of their roles in gene transfer, global geochemical cycles, and the evolution of life on Earth. However, thermophilic bacteriophages have not been studied extensively. In this investigation, a typical bacteriophage, BV1, was obtained from the thermophilic bacterium Geobacillus sp. 6k512, which was isolated from an inshore hot spring in Xiamen, China. BV1 contains a linear double-stranded DNA genome of 35 055 bp encoding 54 open reading frames (ORFs). Interestingly, eight of the 54 BV1 ORFs share sequence similarity with genes from human disease-relevant bacteria. Seven proteins of purified BV1 virions were identified by proteomic analysis. Characterizing the functional genomics of BV1 would facilitate a better understanding of the mechanisms of virus-thermophile interaction.
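ORF counts such as the 54 reported for BV1 come from gene prediction on both strands. The following is a simplified sketch of the underlying idea: it scans all six reading frames for ATG-to-stop stretches. Real phage annotation typically relies on dedicated gene finders and also considers alternative start codons. `find_orfs` and its `min_codons` threshold are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG...stop ORFs on both strands.

    Coordinates are 0-based, end-exclusive, relative to the strand scanned.
    Length in codons includes the stop codon. Only the first ATG in each open
    stretch is used, so nested ORFs are not reported.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs

print(find_orfs("ATGAAATAG", min_codons=3))  # [('+', 0, 9)]
```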
Mycoplasma ovipneumoniae, a mycoplasma bacterium, commonly infects the respiratory tract, causing respiratory disease in sheep and goats worldwide. Here, the complete genome sequence of M. ovipneumoniae strain NM2010, isolated from a sheep in China, is reported for the first time.
The nucleotide (base) sequence of a genome might reflect biological information beyond the coding sequences. We calculated the appearance frequencies of successive base sequences (key sequences) across entire genomes. Based on these genome-wide frequencies, any DNA sequence in the genome could be expressed as a sequence spectrum over its adjoining base sequences, which could be used to study the corresponding biological phenomena. In this paper, we used the 64 successive three-base sequences (triplets) as key sequences, and determined and compared the spectra of specific genes with those of the chromosome, or of specific genes with those of tRNA genes, in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Escherichia coli. These analyses showed that a gene and its corresponding position on the chromosome had highly similar spectra with the same fold enlargement (approximately 400-fold) in the S. cerevisiae, S. pombe, and E. coli genomes. In addition, protein-coding genes showed a homologous structure with the appropriate tRNA gene(s) in the genome. This analytical method might faithfully reflect the encoded biological information; that is, conservation of the base sequence reflects conservation of the translated amino acid sequence in the coding region. The method might be universally applicable to other genomes, even those consisting of multiple chromosomes.
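The following sketch shows one reading of the spectrum method, under stated assumptions:

- Genome-wide frequencies of the 64 triplets are counted with overlapping windows.
- Each successive triplet of a query sequence is replaced by its genome-wide frequency to form the spectrum.
- Two equal-length spectra are compared with a Pearson correlation.

The paper's exact normalization and comparison measure are not described here, including how the roughly 400-fold enlargement arises. These choices and the function names are therefore assumptions.

```python
from collections import Counter
from itertools import product
import math

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # the 64 key sequences

def triplet_frequencies(genome):
    """Genome-wide appearance frequency of each triplet, counted with overlapping windows."""
    g = genome.upper()
    counts = Counter(g[i:i + 3] for i in range(len(g) - 2))
    total = sum(counts[t] for t in TRIPLETS)  # windows containing N etc. are ignored
    return {t: counts[t] / total for t in TRIPLETS}

def spectrum(seq, freqs):
    """Replace each successive triplet of seq with its genome-wide frequency."""
    s = seq.upper()
    return [freqs.get(s[i:i + 3], 0.0) for i in range(len(s) - 2)]

def correlation(u, v):
    """Pearson correlation of two equal-length spectra."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

freqs = triplet_frequencies("ACGTACGT")
print(spectrum("ACGTAC", freqs))
```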
Bacillus amyloliquefaciens YP6, a plant growth-promoting rhizobacterium, can efficiently degrade a wide range of organophosphorus pesticides (OPs). Here, we report the complete genome sequence of this bacterium, with a genome size of 4,009,619 bp, 4,210 protein-coding genes, and an average GC content of 45.9%. Based on the genome sequence, several genes previously described as being involved in phosphorus solubilization, OP degradation, and the synthesis of indole-3-acetic acid (IAA) and siderophores were identified. Interestingly, compared with other B. amyloliquefaciens genomes, strain YP6 had a larger genome and the most protein-coding genes. Moreover, it had fewer genes in four categories: "cell envelope biogenesis, outer membrane (M)," "translation, ribosomal structure and biogenesis (J)," "transcription (K)," and "signal transduction mechanisms (T)." These differences may be related to the extensive environmental adaptability of B. amyloliquefaciens. These results expand the application potential of strain YP6 for environmental bioremediation, provide gene resources involved in OP degradation for biotechnology and genetic engineering, and offer insights into the relationship between microorganisms and their living environment.
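Genome size and average GC content in such reports are straightforward summaries of the assembled sequence. The following is a minimal sketch that assumes the assembly is available as FASTA text. `read_fasta` and `gc_content` are illustrative helpers. Ambiguous bases count toward genome size but are excluded from the GC denominator.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

def gc_content(seq):
    """GC percentage over unambiguous A/C/G/T bases."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)

fasta = ">chromosome\nGGCCAT\nATNN\n"
genome = "".join(read_fasta(fasta).values())
print(len(genome), round(gc_content(genome), 1))  # 10 50.0
```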
Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. We therefore extended ssGBLUP to incorporate genomic annotation information and evaluated the extended models using a yellow-feathered chicken population as an example. The population consisted of 1 338 birds with 23 traits, and imputed WGS data including 5 127 612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, we evaluated the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation. Based on the chicken genome annotation (GRCg6a), 3 155 524 and 94 837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed the other models in predictive ability for 15 of the 23 traits, with advantages of 2.5 to 6.1% over the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated strategies for genotyping the reference population in ssGBLUP in this chicken population.
Comparing two strategies for selecting individuals to genotype in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations. Overall, we extended genomic prediction models that comprehensively use WGS data and genomic annotation information in the ssGBLUP framework, and validated the idea that properly handling genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results of this study shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
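For reference, the single-step model replaces the pedigree relationship matrix A with a combined matrix H. Its inverse has the standard form shown below, with animals ordered so that ungenotyped individuals come first and genotyped individuals second. In the equation:

- A₂₂ is the pedigree block for genotyped animals.
- G is a genomic relationship matrix, shown here in VanRaden's first form.
- M is the 0/1/2 genotype matrix (animals by SNPs).
- pᵢ is the allele frequency of SNP i.

In practice, G is often blended with A₂₂ so that it is invertible. The abstract does not specify how the annotation-based extensions build G. Building it from the genic or exonic SNP subset only is one plausible reading and is an assumption here.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix},
\qquad
\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}^{\top}}{2\sum_{i} p_i (1 - p_i)},
\qquad
Z_{ji} = M_{ji} - 2p_i
```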
Beach pea or beach cowpea (Vigna marina (Burm.) Merr.) belongs to the family Fabaceae. It is a close relative of cultivated Vigna species such as adzuki bean (V. angularis), cowpea (V. unguiculata), mung bean (V. radiata), and blackgram (V. mungo), and is distributed throughout the tropics. Because it tolerates salt stress, beach pea has great potential to contribute salt-tolerance genes for developing salt-tolerant cultivars of cultivated Vigna species. However, it is still underutilized in Vigna breeding programs. A draft genome sequence of beach pea was generated using a high-throughput next-generation sequencing platform, yielding 23.7 Gb of sequence from 79,929,868 filtered reads. A de novo genome assembly containing 68,731 scaffolds had an N50 length of 10,272 bp, and the assembled sequences totaled 365.6 Mb. A total of 35,448 SSRs, including 3,574 compound SSRs, were identified, and primer pairs were designed for most of these SSRs. Genome analysis identified 50,670 genes with a mean coding sequence length of 1,042 bp. Phylogenetic analysis revealed the highest sequence similarity with V. angularis, followed by V. radiata. Comparison with the V. angularis genome revealed 16,699 SNPs and 2,253 InDels, and comparison with the V. radiata genome revealed 17,538 SNPs and 2,300 InDels. To our knowledge, this is the first draft genome sequence of beach pea, derived from an accession (ANBp-14-03) locally adapted in the Andaman and Nicobar Islands of India. The draft genome sequence may facilitate the genetic enhancement of cultivated Vigna species.
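The N50 reported for the assembly is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation. The function name is illustrative.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```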
Molecular genetic maps have commonly been constructed by analyzing the segregation of restriction fragment length polymorphisms (RFLPs). Here we describe, based on recent literature, marker-sequence methodologies for a newer mapping approach in which the markers are unique sequences detected by the polymerase chain reaction (PCR). Each method has its limitations, and the current trend is to integrate the maps produced by the different methods. The marker sequences covered in this paper mainly include expressed sequence tags (ESTs), polymorphic sequence-tagged sites (STSs), randomly amplified polymorphic DNA (RAPDs), cleaved amplified polymorphic sequences (CAPS), amplified fragment length polymorphisms (AFLPs), genome sequence sampling (GSS), and sequence-tagged connectors (STCs).
Simple sequence repeats (SSRs), or microsatellites, are genetic markers that are ubiquitous in the genomes of various organisms. Analysis of SSRs in rhizobial genomes provides useful information for a variety of applications in rhizobial population genetics. We analyzed the occurrence, relative abundance, and relative density of the most common SSRs in the Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes sequenced in the microorganisms tandem repeats database, and compared the SSRs of the three species. There were 1 410, 859, and 638 SSRs in the B. japonicum, M. loti, and S. meliloti genomes, respectively. In these genomes, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant, indicating higher mutation rates in these species. Mononucleotide repeats were the least abundant. SSR types and distribution were similar among the three species.
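The following is a simplified sketch of perfect-SSR detection and of the two summary statistics. It assumes that relative abundance means SSRs per Mb of sequence and relative density means SSR base pairs per Mb; these are common definitions in SSR surveys. The minimum repeat numbers in `MIN_REPEATS` are hypothetical thresholds. Published surveys use different thresholds, and counts depend strongly on them. Compound SSRs are not handled.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp).
MIN_REPEATS = {1: 6, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    s = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # A k-bp motif followed by at least m - 1 further copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, m - 1))
        for match in pattern.finditer(s):
            motif = match.group(2)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as mononucleotides
                hits.append((match.start(), motif, len(match.group(1)) // k))
    return sorted(hits)

def abundance_and_density(hits, seq_len):
    """Relative abundance (SSRs per Mb) and relative density (SSR bp per Mb)."""
    mb = seq_len / 1e6
    total_bp = sum(len(motif) * copies for _, motif, copies in hits)
    return len(hits) / mb, total_bp / mb

seq = "G" + "A" * 7 + "C" + "AT" * 4 + "G"
hits = find_ssrs(seq)
print(hits)  # [(1, 'A', 7), (9, 'AT', 4)]
```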
Genomics has become a ground-breaking field in all areas of the life sciences. Advances in genomics and the development of high-throughput techniques have recently provided insight into whole-genome characterization of a wide range of organisms. In the post-genomic era, new technologies have produced an explosion of the genomic sequences and supporting data needed to understand genome-wide functional regulation of gene expression and to reconstruct metabolic pathways. However, this plethora of genomic data presents a significant challenge for storage, analysis, and data management. Analysis of this mega-data requires the development and application of novel bioinformatics tools that must include unified functional annotation, structural search, and comprehensive analysis and identification of new genes in a wide range of species with fully sequenced genomes. In addition, generating systematic and syntactically unambiguous nomenclature systems for genomic data across species is a crucial task; such systems are necessary for adequately handling genetic information in the context of comparative functional genomics. In this paper, we provide an overview of major advances in bioinformatics and computational biology for genome sequencing and next-generation sequence data analysis. We focus on their potential applications for efficient collection, storage, and analysis of genetic data and information from a wide range of gene banks. We also discuss the importance of establishing a unified nomenclature system through a functional and structural genomics approach.
Herein, we report a very high content of simple sequence repeats (SSRs), covering 66.12% of the herpes simplex virus type 1 (HSV-1) genome when a low threshold is adopted to define SSRs, indicating that repeat sequences are a very important feature of the HSV-1 genome. Repeats with two iterations account for 68.33% of the total repeats; that is, the HSV-1 genome tends to form shorter repeat sequences. For mono-, di-, and trinucleotide repeats, the number of repeats decreased as the number of iterations increased, implying that SSRs may tend to form from low iterations toward high iterations. High-iteration SSRs might have been subject to strong selective pressure and retained to perform different functions. The analysis suggests that repeat formation may be an essential evolutionary driving force for the HSV-1 genome, and the results might be helpful for studying the genome structure, repeat genesis, and genome evolution of HSV-1.
The study of viruses and their genetics has been both an opportunity and a challenge for the scientific community. The ongoing SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pandemic exposed a lack of preparedness for such situations. Not only must the effects caused by the virus be countered, but mutations in the viral genome must also be monitored frequently. One major way to learn more about such pathogens is to extract their genetic data. Although viral genetic data have been cultured, stored, and isolated in the form of genome sequences, there are still few methods for predicting what new viruses may arise through mutation. This research proposes a deep learning model that predicts the genome sequence of the SARS-CoV-2 virus using only earlier viruses of the family Coronaviridae, with RNN-LSTM (recurrent neural network with long short-term memory) and RNN-GRU (gated recurrent unit) architectures, so that countermeasures can be prepared by predicting possible changes in the genome from existing mutations in the virus. On testing, the F1-recall exceeded 0.95, and both models achieved a mutation-detection accuracy of about 98.5%, showing the capability of recurrent neural networks to predict future changes in a viral genome.
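Before any RNN-LSTM or RNN-GRU model can be trained to predict genome sequence, the genomes have to be turned into supervised examples. The following is a minimal sketch of one common encoding, assuming next-nucleotide prediction from a fixed-length window with one-hot vectors. The study's actual encoding, window length, and model code are not given. `make_windows` and `one_hot` are therefore hypothetical, and the network itself, which would be built with a deep learning framework, is not shown.

```python
BASES = "ACGT"

def one_hot(base):
    """Encode a nucleotide as a length-4 indicator vector in A, C, G, T order."""
    vec = [0] * len(BASES)
    vec[BASES.index(base)] = 1
    return vec

def make_windows(seq, window):
    """Turn a genome into (input window, next base) pairs for sequence models."""
    s = seq.upper()
    pairs = []
    for i in range(len(s) - window):
        chunk = s[i:i + window + 1]
        if any(b not in BASES for b in chunk):
            continue  # skip windows containing ambiguous bases such as N
        pairs.append(([one_hot(b) for b in chunk[:-1]], one_hot(chunk[-1])))
    return pairs

pairs = make_windows("ACGT", 2)
print(len(pairs), pairs[0][1])  # 2 [0, 0, 1, 0]
```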
Funding (pork meat quality study): supported by a Technical Innovation of Crossbred in Swine and Breed High Fertility Lines Project (2022B0202090002), a Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), a Natural Science Foundation of Guangdong Province project (2018B030313011), and Innovative Teams of Modern Agriculture and Industry Technology System of Guangdong Province (2022KJ26).
Funding: funded by the National Key Research and Development Program of China (2021YFD1200404), the Yangzhou University Interdisciplinary Research Foundation for Animal Science Discipline of Targeted Support (yzuxk202016), and the Project of Genetic Improvement for Agricultural Species (Dairy Cattle) of Shandong Province (2019LZGC011).
Abstract: Background: Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for each stage, but the optimal combination of these methods remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment, across different reference population sizes and different numbers of the most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results: All combinations performed well, with identification accuracies above 95% in all scenarios. However, no single combination performed best and remained robust across all scenarios. We therefore proposed integrating the three breed-informative SNP detection methods (named DFI) and integrating three of the machine learning methods, KNN, SVM, and RF (named KSR). The combination of these two integrated methods outperformed the other combinations, with accuracies above 99% in most cases, and was robust in all scenarios. Accuracies obtained with SNP chip data were only slightly lower than those obtained with sequence data in most cases. Conclusions: The combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases, but the differences were generally small. Given the cost of genotyping, chip data are also a good option for breed identification.
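As a rough illustration of the two stages, the sketch below ranks SNPs by a Delta-style statistic and combines breed labels from several classifiers by majority vote. Two things here are assumptions. First, Delta is taken to be the mean absolute allele-frequency difference over all breed pairs. Second, the vote rule stands in for the actual integration, which the abstract does not describe for either DFI or KSR. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_freq(genotypes):
    """Allele frequency from 0/1/2 genotype codes: coded-allele count / (2 * n animals)."""
    return sum(genotypes) / (2 * len(genotypes))


def mean_pairwise_delta(freqs_by_breed):
    """Assumed Delta statistic: average |p_i - p_j| over all breed pairs for one SNP."""
    pairs = list(combinations(freqs_by_breed, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def top_informative_snps(geno_by_breed, n_top):
    """geno_by_breed maps breed -> list of animals, each a list of 0/1/2 codes per SNP.
    Returns the indices of the n_top SNPs with the largest Delta."""
    n_snps = len(next(iter(geno_by_breed.values()))[0])
    scores = []
    for j in range(n_snps):
        freqs = [allele_freq([animal[j] for animal in animals])
                 for animals in geno_by_breed.values()]
        scores.append((mean_pairwise_delta(freqs), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest Delta first, ties by SNP index
    return [j for _, j in scores[:n_top]]


def majority_vote(predictions):
    """Combine breed labels from several classifiers; ties go to the earliest-listed classifier."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
```

With three classifiers, a 2-to-1 split is decided by the majority, and a three-way disagreement falls back to the first classifier's label.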
Abstract: Coding sequences (CDS) are commonly used for transient gene expression, in yeast two-hybrid screening to verify protein interactions, and in prokaryotic gene expression studies. CDS are most commonly obtained from complementary DNA (cDNA) generated by reverse transcription of messenger RNA (mRNA) extracted from plant tissues. However, some CDS are difficult to acquire this way because they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges call for alternative CDS cloning technologies. In this study, we found that intron-containing genomic coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter, so in principle a sufficient amount of mRNA can be extracted from N. benthamiana leaves to clone the target CDS. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.
Abstract: Introduction: Omicron is a highly divergent variant of concern (VOC) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It carries a large number of mutations in its spike protein and is therefore more transmissible in the community through immune evasion. Because of mutations within the S gene, most Omicron variants show S gene target failure (SGTF) with some commercially available PCR kits. Such diagnostic features can be used as markers to screen for Omicron. However, whole genome sequencing (WGS) remains the only gold-standard approach to confirm novel microorganisms at the genetic level, because similar mutations can also be found in other variants circulating at low frequencies worldwide. This retrospective study aimed to assess the sensitivity of RT-PCR detection of SGTF, compared with whole genome sequencing, for detecting Omicron variants. Methods: We analysed retrospective data on SARS-CoV-2-positive RT-PCR samples for SGTF with the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher) and combined them with sequencing to study the pattern of SARS-CoV-2 variants that emerged during the third wave at a tertiary care centre in Surat. Results: From 1 December 2021 to the end of February 2022, a total of 321,803 diagnostic RT-PCR tests for SARS-CoV-2 were performed, of which 20,566 were positive, giving an average cumulative positivity of 6.39% over the three months. In December 2021, samples showing SGTF (70/129) were suggestive of Omicron infection and were identified as Omicron (B.1.1.529 lineage) when sequenced. In January, we analysed a subset of samples (n = 618) with SGTF (24%) and without SGTF (76%) with Ct values Conclusions: During the COVID-19 pandemic, it took more than 15 days to diagnose infection and identify the pathogen by sequencing.
In contrast, the molecular assay provided rapid identification through the SGTF phenomenon within 5 hours. This strategy helps scientists and health policymakers to quickly isolate cases and identify clusters, which ultimately reduces transmission of the pathogen in the community.
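The TaqPath COVID-19 RT-PCR Combo Kit targets the ORF1ab, N, and S genes, so SGTF shows up as an undetected S target while the ORF1ab and N targets are detected. The Python sketch below encodes that rule together with the positivity arithmetic from the Results. It represents an undetected target as `None`, and `max_ct` is a hypothetical cutoff of our own, not the threshold used in the study.

```python
def is_sgtf(ct_orf1ab, ct_n, ct_s, max_ct=30.0):
    """Flag S gene target failure: S undetected (None) while ORF1ab and N are detected.

    max_ct is an illustrative detection cutoff (assumption), not a value from the study.
    """
    def detected(ct):
        return ct is not None and ct <= max_ct

    return ct_s is None and detected(ct_orf1ab) and detected(ct_n)


def positivity_percent(positives, tests):
    """Cumulative test positivity as a percentage."""
    return 100.0 * positives / tests


# 20,566 positives out of 321,803 tests
print(round(positivity_percent(20566, 321803), 2))
```

A sample with ORF1ab and N Ct values of about 22 and no S amplification would be flagged as SGTF and therefore as presumptive Omicron, pending sequencing.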
Funding: National High-Tech Research and Development Program of China (2007AA022402)
Abstract: In this study, we determined the complete nucleotide sequence and the deduced amino acid sequence of a primary isolate of rabies virus (SH06) obtained from the brain of a rabid dog. The genome is 11,924 nucleotides long. Comparison of genomic sequences showed that the nucleotide identity between SH06 and the full-length genomes of reference vaccine strains ranged from 82.2% (PV strain) to 86.9% (CTN strain). A phylogenetic analysis based on full-length genomes was performed with sequences available from GenBank. It showed that SH06 is most closely related to the rabies street virus BD06 and the CTN vaccine strain, both of which originated in China.
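Identity figures such as the 82.2% to 86.9% above are computed from a pairwise or multiple alignment. The minimal sketch below assumes the sequences are already aligned to equal length and excludes any column containing a gap. This gap convention is an assumption; alignment tools differ in how they treat gaps, so percentages are only comparable when computed the same way.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one common convention among several)
        compared += 1
        matches += a == b
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACGTTT"))  # 4 matches over 5 ungapped columns
```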
Funding: supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) in Brazil, processes BEX 12954-12-8 and 11517-12-3, to Barbosa EGV and Aburjaile FF
Abstract: Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-sized laboratories, producing a tidal wave of genomic information. The number of sequenced bacterial genomes has not only brought excitement to the field of genomics but also raised expectations that NGS would boost antibacterial discovery and vaccine development. Although many potential drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial) genome deposits in public databases. If there is no further interest in a particular bacterial genome, its sequencing is likely to stop at the draft stage, and the painstaking tasks of completing and annotating the genome will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. It also discusses the factors that may be driving the increase in draft deposits and the consequent loss of relevant biological information.
Funding: supported by the National Natural Science Foundation of China (31501611), the Chongqing Research Program of Basic Research and Frontier Technology, China (cstc2017jcyjBX0016), the Chongqing Science and Technology Commission Project, China (cstc2016shmsztzx80003), the Fundamental Research Funds for the Central Universities, China (XDJK2016B21, SWU116012), and the Special Fund for Agro-scientific Research in the Public Interest, China (201203076-01)
Abstract: Citrus leaf blotch virus (CLBV) is a member of the genus Citrivirus in the family Betaflexiviridae. CLBV has been reported to infect kiwifruit, citrus, and sweet cherry in China. In this study, 15 of 289 citrus samples from six regions of China were found to be infected with CLBV. The complete genomes of four CLBV isolates were obtained from Reikou in Sichuan (CLBV-LH), Yura Wase in Zhejiang (CLBV-YL), Bingtangcheng in Hunan (CLBV-BT), and Fengjie 72-1 in Chongqing (CLBV-FJ). Each monopartite genome was 8,747 nucleotides long, excluding the poly(A) tail, and contained three open reading frames (ORFs). In a multiple alignment of genomes, the four isolates shared 98.9 to 99.8% identity with each other and 96.8 to 98.1% identity with the citrus reference isolates in GenBank. A phylogenetic tree based on the genome sequences of available CLBV isolates showed that the four isolates clustered together, suggesting that CLBV isolates from citrus in China do not vary markedly. This is the first report of the complete nucleotide sequences of CLBV isolates infecting citrus in China.
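Counting ORFs in a newly assembled genome, as done for these isolates, starts from a scan for start-to-stop stretches. CLBV is a positive-sense single-stranded RNA virus, so the Python sketch below scans only the three forward frames with the standard genetic code. Several simplifications are assumptions and not the authors' annotation method: it reports ATG-initiated ORFs only, it ignores nested ORFs, and the `min_codons` threshold is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(seq, min_codons=100):
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with an exclusive end that includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                protein = []
                for j in range(i, len(seq) - 2, 3):
                    aa = CODON_TABLE.get(seq[j:j + 3], "X")  # 'X' for ambiguous codons
                    if aa == "*":
                        if len(protein) >= min_codons:
                            orfs.append((i, j + 3, "".join(protein)))
                        i = j  # resume scanning after this stop codon
                        break
                    protein.append(aa)
                else:
                    break  # no stop codon before the end of this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```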
Funding: funded by the Chongqing Natural Science Foundation Project, China (cstc2011jjA80025)
Abstract: The complete nucleotide sequence of an isolate of Citrus vein enation virus (CVEV-XZG) from China has been determined for the first time. The genome consists of 5,983 nucleotides, encodes five open reading frames (ORFs), and has a genomic organization similar to that of Pea enation mosaic virus (PEMV). Compared with isolate CVEV VE-1, the nucleotide and deduced amino acid sequence identities of the five ORFs ranged from 97.1 to 99.0% and from 97.4 to 100.0%, respectively. Compared with isolate PEMV-1, they ranged from 45.2 to 51.6% and from 31.1 to 45.2%, respectively. Phylogenetic analysis based on the complete genome sequence showed that CVEV-XZG is closely related to Pea enation mosaic virus. The results support the view that CVEV may be a new member of the genus Enamovirus. The full sequence of CVEV-XZG presented here may serve as a basis for future studies of CVEV in China.
Funding: supported by the National Key Research and Development Plan (2016TFC1202700, 2016YFC1200900), the Beijing Municipal Science & Technology Commission project (grant number D151100002115003), and the Guangzhou Municipal Science & Technology Commission project (grant number 2015B2150820)
Abstract: Objective: Knowledge of enterovirus genome sequences is very important in epidemiological investigations to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in clinical settings because of its long reads, portability, real-time access to sequence data, and very low initial costs. However, information on MinION sequencing of enterovirus genomes is lacking. Methods: In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination, and high-throughput sequencing ability of MinION and compared its performance with that of Sanger sequencing. Results: Within the first minute of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12% to 97.33% for 10 genetically related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min on average, 99% identity was achieved for the 10 CA16 strains in a single run. Conclusion: MinION is suitable for whole genome sequencing of enteroviruses, with sufficient accuracy and fine discrimination, and has potential as a fast, reliable, and convenient method for routine use.
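The accuracy figures above come from comparing MinION-derived sequences with Sanger sequences. The sketch below is a simplified illustration of that comparison: it builds a column-wise majority consensus and then computes identity to the reference. It assumes the reads are already aligned to a common coordinate system and that the consensus and reference have equal length. Basecalling, read alignment, and consensus polishing are out of scope, and the study's actual pipeline is not described in the abstract.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority base from reads sharing one coordinate system.

    Columns with no base coverage are reported as 'N'.
    """
    length = max(len(r) for r in aligned_reads)
    consensus = []
    for pos in range(length):
        column = [r[pos] for r in aligned_reads if pos < len(r) and r[pos] != gap]
        consensus.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(consensus)


def identity_to_reference(consensus, reference):
    """Percent of reference positions matched by an equal-length, aligned consensus."""
    matches = sum(c == r for c, r in zip(consensus, reference))
    return 100.0 * matches / len(reference)


cons = majority_consensus(["ACGT", "ACGA", "ACGT"])
print(cons, identity_to_reference(cons, "ACGT"))
```

More reads per column let the majority vote outweigh individual read errors. This is one intuition for why identity climbs as sequencing time increases.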
Funding: the Key Natural Science Foundation of Fujian under contract No. 2007J0004; the National Natural Science Foundation of China under contract No. 40576076
Abstract: Viruses of thermophiles are of great interest because of their roles in gene transfer, global geochemical cycles, and the evolution of life on Earth. However, thermophilic bacteriophages have not been studied extensively. In this investigation, a typical bacteriophage, BV1, was obtained from the thermophilic bacterium Geobacillus sp. 6k512, which was isolated from an inshore hot spring in Xiamen, China. BV1 contains a linear double-stranded DNA genome of 35,055 bp that encodes 54 open reading frames (ORFs). Interestingly, eight of the 54 BV1 ORFs share sequence similarity with genes from bacteria relevant to human disease. Seven proteins of purified BV1 virions were identified by proteomic analysis. Determining the functional genomics of BV1 should facilitate a better understanding of the mechanism of virus-thermophile interaction.
Funding: supported by the National Key Technology R&D Program of China (2011BAD18B01)
Abstract: Mycoplasma ovipneumoniae, a mycoplasma bacterium, commonly infects the respiratory tract of sheep and goats worldwide, causing respiratory disease. Here, the complete genome sequence of M. ovipneumoniae strain NM2010, isolated from a sheep in China, is reported for the first time.
Abstract: The nucleotide (base) sequence of a genome may carry biological information beyond the coding sequences. The appearance frequencies of successive base sequences (key sequences) were calculated for entire genomes. Based on these genome-wide frequencies, any DNA sequence in the genome can be expressed as a sequence spectrum over its adjoining base sequences, which can be used to study the corresponding biological phenomena. In this paper, we used the 64 successive three-base sequences (triplets) as key sequences. We determined and compared the spectra of specific genes with those of the chromosome, and of specific genes with those of tRNA genes, in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Escherichia coli. In these analyses, a gene and its corresponding position on the chromosome showed highly similar spectra at the same fold enlargement (approximately 400-fold) in the S. cerevisiae, S. pombe, and E. coli genomes. In addition, protein-coding genes showed structures homologous to the appropriate tRNA gene(s) in the genome. This analytical method may faithfully reflect encoded biological information: conservation of the base sequence in a coding region reflects conservation of the translated amino acid sequence. The method may also be universally applicable to other genomes, including those consisting of multiple chromosomes.
Funding: financially supported by the Collaborative Innovation Involving Production, Teaching & Research Funds of Jiangsu Province, China (BY2014023-28)
Abstract: Bacillus amyloliquefaciens YP6, a plant growth-promoting rhizobacterium, efficiently degrades a wide range of organophosphorus pesticides (OPs). Here, we report the complete genome sequence of this bacterium, which has a genome size of 4,009,619 bp, 4,210 protein-coding genes, and an average GC content of 45.9%. Based on the genome sequence, we identified several genes previously described as being involved in phosphorus solubilization, OP degradation, and the synthesis of indole-3-acetic acid (IAA) and siderophores. Interestingly, compared with other B. amyloliquefaciens genomes, strain YP6 had a larger genome and the most protein-coding genes. Moreover, it had fewer genes in four categories: "cell envelope biogenesis, outer membrane (M)," "translation, ribosomal structure and biogenesis (J)," "transcription (K)," and "signal transduction mechanisms (T)." These differences may be related to the extensive environmental adaptability of B. amyloliquefaciens. These results expand the application potential of strain YP6 for environmental bioremediation, provide gene resources for OP degradation in biotechnology and genetic engineering, and offer insights into the relationship between microorganisms and their living environment.
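The 45.9% figure above is the share of G and C among the genome's bases. The minimal sketch below counts only unambiguous A/C/G/T; excluding ambiguity codes such as N is a common convention and an assumption here.

```python
def gc_content(seq):
    """Percent G+C among unambiguous A/C/G/T bases (ambiguity codes such as N are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt


print(gc_content("ATGC"))
```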
Funding: supported by the National Natural Science Foundation of China (32022078) and the Local Innovative and Research Teams Project of Guangdong Province, China (2019BT02N630), with support from the National Supercomputer Center in Guangzhou, China.
Abstract: Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. We therefore extended ssGBLUP to incorporate genomic annotation information into the model and evaluated the extensions using a yellow-feathered chicken population. The population consisted of 1,338 birds with 23 traits, and imputed WGS data comprising 5,127,612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, we evaluated the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation. Based on the chicken genome annotation (GRCg6a), 3,155,524 and 94,837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed the other models in predictive ability for 15 of the 23 traits, with advantages ranging from 2.5 to 6.1% over the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated genotyping strategies for the reference population in ssGBLUP. Comparing two strategies for selecting individuals to genotype in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations.
Overall, we extended genomic prediction models to make comprehensive use of WGS data and genomic annotation information within the ssGBLUP framework. We showed that properly handling genomic annotation information and WGS data increased the predictive ability of ssGBLUP. Moreover, when using WGS data, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. These results shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
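For reference, ssGBLUP is usually written in terms of the inverse of a combined relationship matrix H, as shown below. Here A is the pedigree relationship matrix with genotyped animals ordered last, A_22 is its block for the genotyped animals, and G is a genomic relationship matrix. The second line gives VanRaden's G computed from a SNP set S, with x_ji in {0, 1, 2} the genotype of animal j at SNP i and p_i the allele frequency. Taking S as all SNPs gives the usual G. Restricting S to genic or exonic SNPs is only one hypothetical way annotation could enter the model; the abstract does not specify the four extended models. In practice G is often blended with A_22 so that it is invertible.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
  \mathbf{0} & \mathbf{0} \\
  \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix},
\qquad
\mathbf{G}_{S} = \frac{\mathbf{Z}_{S}\mathbf{Z}_{S}^{\top}}{2\sum_{i \in S} p_i (1 - p_i)},
\qquad
z_{ji} = x_{ji} - 2p_i
```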
Abstract: Beach pea or beach cowpea (Vigna marina (Burm.) Merr.) belongs to the family Fabaceae. It is a close relative of cultivated Vigna species such as adzuki bean (V. angularis), cowpea (V. unguiculata), mung bean (V. radiata), and blackgram (V. mungo), and is distributed throughout the tropics. Because it tolerates salt stress, beach pea has great potential as a source of salt-tolerance genes for developing salt-tolerant cultivars of cultivated Vigna species. However, it remains underutilized in Vigna breeding programs. A draft genome sequence of beach pea was generated using a high-throughput next-generation sequencing platform, yielding 23.7 Gb of sequence from 79,929,868 filtered reads. A de novo genome assembly containing 68,731 scaffolds had an N50 length of 10,272 bp, and the assembled sequences totaled 365.6 Mb. A total of 35,448 SSRs, including 3,574 compound SSRs, were identified, and primer pairs were designed for most of them. Genome analysis identified 50,670 genes with a mean coding sequence length of 1,042 bp. Phylogenetic analysis revealed the highest sequence similarity with V. angularis, followed by V. radiata. Comparison with the V. angularis genome revealed 16,699 SNPs and 2,253 InDels, and comparison with the V. radiata genome revealed 17,538 SNPs and 2,300 InDels. To our knowledge, this is the first draft genome sequence of beach pea, derived from an accession (ANBp-14-03) locally adapted to the Andaman and Nicobar Islands of India. The draft genome sequence may facilitate genetic enhancement of cultivated Vigna species.
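N50 summarizes assembly contiguity. It is the length L such that scaffolds of length ≥ L together contain at least half of the assembled bases. The minimal Python sketch below uses toy lengths, not the beach pea data.

```python
def n50(lengths):
    """Smallest length L such that contigs/scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one length")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27 reaches half
```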
Abstract: Molecular genetic maps were commonly constructed by analyzing the segregation of restriction fragment length polymorphisms (RFLPs). Here, based on recent literature, we describe newer mapping methods that use marker sequences, that is, unique sequences detected by the polymerase chain reaction (PCR). Each method has its limitations, and the current trend is to integrate the maps produced by the different methods. The marker sequences covered in this paper mainly include expressed sequence tags (ESTs), polymorphic sequence-tagged sites (STSs), randomly amplified polymorphic DNA (RAPDs), cleaved amplified polymorphic sequences (CAPS), amplified fragment length polymorphisms (AFLPs), genome sequence sampling (GSS), and sequence-tagged connectors (STCs).
Funding: the Program of Key Science and Technology Research from the Department of Science and Technology of the General Bureau of Land Reclamation of Heilongjiang Province, China (HNKXIV-02-03-03)
Abstract: Simple sequence repeats (SSRs), or microsatellites, are genetic markers that are ubiquitous in the genomes of various organisms. Analysis of SSRs in rhizobial genomes provides useful information for a variety of applications in rhizobial population genetics. We analyzed the occurrence, relative abundance, and relative density of the most common SSRs in the Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes sequenced in the microorganisms tandem repeats database, and compared the SSRs of the three species. The B. japonicum, M. loti, and S. meliloti genomes contained 1,410, 859, and 638 SSRs, respectively. In all three genomes, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant, indicating higher mutation rates in these species, whereas mononucleotide repeats were the least abundant. The types and distribution of SSRs were similar among these species.
Abstract: Genomics has become a ground-breaking field in all areas of the life sciences. Advances in genomics and the development of high-throughput techniques have recently provided insight into whole-genome characterization of a wide range of organisms. In the post-genomic era, new technologies have produced an explosion of genomic sequences and supporting data needed to understand genome-wide functional regulation of gene expression and to reconstruct metabolic pathways. However, the availability of this plethora of genomic data presents a significant challenge for storage, analysis, and data management. Analysis of such massive data requires the development and application of novel bioinformatics tools. These tools must provide unified functional annotation, structural search, and comprehensive analysis and identification of new genes across a wide range of species with fully sequenced genomes. In addition, generating systematic and syntactically unambiguous nomenclature systems for genomic data across species is a crucial task. Such systems are necessary for handling genetic information adequately in comparative functional genomics. In this paper, we provide an overview of major advances in bioinformatics and computational biology for genome sequencing and next-generation sequence data analysis. We focus on their potential applications for the efficient collection, storage, and analysis of genetic data and information from a wide range of gene banks. We also discuss the importance of establishing a unified nomenclature system through functional and structural genomics approaches.
Abstract: Herein, we report a very high content of simple sequence repeats (SSRs), covering 66.12% of the herpes simplex virus type 1 (HSV-1) genome when a low threshold is adopted to define SSRs. This indicates that repeat sequences are a very important characteristic of the HSV-1 genome. Repeats with two iterations account for 68.33% of all repeats; that is, the HSV-1 genome tends to form shorter repeat sequences. For mono-, di-, and trinucleotide repeats, the number of repeats decreased as the number of iterations increased, implying that SSRs may tend to form from low to high iterations. High-iteration SSRs might have been subject to strong selective pressure and survived to perform different functions. The analysis suggests that repeat formation may be an essential evolutionary driving force for the HSV-1 genome, and the results might help in studying the genome structure, repeat genesis, and genome evolution of HSV-1.
Funding: Taif University Researchers Supporting Project number (TURSP-2020/211), Taif University, Taif, Saudi Arabia.
Abstract: The study of viruses and their genetics has been both an opportunity and a challenge for the scientific community. The ongoing SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pandemic exposed our unpreparedness for such situations. Not only must the effects caused by the virus be countered, but mutations in the viral genome must also be monitored frequently. One major way to learn more about such pathogens is to extract their genetic data. Although viral genetic data have been cultured, stored, and isolated in the form of genome sequences, there are still few methods for anticipating which new viruses may arise through mutation. This research proposes deep learning models that predict the genome sequence of SARS-CoV-2 using only earlier viruses of the family Coronaviridae. The models use RNN-LSTM (recurrent neural network with long short-term memory) and RNN-GRU (gated recurrent unit), so that countermeasures can be prepared in advance by predicting possible genome changes from existing mutations in the virus. In testing, the F1-recall was above 0.95, and the mutation detection accuracy of both models was about 98.5%, showing the capability of recurrent neural networks to predict future changes in viral genomes.