Citrus leaf blotch virus (CLBV) is a member of the genus Citrivirus in the family Betaflexiviridae. CLBV has been reported to infect kiwi, citrus and sweet cherry in China. In this study, 15 of 289 citrus samples from six regions of China tested positive for CLBV. The complete genomes of four CLBV isolates were obtained from Reikou in Sichuan (CLBV-LH), Yura Wase in Zhejiang (CLBV-YL), Bingtangcheng in Hunan (CLBV-BT), and Fengjie 72-1 in Chongqing (CLBV-FJ). Each genome was monopartite, comprised 8 747 nucleotides excluding the poly(A) tail, and encoded three open reading frames (ORFs). In a multiple alignment of genomes, the four isolates shared 98.9 to 99.8% identity with one another and 96.8 to 98.1% identity with the citrus reference sequences in GenBank. A phylogenetic tree based on the genome sequences of available CLBV isolates showed that the four isolates clustered together, suggesting that CLBV isolates from citrus in China show no obvious variation. This is the first report of the complete nucleotide sequences of CLBV isolates infecting citrus in China.
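The identity percentages quoted for the CLBV isolates come from a multiple alignment of genomes. As a rough illustration of what such a figure measures, the following minimal Python sketch computes percent identity between two already-aligned sequences. The function name `percent_identity` and the gap-handling rule (columns where both sequences have a gap are skipped) are assumptions made for illustration; the abstract does not say which software or convention the authors used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where BOTH sequences carry a gap ('-') are ignored;
    every other column counts toward the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real CLBV sequence data.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other conventions (for example, excluding every gapped column) give slightly different values, so identities are only comparable when computed the same way.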
Mycoplasma ovipneumoniae, a mycoplasma bacterium, commonly infects the respiratory tract and causes respiratory disease in sheep and goats worldwide. Here, the complete genome sequence of M. ovipneumoniae strain NM2010, isolated from a sheep in China, is reported for the first time.
The complete nucleotide sequence of an isolate of Citrus vein enation virus (CVEV-XZG) from China has been determined for the first time. The genome consists of 5 983 nucleotides, encodes five open reading frames (ORFs), and has a genomic organization similar to that of Pea enation mosaic virus (PEMV). Across the five ORFs, nucleotide and deduced amino acid sequence identities with isolate CVEV VE-1 range from 97.1 to 99.0% and from 97.4 to 100.0%, respectively; the corresponding values with isolate PEMV-1 range from 45.2 to 51.6% and from 31.1 to 45.2%. Phylogenetic analysis based on the complete genome sequence showed that isolate CVEV-XZG is closely related to Pea enation mosaic virus. These results support the view that CVEV may be a new member of the genus Enamovirus. The full sequence of CVEV-XZG presented here may serve as a basis for future study of CVEV in China.
Viruses of thermophiles are of great interest because of their roles in gene transfer, the global geochemical cycle and the evolution of life on earth. However, thermophilic bacteriophages have not been studied extensively. In this investigation, a typical bacteriophage, BV1, was obtained from the thermophilic bacterium Geobacillus sp. 6k512, which was isolated from an inshore hot spring in Xiamen, China. BV1 contains a double-stranded linear DNA of 35 055 bp that encodes 54 open reading frames (ORFs). Interestingly, eight of the 54 BV1 ORFs share sequence similarity with genes from bacteria relevant to human disease. Seven proteins of the purified BV1 virions were identified by proteomic analysis. Characterizing the functional genomics of BV1 would facilitate a better understanding of the mechanism of virus-thermophile interaction.
Background: Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for each stage, but the optimal combination of these methods remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment, with respect to different reference population sizes and different numbers of the most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results: All combinations performed well, with identification accuracies over 95% in all scenarios. However, no single combination performed best and robustly across all scenarios. We proposed integrating the three breed-informative SNP detection methods into one method, named DFI, and integrating three of the machine learning methods (KNN, SVM, and RF) into one method, named KSR. The combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies from SNP chip data were only slightly lower than those from sequence data in most cases. Conclusions: The combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases, but the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.
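The two stages described in this abstract can be sketched in miniature. The Python sketch below ranks SNPs by Delta, taken here as the absolute allele-frequency difference between two breeds, and combines three classifier outputs by simple majority vote. Both the two-breed simplification and the majority-vote rule are assumptions for illustration only: the abstract does not state how DFI or KSR integrate their component methods, and the SNP names and breed labels are hypothetical.

```python
from collections import Counter


def delta(freq_a: float, freq_b: float) -> float:
    """Delta: absolute allele-frequency difference between two breeds."""
    return abs(freq_a - freq_b)


def rank_snps(freqs_a: dict, freqs_b: dict, top_n: int) -> list:
    """Return the top_n SNP identifiers with the largest Delta."""
    scores = {snp: delta(freqs_a[snp], freqs_b[snp]) for snp in freqs_a}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def majority_vote(predictions: list) -> str:
    """Combine class labels from several classifiers (assumed rule).

    Ties are resolved in favour of the label seen first.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


if __name__ == "__main__":
    # Hypothetical allele frequencies for three SNPs in two breeds.
    breed_a = {"snp1": 0.9, "snp2": 0.5, "snp3": 0.2}
    breed_b = {"snp1": 0.1, "snp2": 0.45, "snp3": 0.6}
    print(rank_snps(breed_a, breed_b, top_n=2))
    # Hypothetical outputs of a KNN, an SVM and an RF model for one animal.
    print(majority_vote(["Holstein", "Angus", "Holstein"]))
```

In practice the classifiers would be trained on genotypes at the selected SNPs; the sketch only shows where SNP ranking and vote combination fit in the workflow.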
Background: Pork quality can directly affect customers' purchasing tendencies, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow because of high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. Results: We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc × (Landrace × Yorkshire) pigs and developed a reference panel for genomic prediction of meat quality, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). Prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using marker panels of different densities derived from WGS data, accuracy differed substantially among meat quality traits, varying from 0.08 to 0.47. MultiBLUP outperformed GBLUP, yielding accuracy increases ranging from 17.39% to 75%. We optimized the marker density and found that medium- and high-density marker panels are beneficial for estimating the heritability of meat quality. Moreover, we conducted genotype imputation from the 50K chip to the WGS level in the same population and found an average concordance rate exceeding 95% and r² = 0.81. Conclusions: Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
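Since this abstract defines prediction accuracy as a Pearson correlation and reports gains as percentage increases, a small worked sketch may help. The Python below implements both calculations with the standard library only. The function names and all input values are hypothetical; nothing here reproduces the study's data or its GBLUP and MultiBLUP models.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def relative_increase(baseline: float, improved: float) -> float:
    """Percentage increase of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical adjusted phenotypes and genomic estimated breeding values.
    phenotypes = [3.1, 2.8, 3.6, 3.0, 3.3]
    gebv = [0.10, -0.05, 0.30, 0.02, 0.12]
    print(pearson(phenotypes, gebv))
    # Hypothetical accuracies for a baseline and an improved model.
    print(relative_increase(0.20, 0.35))
```

Because the increase is relative, the same percentage can correspond to very different absolute gains depending on the baseline accuracy.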
The study of viruses and their genetics has been both an opportunity and a challenge for the scientific community. The ongoing SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pandemic exposed how unprepared we are for such situations. Not only must the effects caused by the virus be countered, but the mutations taking place in its genome must also be monitored frequently. One major way to learn more about such pathogens is to extract their genetic data. Although viruses have been cultured and isolated and their genetic data stored as genome sequences, methods for predicting which new viruses may arise in future through mutation remain limited. This research proposes a deep learning model that predicts the genome sequences of the SARS-CoV-2 virus using only earlier viruses of the Coronaviridae family, with the help of RNN-LSTM (Recurrent Neural Network-Long Short-Term Memory) and RNN-GRU (Gated Recurrent Unit), so that countermeasures can be taken in future by predicting possible changes in the genome from existing mutations in the virus. In testing, the F1-recall score of the model exceeded 0.95. The mutation detection accuracy of both models was about 98.5%, which shows the capability of recurrent neural networks to predict future changes in the genome of a virus.
In this study, we determined the complete nucleotide and deduced amino acid sequences of a primary isolate of rabies virus (SH06) obtained from the brain of a rabid dog. The overall length of the genome was 11 924 nucleotides. Comparison of genomic sequences showed that the nucleotide-level homology of SH06 with the full-length genomes of reference vaccine strains ranged from 82.2% (PV strain) to 86.9% (CTN strain). A full-length genome-based phylogenetic analysis was performed with sequences available from GenBank. It indicated that SH06 exhibited the highest homology with the rabies street virus BD06 and the CTN vaccine strain originating from China.
With the publication of "Toward Sequencing Cotton (Gossypium) Genomes" [Chen et al. Plant Physiology, 2007, 145:1303-1310], a clear consensus emerged from the cotton genomics
The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. Considering explosive data growth with diverse data types, here we present the GSA family by expanding into a set of resources for raw data archive with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, all these resources form a family of resources dedicated to archiving explosive data with diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.
With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are generated at an unprecedentedly explosive rate. To provide an efficient and easy-to-use platform for managing huge sequence data, here we present Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members by alleviating the increasing burdens of handling sequence data deluge, but also takes the significant responsibility for global big data archive and provides free unrestricted access to all publicly available data in support of research activities throughout the world.
Long-PCR amplification, cloning and primer-walking sequencing were used to determine the complete sequence of the mitochondrial genome of the tokay (Gekko gecko). The genome is 16 435 bp in size and contains 13 protein-coding, 2 ribosomal RNA and 22 transfer RNA genes. The mt genome of Gekko is similar to that of most vertebrates in gene content, order, orientation, tRNA structures, low percentage of guanine and high percentage of thymine, and GC and AT skews. The preference for base A at third codon positions of protein-coding genes resembles that of amphibians and fishes rather than amniotes. The standard stop codon (TAA) is present in only three protein-coding genes, fewer than in most vertebrates. Transfer RNA genes range in length from 63 to 76 nt, and their secondary structures show the characteristic cloverleaf, except for tRNA-Cys and tRNA-Ser (AGY), which lack the D arm.
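The GC and AT skews mentioned for the Gekko mitochondrial genome are conventionally computed from base counts on one strand as (G - C)/(G + C) and (A - T)/(A + T). The following Python sketch applies those formulas; the function name `base_skews` and the toy input are illustrative assumptions, and the abstract does not report the skew values themselves.

```python
def base_skews(sequence: str) -> tuple:
    """Return (AT skew, GC skew) for one strand of a DNA sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Characters other than A, C, G and T are ignored.
    """
    seq = sequence.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    if a + t == 0 or g + c == 0:
        raise ValueError("skew undefined: sequence lacks A/T or G/C bases")
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


if __name__ == "__main__":
    # Toy fragment, not taken from the Gekko gecko mitochondrial genome.
    print(base_skews("AAATGGCC"))
```

A positive AT skew means adenine outnumbers thymine on the analysed strand, and a negative GC skew means cytosine outnumbers guanine.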
Pathogenic Escherichia coli cause chicken colibacillosis, which is economically devastating to the poultry industry worldwide (Bagheri et al., 2014). Owing to increasing antibiotic resistance, phage therapy reagents have been developed to treat bacterial infections (Xu et al., 2015).
Enterococci are important in environmental, food and clinical microbiology. Enterococcus faecium is a nosocomial pathogen that causes bacteremia, endocarditis and other infections. It is among the most prevalent organisms encountered in hospital-associated infections, accounting for approximately 12% of nosocomial infections in the USA (Linden and Miller, 1999). However, certain strains of E. faecium are not only non-pathogenic but also have beneficial effects on human health, with probiotic potential. For example, E. faecium T-110 is a consortium member in several probiotic products, including BIO-THREE, which is widely prescribed for human, animal and aquacultural use. This strain was originally developed by TOA Pharmaceuticals in Japan and was later used in the probiotic products of several other companies.
The complete genome of methanol-utilizing Amycolatopsis methanolica strain 239T was generated, revealing a single 7,237,391 nucleotide circular chromosome with 7074 annotated protein-coding sequences (CDSs). Comparative analyses against the complete genome sequences of Amycolatopsis japonica strain MG417-CF17T, Amycolatopsis mediterranei strain U32 and Amycolatopsis orientalis strain HCCB10007 revealed a broad spectrum of genomic structures, including various genome sizes, core/quasi-core/non-core configurations and different kinds of episomes. Although polyketide synthase gene clusters were absent from the A. methanolica genome, 12 gene clusters related to the biosynthesis of other specialized (secondary) metabolites were identified. Complete pathways attributable to the facultative methylotrophic physiology of A. methanolica strain 239T, including both the mdo/mscR encoded methanol oxidation and the hps/hpi encoded formaldehyde assimilation via the ribulose monophosphate cycle, were identified together with evidence that the latter might be the result of horizontal gene transfer. Phylogenetic analyses based on 16S rDNA or orthologues of AMETH_3452, a novel actinobacterial class-specific conserved gene, against 62 or 18 Amycolatopsis type strains, respectively, revealed three major phyletic lineages, namely the mesophilic or moderately thermophilic A. orientalis subclade (AOS), the mesophilic Amycolatopsis taiwanensis subclade (ATS) and the thermophilic A. methanolica subclade (AMS). The distinct growth temperatures of members of the subclades correlated with corresponding genetic variations in their encoded compatible solutes. This study shows the value of integrating conventional taxonomic with whole genome sequence data.
Two decades have passed since the first bacterial whole genome was sequenced, which opened new opportunities for microbial genomics. Since then, considerable genetic diversity has been revealed, both within bacterial genomes and among strains of the same species. In recent years, genome sequencing techniques and bioinformatics have developed rapidly, transforming and accelerating the strategies and methods of bacterial genome comparison used to dissect infectious disease epidemics. Bacterial whole-genome sequencing and bioinformatic computing allow genotyping to satisfy the requirements of epidemiological studies in disease control. In this review, we outline the significance and summarize the roles of bacterial genome sequencing in the context of bacterial disease control and prevention. We discuss the applications of bacterial genome sequencing in outbreak detection, source tracing, transmission mode discovery, and identification of new epidemic clones. Wide application of genome sequencing and data sharing in infectious disease surveillance networks will considerably promote outbreak detection and early warning to prevent the dissemination of bacterial diseases.
Objective: Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. Methods: In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Results: Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. Conclusion: MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination, and has potential as a fast, reliable and convenient method for routine use.
Influenza A (H3N2) virus evolves faster than other types of influenza viruses. In this study, whole genome sequencing was performed to better understand the molecular evolution of influenza H3N2 and the protective effect of the influenza virus vaccine in Qinghai Province, China, in 2017. The complete sequences of the eight gene segments of two seasonal influenza H3N2 isolates were determined and analyzed using DNASTAR and MEGA 6.06 software. Additionally, the three-dimensional structure of the HA protein was predicted using the SWISS-MODEL server. Phylogenetic and amino acid sequence analysis revealed that the two Qinghai H3N2 isolates were typical low-pathogenic influenza viruses and were relatively closely related to the 2016–2017 vaccine strain, 3C.2a-A/Hong Kong/4801/2014. Several antigenic site substitutions (T131K, G/R142K, K160T and R261Q in the HA protein) were specific to the two Qinghai H3N2 virus strains. In addition, the amino acid substitutions K160T at the glycosylation site of HA and H75P in PB1-F2 in the Qinghai isolates might affect the antibody binding ability and virulence of the influenza virus. The presence of several antigenic site mutations in the Qinghai H3N2 isolates confirmed the evolution of circulating H3N2 strains.
Coding sequences (CDS) are commonly used for transient gene expression, in yeast two-hybrid screening, to verify protein interactions and in prokaryotic gene expression studies. CDS are most commonly obtained using complementary DNA (cDNA) derived from messenger RNA (mRNA) extracted from plant tissues and generated by reverse transcription. However, some CDS are difficult to acquire through this process because they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges require the development of alternative CDS cloning technologies. In this study, we found that genomic intron-containing gene coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter. Theoretically, a sufficient amount of mRNA can be extracted from the N. benthamiana leaves, making it conducive to cloning the CDS of target genes. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.
Introduction: Omicron is a highly divergent variant of concern (VOC) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It carries a high number of mutations in its spike protein and is therefore more transmissible in the community through immune evasion mechanisms. Because of mutations within the S gene, most Omicron variants show S gene target failure (SGTF) with some commercially available PCR kits. Such diagnostic features can be used as markers to screen for Omicron. However, whole genome sequencing (WGS) is the only gold standard approach to confirm novel microorganisms at the genetic level, as similar mutations can also be found in other variants circulating at low frequencies worldwide. This retrospective study aimed to assess RT-PCR sensitivity in detecting S gene target failure in comparison with whole genome sequencing for the detection of Omicron variants. Methods: We analysed retrospective data from SARS-CoV-2 positive RT-PCR samples for S gene target failure (SGTF) with the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher), combined with sequencing technologies, to study the pattern of SARS-CoV-2 variants that emerged during the third wave at the tertiary care centre, Surat. Results: From the first day of December 2021 until the end of February 2022, a total of 321,803 diagnostic RT-PCR tests for SARS-CoV-2 were performed, of which 20,566 positive cases were reported at our tertiary care centre, giving an average cumulative positivity of 6.39% over the three months. In December 2021, samples characterized by SGTF (70/129) were suggestive of infection with the Omicron variant and were identified as Omicron (B.1.1.529 lineage) when sequenced.
In January, we analysed a subset of samples (n = 618) with SGTF (24%) and without SGTF (76%) with Ct values ... Conclusions: During the COVID-19 pandemic, it took more than 15 days to diagnose infection and identify the pathogen by sequencing technology. In contrast, the molecular assay provided quick identification within 5 hours with the help of the SGTF phenomenon. This strategy helps scientists and health policymakers to isolate and identify clusters quickly, which ultimately decreases transmission of the pathogen in the community.
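The cumulative positivity quoted in the Results follows directly from the two counts given there. The short Python sketch below reproduces that arithmetic; the helper name `percentage` is an illustrative assumption, and only the counts stated in the abstract are used.

```python
def percentage(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


if __name__ == "__main__":
    tests_performed = 321_803   # RT-PCR tests, December 2021 to February 2022
    positive_cases = 20_566     # positive cases over the same period
    print(round(percentage(positive_cases, tests_performed), 2))
```

Rounded to two decimal places, this matches the 6.39% reported in the abstract.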
Funding (CLBV genome study): supported by the National Natural Science Foundation of China (31501611); the Chongqing Research Program of Basic Research and Frontier Technology, China (cstc2017jcyjBX0016); the Chongqing Science and Technology Commission Project, China (cstc2016shmsztzx80003); the Fundamental Research Funds for the Central Universities, China (XDJK2016B21, SWU116012); and the Special Fund for Agro-scientific Research in the Public Interest, China (201203076-01).
Funding (Mycoplasma ovipneumoniae genome study): supported by the National Key Technology R&D Program of China (2011BAD18B01).
Funding (CVEV genome study): funded by the Chongqing Natural Science Foundation Project, China (cstc2011jjA80025).
Funding (bacteriophage BV1 study): the Key Natural Science Foundation of Fujian under contract No. 2007J0004; the National Natural Science Foundation of China under contract No. 40576076.
Funding (cattle breed identification study): funded by the National Key Research and Development Program of China (2021YFD1200404); the Yangzhou University Interdisciplinary Research Foundation for Animal Science Discipline of Targeted Support (yzuxk202016); and the Project of Genetic Improvement for Agricultural Species (Dairy Cattle) of Shandong Province (2019LZGC011).
Funding: Supported by a Technical Innovation of Crossbred in Swine and Breed High Fertility Lines Project (2022B0202090002), a Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), a Natural Science Foundation of Guangdong Province project (2018B030313011), and Innovative Teams of Modern Agriculture and Industry Technology System of Guangdong Province (2022KJ26).
Abstract: Background: Pork quality can directly affect customers' purchasing tendencies, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow because of high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. Results: We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc × (Landrace × Yorkshire) pigs and developed a reference panel for genomic prediction of meat quality, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). Prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using marker panels of different densities derived from WGS data, accuracy differed substantially among meat quality traits, ranging from 0.08 to 0.47. MultiBLUP outperformed GBLUP, yielding accuracy increases ranging from 17.39% to 75%. We optimized the marker density and found that medium- and high-density marker panels are beneficial for estimating the heritability of meat quality. Moreover, we conducted genotype imputation from the 50K chip to the WGS level in the same population and found that the average concordance rate exceeded 95%, with r^2 = 0.81. Conclusions: Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
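The accuracy measure defined in this abstract, the Pearson correlation between adjusted phenotypes and genomic estimated breeding values, can be sketched in a few lines of Python. The sketch also shows how a relative accuracy increase in percent could be computed. All input values are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_gain(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical adjusted phenotypes and GEBVs for a small validation set.
phenotypes = [3.1, 2.8, 3.6, 3.0, 3.4]
gebvs = [0.10, -0.05, 0.30, 0.02, 0.15]
accuracy = pearson(phenotypes, gebvs)

# Hypothetical accuracies for a baseline model and an alternative model.
gain = relative_gain(0.20, 0.35)
```

With the hypothetical accuracies 0.20 and 0.35, `relative_gain` evaluates to about 75, which shows how a percentage increase of that size corresponds to a modest absolute change when the baseline accuracy is low.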
Funding: Taif University Researchers Supporting Project number (TURSP-2020/211), Taif University, Taif, Saudi Arabia.
Abstract: The study of viruses and their genetics has been an opportunity as well as a challenge for the scientific community. The ongoing SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pandemic has shown how unprepared we are for such situations. Not only must the effects caused by the virus be countered, but the mutations taking place in the genome of the virus must also be monitored frequently. One major way to learn more about such pathogens is to extract their genetic data. Although viruses have been cultured, isolated, and stored in the form of their genome sequences, there are still few methods for predicting what new viruses may arise in the future through mutation. This research proposes a deep learning model that predicts the genome sequences of the SARS-CoV-2 virus using only earlier viruses of the Coronaviridae family, with the help of RNN-LSTM (Recurrent Neural Network-Long Short-Term Memory) and RNN-GRU (Gated Recurrent Unit), so that countermeasures can be taken in the future by predicting possible changes in the genome from existing mutations in the virus. After testing, the F1-recall of the model was more than 0.95. The mutation detection accuracy of both models was about 98.5%, which shows the capability of recurrent neural networks to predict future changes in the genome of a virus.
Funding: National High-Tech Research and Development Program of China (2007AA022402)
Abstract: In this study, we determined the complete nucleotide and deduced amino acid sequences of a primary isolate of rabies virus (SH06) obtained from the brain of a rabid dog. The overall length of the genome was 11 924 nucleotides. Comparison of the genomic sequence showed that the nucleotide-level homology of SH06 with the full-length genomes of reference vaccine strains ranged from 82.2% with the PV strain to 86.9% with the CTN strain. A full-length genome-based phylogenetic analysis was performed with sequences available from GenBank. Phylogenetic analysis of the complete genome sequences indicated that SH06 exhibited the highest homology with the rabies street virus BD06 and the CTN vaccine strain, which originated from China.
Abstract: With the publication of "Toward Sequencing Cotton (Gossypium) Genomes" [Chen et al., Plant Physiology, 2007, 145:1303-1310], a clear consensus emerged from the cotton genomics
Funding: Supported by grants from the National Key R&D Program of China (Grant No. 2017YFC0907502 to ZZ); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB38060100 and XDB38030200 to YB, XDB38050300 to WZ, XDB38030400 to JX, and XDA19050302 to ZZ); the National Key R&D Program of China (Grant Nos. 2016YFC0901603 to WZ, 2017YFC1201202 to YW, 2020YFC0847000 and 2018YFD1000505 to WZ, and 2016YFE0206600 to YB); the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (Grant No. XXH13505-05 to YB); the Genomics Data Center Construction of the Chinese Academy of Sciences (Grant No. XXH-13514-0202 to YB); the Open Biodiversity and Health Big Data Programme of the International Union of Biological Sciences (to YB); the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07 to YB); the National Natural Science Foundation of China (Grant Nos. 32030021 and 31871328 to ZZ); and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008 to ZZ).
Abstract: The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for scientific communities worldwide. In response to explosive data growth and increasingly diverse data types, we present the GSA family, which expands GSA into a set of raw data archives with different purposes: GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and the Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in its data model, online functionalities, and web interfaces. GSA-Human, a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, these resources form a family dedicated to archiving rapidly growing data of diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.
Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB13040500 and XDA08020102); the National High-tech R&D Program (863 Program; Grant Nos. 2014AA021503 and 2015AA020108); the National Key Research Program of China (Grant Nos. 2016YFC0901603, 2016YFB0201702, 2016YFC0901903, and 2016YFC0901701); the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008); the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14); the Key Technology Talent Program of the Chinese Academy of Sciences (awarded to WZ); and the 100 Talent Program of the Chinese Academy of Sciences (awarded to ZZ).
Abstract: With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing large volumes of sequence data, we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to scientific communities worldwide. In the era of big data, GSA is not only an important complement to existing INSDC members, alleviating the increasing burden of handling the sequence data deluge, but also takes on significant responsibility for global big data archiving and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.
Abstract: Long-PCR amplification, cloning, and primer-walking sequencing were employed to determine the complete sequence of the mitochondrial genome of the tokay (Gekko gecko). The genome is 16 435 bp in size and contains 13 protein-coding, 2 ribosomal RNA, and 22 transfer RNA genes. The mt genome of Gekko is similar to that of most vertebrates in gene components, order, orientation, tRNA structures, low percentage of guanine and high percentage of thymine, and GC and AT skews. The preference for base A at third codon positions of protein genes is similar to that of amphibians and fishes rather than amniote vertebrates. The standard stop codon (TAA) is present in only three protein genes, fewer than in most vertebrates. Transfer RNA genes range in length from 63 to 76 nt, and their secondary structures show the characteristic cloverleaf, except for tRNA-Cys and tRNA-Ser (AGY), which lack the D arm.
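The compositional measures mentioned in this abstract (GC and AT skew, and base usage at third codon positions) can be computed with a short Python sketch. It assumes the commonly used skew definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), which the abstract itself does not spell out. The example sequences are hypothetical and are not from the Gekko gecko genome.

```python
from collections import Counter


def base_skews(seq):
    """Return (AT skew, GC skew) for a nucleotide sequence.

    Assumed definitions: AT skew = (A - T) / (A + T),
    GC skew = (G - C) / (G + C). A skew is 0.0 if its bases are absent.
    """
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


def third_position_counts(cds):
    """Count the bases found at third codon positions of an in-frame CDS."""
    return Counter(cds.upper()[2::3])


# Hypothetical toy sequences for illustration only.
skews = base_skews("AAATGC")
third = third_position_counts("ATGGCATAA")
```

For the toy coding sequence `ATGGCATAA`, the codons are ATG, GCA, and TAA, so the third positions are G, A, and A.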
Funding: Supported by grants from the Natural Science Foundation of Shandong Province of China (grant nos. ZR2013CQ024 and ZR2015CM020)
Abstract: Pathogenic Escherichia coli cause chicken colibacillosis, which is economically devastating to the poultry industry worldwide (Bagheri et al., 2014). Owing to increasing antibiotic resistance, phage therapy reagents have been developed to treat bacterial infections (Xu et al., 2015).
Abstract: Enterococci are important in environmental, food, and clinical microbiology. Enterococcus faecium is a nosocomial pathogen that causes bacteremia, endocarditis, and other infections. It is among the most prevalent organisms encountered in hospital-associated infections, accounting for approximately 12% of nosocomial infections in the USA (Linden and Miller, 1999). However, certain strains of E. faecium are not only non-pathogenic but also have beneficial effects on human health, with probiotic potential. For example, E. faecium T-110 is a consortium member in several probiotic products, including BIO-THREE, which is widely prescribed for human, animal, and aquacultural use. This strain was originally developed by TOA Pharmaceuticals in Japan and was later used in the probiotic products of several other companies.
Funding: This work was supported in part by grants from the National Basic Research Program of China (2012CB721102, 2013CB734000), the Natural Science Foundation for Youth (31300034), and the National Natural Science Foundation of China (31270056, 31430004, and 31421061). LX.Z. is an Awardee of the National Distinguished Young Scholar Program in China (31125002).
Abstract: The complete genome of the methanol-utilizing Amycolatopsis methanolica strain 239T was generated, revealing a single 7,237,391-nucleotide circular chromosome with 7074 annotated protein-coding sequences (CDSs). Comparative analyses against the complete genome sequences of Amycolatopsis japonica strain MG417-CF17T, Amycolatopsis mediterranei strain U32, and Amycolatopsis orientalis strain HCCB10007 revealed a broad spectrum of genomic structures, including various genome sizes, core/quasi-core/non-core configurations, and different kinds of episomes. Although polyketide synthase gene clusters were absent from the A. methanolica genome, 12 gene clusters related to the biosynthesis of other specialized (secondary) metabolites were identified. Complete pathways attributable to the facultative methylotrophic physiology of A. methanolica strain 239T, including both the mdo/mscR-encoded methanol oxidation and the hps/hpi-encoded formaldehyde assimilation via the ribulose monophosphate cycle, were identified, together with evidence that the latter might be the result of horizontal gene transfer. Phylogenetic analyses based on 16S rDNA or on orthologues of AMETH_3452, a novel actinobacterial class-specific conserved gene, against 62 or 18 Amycolatopsis type strains, respectively, revealed three major phyletic lineages: the mesophilic or moderately thermophilic A. orientalis subclade (AOS), the mesophilic Amycolatopsis taiwanensis subclade (ATS), and the thermophilic A. methanolica subclade (AMS). The distinct growth temperatures of members of the subclades correlated with corresponding genetic variations in their encoded compatible solutes. This study shows the value of integrating conventional taxonomic data with whole genome sequence data.
Abstract: Two decades have passed since the first bacterial whole-genome sequencing, which opened new opportunities for microbial genomics. Consequently, considerable genetic diversity encoded by bacterial genomes, and among strains of the same species, has been revealed. In recent years, genome sequencing techniques and bioinformatics have developed rapidly, which has transformed and expedited the application of strategies and methods for bacterial genome comparison in the dissection of infectious disease epidemics. Bacterial whole-genome sequencing and bioinformatic computing allow genotyping to satisfy the requirements of epidemiological study in disease control. In this review, we outline the significance and summarize the roles of bacterial genome sequencing in the context of bacterial disease control and prevention. We discuss the applications of bacterial genome sequencing in outbreak detection, source tracing, transmission mode discovery, and new epidemic clone identification. Wide application of genome sequencing and data sharing in infectious disease surveillance networks will considerably promote outbreak detection and early warning to prevent the dissemination of bacterial diseases.
Funding: Supported by the National Key Research and Development Plan (2016TFC1202700, 2016YFC1200900), a Beijing Municipal Science & Technology Commission project (grant number D151100002115003), and a Guangzhou Municipal Science & Technology Commission project (grant number 2015B2150820)
Abstract: Objective: Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. Methods: In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination, and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Results: Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. Conclusion: MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination, and has potential as a fast, reliable, and convenient method for routine use.
Funding: Supported by the Doctor Research Startup Foundation of Changzhi Medical College (BS201912, BS201921), the Key Project of Qinghai Health and Family Planning Commission (2017-wjzd-08), and the Qinghai Thousand People Plan.
Abstract: Influenza A (H3N2) virus has a faster evolution rate than other types of influenza viruses. In this study, whole genome sequencing was performed to better understand the molecular evolution of influenza H3N2 and the protective effect of the influenza virus vaccine in Qinghai Province, China, in 2017. Complete sequences of the eight gene segments of two seasonal influenza H3N2 isolates were determined and analyzed using DNASTAR and MEGA 6.06 software. Additionally, the three-dimensional structure of the HA protein was predicted using the SWISS-MODEL server. Phylogenetic and amino acid sequence analysis revealed that the two Qinghai H3N2 isolates were typical low-pathogenic influenza viruses and were relatively closely related to the 2016–2017 vaccine strain, 3C.2a-A/Hong Kong/4801/2014. Several antigenic site substitutions (T131K, G/R142K, K160T, and R261Q in the HA protein) were specific to the two Qinghai H3N2 virus strains. In addition, the amino acid substitutions K160T at the glycosylation site of HA and H75P in PB1-F2 in the Qinghai isolates might affect the antibody binding ability and virulence of the influenza virus. The presence of several antigenic site mutations in the Qinghai H3N2 isolates confirmed the evolution of circulating H3N2 strains.
Abstract: Coding sequences (CDS) are commonly used for transient gene expression, in yeast two-hybrid screening, to verify protein interactions, and in prokaryotic gene expression studies. CDS are most commonly obtained from complementary DNA (cDNA) generated by reverse transcription of messenger RNA (mRNA) extracted from plant tissues. However, some CDS are difficult to acquire through this process because they are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges require the development of alternative CDS cloning technologies. In this study, we found that genomic intron-containing gene coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. In transient expression experiments, the target DNA sequence is driven by a constitutive promoter. Theoretically, a sufficient amount of mRNA can be extracted from the N. benthamiana leaves, which is conducive to cloning the CDS of target genes. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.
Abstract: Introduction: Omicron is a highly divergent variant of concern (VOC) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It carries a high number of mutations in its spike protein; hence, it is more transmissible in the community through immune evasion mechanisms. Because of mutations within the S gene, most Omicron variants show S gene target failure (SGTF) with some commercially available PCR kits. Such diagnostic features can be used as markers to screen for Omicron. However, whole genome sequencing (WGS) remains the gold standard for confirming novel microorganisms at the genetic level, because similar mutations can also be found in other variants that are circulating at low frequencies worldwide. This retrospective study aimed to assess the sensitivity of RT-PCR detection of S gene target failure, in comparison with whole genome sequencing, for detecting Omicron variants. Methods: We analysed retrospective data from SARS-CoV-2-positive RT-PCR samples for S gene target failure (SGTF) with the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher), combined with sequencing technologies, to study the emerging pattern of SARS-CoV-2 variants during the third wave at the tertiary care centre, Surat. Results: From the first day of December 2021 until the end of February 2022, a total of 321,803 diagnostic RT-PCR tests for SARS-CoV-2 were performed, of which 20,566 positive cases were reported at our tertiary care centre, giving an average cumulative positivity of 6.39% over the three-month period. In December 2021, samples characterized by SGTF (70/129) were suggestive of infection by the Omicron variant and were identified as Omicron (B.1.1.529 lineage) when sequenced. In January, we analysed a subset of samples (n = 618) with SGTF (24%) and without SGTF (76%) with Ct values Conclusions: During the COVID-19 pandemic, it took more than 15 days to diagnose infection and identify the pathogen by sequencing technology. In contrast, the molecular assay provided identification within 5 hours by means of the SGTF phenomenon. This strategy helps scientists and health policymakers to isolate and identify clusters quickly, which ultimately decreases transmission of the pathogen in the community.
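The proportions reported in this abstract can be checked with a small Python sketch. The counts are those stated in the abstract; the helper name `percent` is hypothetical.

```python
def percent(part, total):
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Cumulative positivity: positive cases out of all RT-PCR tests performed.
positivity = round(percent(20566, 321803), 2)

# Share of the December samples that showed S gene target failure (70/129).
sgtf_share = round(percent(70, 129), 2)
```

The first value reproduces the 6.39% cumulative positivity given in the abstract; the second expresses the 70/129 SGTF fraction as roughly 54%.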