The complete nucleotide sequence of the mumps virus SP, which was isolated in China, was determined. As with other mumps viruses, its genome was 15 384 nucleotides (nts) in length and encoded seven proteins. The full-length nucleotide sequence of the SP isolate differed from other strains by 4%–6.8%. Because of amino acid variations across the full genome (including the HN and N genes), this isolate exhibited significant variations in the antigenic sites. This report is the first to describe the full-length genome of a genotype F strain and provides an overview of the genetic diversity of a circulating mumps virus.
The complete nucleotide sequence of the measles virus strain IMB-1, which was isolated in China, was determined. As in other measles viruses, its genome is 15,894 nucleotides in length and encodes six proteins. The full-length nucleotide sequence of the IMB-1 isolate differed from vaccine strains (including the wild-type Edmonston strain) by 4%-5%. This isolate has amino acid variations across the full genome, including in the hemagglutinin and fusion genes. This report is the first to describe the full-length genome of a genotype H1 strain and provides an overview of the genetic diversity of a circulating measles virus.
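Nucleotide-level divergence figures such as the 4%-5% quoted above are typically derived from pairwise comparison of aligned genomes. The sketch below shows one simple way such a percentage could be computed. It assumes the two sequences have already been aligned to equal length, and it skips gapped columns; the function name `percent_difference` is illustrative and is not taken from the studies listed here.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, ungapped positions at which two sequences differ.

    Assumes seq_a and seq_b are rows of the same alignment (equal length).
    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    # Toy example: one mismatch in ten compared positions.
    print(percent_difference("ACGTACGTAC", "ACGTTCGTAC"))
```

Whether gapped columns are skipped or counted as differences changes the result, so a reported percentage is only comparable across studies that use the same convention.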
The brown marmorated stink bug, Halyomorpha halys (Stål) (Hemiptera: Pentatomidae), is an invasive species native to East Asia that has spread across Asia, Europe, and North America. H. halys damages various grains, fruits, and vegetables, as exemplified by the significant damage to the hazelnut harvest in Georgia in 2016. This report describes the first attempted genetic study of the spread of H. halys in Georgia. The first main goal of this research was to identify the haplotype of an invasive population in Georgia. For this purpose, the mitochondrial cytochrome c oxidase subunit I (COI) gene fragment from 65 samples of H. halys collected from different regions across Georgia was sequenced on an Applied Biosystems 3100 or 3700 genetic analyzer. In all cases, only the H1 haplotype, which is native to China, was identified. The second goal of this research was to determine the complete mitochondrial DNA sequence of H. halys (Stål) specimens collected across Georgia. The complete mitochondrial DNA of the H1 haplotype was sequenced on an Illumina MiSeq platform. The mitochondrial DNA of the Georgian H1 haplotype has a length of 15,478 base pairs. Using the sequence of the H22 haplotype of H. halys (native to Korea) as a reference, 62 single nucleotide polymorphisms (SNPs), three inversions, and four single T insertions were identified. Furthermore, 60 SNPs and four insertions in two tRNA genes and one rRNA gene were identified among 18 mitochondrial genes from the Georgian H1 haplotype. Nine of these SNPs resulted in amino acid substitutions. The detection of SNPs also revealed many other polymorphic sites beyond the COI gene, which can be used to detect new haplotypes.
Alternative splicing is a major contributor to genomic complexity and proteome diversity, yet alternative splicing of sequences containing the nucleotide binding site and leucine-rich repeat (NBS-LRR) domain has not been explored in rice (Oryza sativa L.). Hidden Markov model (HMM) searches were performed for the NBS-LRR domain, and 875 NBS-LRR-encoding sequences were obtained from the Institute for Genomic Research (TIGR). All of them were used as BLAST queries against the Knowledge-based Oryza Molecular Biological Encyclopaedia (KOME), the TIGR rice gene index (TGI), and the Universal Protein Resource (UniProt) to obtain homologous full-length cDNAs (FL-cDNAs), tentative consensus sequences, and protein sequences. Alternative splicing events were detected from genomic alignment of FL-cDNAs, tentative consensus sequences, and protein sequences, which provide valuable information on splice variants of genes. These sequences were aligned to the corresponding BAC sequences using the Spidey and Sim4 programs, and each of the proteins was aligned by tBLASTn. Of the 875 NBS-LRR sequences, 119 (13.6%) showed alternative splicing, where multiple FL-cDNAs, TGI sequences and proteins corresponded to the same gene. In total, 71 intron retention events, 20 exon skipping events, 16 alternative termination events, 25 alternative initiation events, 12 alternative 5' splicing events, and 16 alternative 3' splicing events were identified. Most of these alternative splices were supported by two or more transcripts. The data sets are available at http://www.bioinfor.org. Furthermore, bioinformatics analysis of splice boundaries showed that exon skipping and intron retention did not exhibit a strong consensus. This implies a different regulation mechanism that guides the expression of splice isoforms. This article also presents an analysis of the effects of intron retention on proteins. The C-terminal regions of alternative proteins turned out to be more variable than the N-terminal regions. Finally, the tissue distribution and protein localization of alternative splicing were explored. The largest tissue categories for alternative splicing were shoot and callus. More than one-third of the splice forms were localized to the plasma membrane and cytoplasm. All the NBS-LRR proteins for splice forms may have important functions in disease resistance and may activate downstream signaling pathways.
The energy of interaction between complementary nucleotides in promoter sequences of E. coli was calculated and visualized, and a graphical method for presenting the energy properties of promoter sequences was developed. The data indicate that the energy distribution along the length of a promoter sequence has minima at the −35, −8 and +7 regions, corresponding to areas with elevated AT (adenine-thymine) content. The most important difference from random sequences is at the −8 region. Four promoter groups and their energy properties were identified. Promoters with minimal and maximal energy of interaction between complementary nucleotides have low strengths; the strongest promoters belong to promoter clusters characterized by intermediate energy values.
Here we report the adaptation and optimization of an efficient, accurate and inexpensive assay that employs custom-designed silicon-based optical thin-film biosensor chips to detect unique transgenes in genetically modi-
Using the continuous wavelet transform as the analytical tool, the fractal characteristics of nucleotide sequences were studied. The fractal dimension of the exon and intron sequences of different species was calculated. We used the Mexican hat wavelet function as the mother wavelet and the Hurst exponent to describe the long-range correlation. The Hurst exponent of an intron sequence was found to be larger than that of an exon sequence for the same gene.
A new version of DNA walks is introduced in which nucleotides are regarded as unequal in their contribution to a walk, which allows a thorough study of the “fine structure” of nucleotide sequences. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. Each codon position is considered independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, which reflects the “strength” of each branch. A general method is proposed for identifying a DNA sequence by its triander, which can be treated as a unique “genogram” (or “gene passport”). Two- and three-dimensional trianders are considered. The difference in the fine structure of sequences between genes and the intergenic space is shown. A clear triplet signal was found in coding sequences; it is absent in the intergenic space and is independent of sequence length. This paper presents a topological classification of trianders, which may allow detailed signatures of functionally different genomic regions to be worked out.
In this paper we study the scaling behavior of nucleotide clusters in the 11 chromosomes of the Encephalitozoon cuniculi genome. The statistical distribution of nucleotide clusters for the 11 chromosomes is characterized by the scaling behavior P(S) ∝ e^(−αS), where S represents the nucleotide cluster size. The cluster-size distribution P(S1+S2), where S1+S2 is the total size of a sequential C-G cluster and A-T cluster, was also studied; P(S1+S2) follows exponential decay. No case was found of a large C-G cluster following a large A-T cluster, or of a large A-T cluster following a large C-G cluster. We also discuss the relatively random walk length function L(n) and the local compositional complexity of nucleotide sequences based on a new model. These investigations may provide some insight into the nucleotide clusters of DNA sequences.
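The exponential cluster-size law described above can be illustrated with a short sketch. It assumes that a "cluster" is a maximal run of consecutive bases from one class (C or G, or A or T), which is one reading of the abstract rather than the authors' stated definition, and it estimates α with an ordinary least-squares fit of ln P(S) against S. The function names are illustrative.

```python
import math
from collections import Counter
from itertools import groupby


def cluster_sizes(seq: str, members: str) -> list[int]:
    """Lengths of maximal runs of consecutive bases that belong to `members`.

    Example: members="CG" gives the sizes of C-G clusters (assumed definition).
    """
    flags = (base in members for base in seq.upper())
    return [sum(1 for _ in run) for is_member, run in groupby(flags) if is_member]


def fit_decay_rate(sizes: list[int]) -> float:
    """Estimate alpha in P(S) ∝ exp(-alpha * S) by least squares on ln P(S) vs S."""
    counts = Counter(sizes)
    if len(counts) < 2:
        raise ValueError("need at least two distinct cluster sizes to fit a slope")
    total = sum(counts.values())
    xs = sorted(counts)
    ys = [math.log(counts[s] / total) for s in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # the fitted slope is -alpha


if __name__ == "__main__":
    print(cluster_sizes("AATCGGGTA", "CG"))  # C-G cluster sizes in a toy sequence
    print(cluster_sizes("AATCGGGTA", "AT"))  # A-T cluster sizes in the same sequence
```

An unweighted log-linear fit gives rare, large clusters the same weight as common, small ones; a maximum-likelihood estimate would be a reasonable alternative when the tail is sparsely sampled.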
Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we examined the distribution functions for the amount of C or G and of A or T in consecutive, non-overlapping blocks of m bases. The distribution P(S) of the size S of consecutive C-G or A-T clusters conforms to the relation P(S) ∝ e^(−αS); the values of the scaling exponent αCG are much larger than those of αAT, and αAT hardly changes across the 14 chromosomes, whereas αCG fluctuates. The maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. Our study of the width function ξ(m) of the C-G content showed that it follows a good power law, ξ(m) ∝ m^(−γ). The average γ for the 14 chromosomes is 0.931. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
Pain perception is influenced by multiple factors. Single nucleotide polymorphisms (SNPs) of some genes have been found to be associated with pain perception. This study aimed to examine the association of the genotypes of ABCB1 C3435T, OPRM1 A118G and COMT V108/158M (valine 108/158 methionine) with pain perception in cancer patients. We genotyped 146 cancer pain patients and 139 cancer patients without pain for ABCB1 C3435T (rs1045642), OPRM1 A118G (rs1799971) and COMT V108/158M (rs4680) by the fluorescent dye-terminator cycle sequencing method, and compared the genotype distribution between groups with different pain intensities by chi-square test and the pain scores between groups with different genotypes by non-parametric test. In these cancer patients, the frequency of the variant T allele of ABCB1 C3435T was 40.5%, that of the G allele of OPRM1 A118G was 38.5%, and that of the A allele of COMT V108/158M was 23.3%. No significant difference in the genotype distribution of ABCB1 C3435T (rs1045642) or OPRM1 A118G (rs1799971) was observed between the cancer pain group and the control group (P=0.364 and 0.578); however, the genotype distribution of COMT V108/158M (rs4680) differed significantly between the two groups (P=0.001), and the difference could not be explained by any other confounding factors. Moreover, we found that the genotypes of COMT V108/158M and ABCB1 C3435T were associated with the intensity of pain in cancer patients. In conclusion, our results indicate that the SNPs COMT V108/158M and ABCB1 C3435T significantly influence pain perception in Chinese cancer patients.
During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as the transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet need for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.
Background: Androgen insensitivity syndrome (AIS), a disorder of sexual development in 46,XY individuals, is caused by loss-of-function mutations in the androgen receptor (AR) gene. A variety of tumors have been reported in association with AIS, but no cases with colorectal cancer (CRC) have been described. Case presentation: Here, we present a male patient with AIS who developed multiple early-onset CRCs, together with his pedigree. His first cousin was diagnosed with AIS and harbored the same AR gene mutation, but showed no signs of CRC. The difference in clinical management between the two patients was that testosterone treatment was given to the proband for a much longer time than to the cousin. The CRC family history was negative, and no germline mutations in well-known CRC-related genes were identified. A single nucleotide polymorphism array revealed a microduplication on chromosome 22q11.22 that encompassed a microRNA potentially related to CRC pathogenesis. In the proband, whole exome sequencing identified a polymorphism in an oncogene and 13 rare loss-of-function variants, of which two were in CRC-related genes and four were in genes associated with other human cancers. Conclusions: By pathway analysis, all inherited germline genetic events were connected in a unique network whose alteration in the proband, together with continuous testosterone stimulation, may have played a role in CRC pathogenesis.
Using assembled expressed sequence tags (ESTs) from 14 different cDNA libraries containing 84 132 sequence reads, 556 Populus candidate single nucleotide polymorphisms (SNPs) were identified. Because traces were not available from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/index.html), stringent filters were used to identify reliable candidate SNPs. Sequence analysis indicated that the main types of substitution among candidate SNPs were A/G and T/C transitions, which accounted for 22.0% and 30.8%, respectively. One hundred and ten candidate SNPs were tested, and 38 were confirmed by direct sequencing of PCR products amplified from six different individuals. Thirteen new SNPs in intron regions were found, and multiple SNPs were found to be located in both intron and exon regions of four contigs. Heterozygosity was found in all 47 candidate sites, and five SNP sites were heterozygous in all six samples. This is the first report of SNP identification in a tree species, and it reveals that assembled ESTs from multiple libraries in public databases may provide a rich source of comparative sequences for an SNP search in the poplar genome.
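The substitution-type breakdown reported above (A/G and T/C transitions as shares of all candidate SNPs) rests on the standard distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions. The sketch below shows how such a tally could be produced from a list of allele pairs; the function names and the toy input are illustrative assumptions, not the pipeline used in the study.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid single-base substitution: {ref!r}/{alt!r}")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_percentages(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of each unordered allele pair (e.g. 'A/G') among all pairs."""
    counts = Counter("/".join(sorted((r.upper(), a.upper()))) for r, a in pairs)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


if __name__ == "__main__":
    toy_snps = [("A", "G"), ("G", "A"), ("T", "C"), ("A", "C")]  # hypothetical data
    for ref, alt in toy_snps:
        print(ref, alt, substitution_class(ref, alt))
    print(substitution_percentages(toy_snps))
```

Allele pairs are reported in alphabetical order here, so the abstract's "T/C" category appears under the key "C/T".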
Mungbean (Vigna radiata (L.) Wilczek) is a unique species in its ability to fix atmospheric nitrogen, with early maturity and relatively good drought resistance. We used 454 sequencing technology for transcriptome sequencing. A total of 150 159 and 142 993 reads produced 5 254 and 6 374 large contigs (≥500 bp) with an average length of 833 and 853 bp for Sunhwa and Jangan, respectively. Functional annotation to known sequences yielded 41.34% and 41.74% unigenes for Jangan and Sunhwa, respectively. A higher number of simple sequence repeat (SSR) motifs was identified in Jangan (1 630) than in Sunhwa (1 334). A similar SSR distribution pattern was observed in both varieties. A total of 8 249 single nucleotide polymorphisms (SNPs) and indels, including 2 098 high-confidence candidates, were identified in the two mungbean varieties. The average distance between individual SNPs was ~860 bp. Our report demonstrates the utility of transcriptomic data for functional annotation and the development of genetic markers. We also provide a large sequence resource for mungbean improvement programs.
We have cloned the replicative form of the Periplaneta fuliginosa densonucleosis virus (PfDNV) genome and determined its complete sequence. The sequence has 5 454 nucleotides (nt); the genome consists of an internal unique sequence flanked by inverted terminal repeats (201 nt). The first 122 nt at the 5′ end and the terminal 122 nt at the 3′ end of both plus and minus strands can fold into a typical hairpin structure. The genome contains seven major open reading frames (ORFs). The plus strand has 4 ORFs occupying the 5′ half of the plus strand, whereas the others span the 5′ half of the minus strand. Two potential promoters were found at map units (m.u.) 3 and 97. Computer analysis of sequence homologies with other parvoviruses suggests that the plus strand of PfDNV very likely encodes the nonstructural proteins and the minus strand probably encodes the structural proteins.
Wheat yellow mosaic virus (WYMV) isolate HC was used for viral cDNA synthesis and sequencing. The results show that the viral RNA1 is 7 629 nucleotides long and encodes a polyprotein of 2 407 amino acids, from which seven putative proteins, in addition to the viral coat protein, may be produced by autolytic cleavage. The RNA2 is 3 639 nucleotides long and codes for a polyprotein of 903 amino acids, which may contain two putative non-structural proteins. Although WYMV is similar in genetic organization to wheat spindle streak mosaic virus (WSSMV), the identities of their nucleotide sequences and deduced amino acid sequences are as low as 70% and 75%, respectively. On the basis of this result, it is confirmed that WYMV and WSSMV are different species within Bymovirus.
Based on the reported TMV-U1 sequence, primers were designed and fragments covering the entire genome of the TMV broad bean strain (TMV-B) were obtained by RT-PCR. These fragments were cloned and sequenced, and the 5′ and 3′ end sequences of the genome were confirmed by RACE. The complete sequence of TMV-B comprises 6 395 nucleotides (nt) and four open reading frames, which correspond to 126 ku (1 116 amino acids), 183 ku (1 616 amino acids), 30 ku (268 amino acids) and 17.5 ku (159 amino acids) proteins. The complete nucleotide sequence of TMV-B is 99.4% identical to that of TMV-U1. The two virus isolates share the same sequence in the 5′ and 3′ non-coding regions and the 17.5 K ORF, and 6, 1 and 3 amino acid changes are found in the 126 K protein, 54 K protein and 30 K protein, respectively. A possible mechanism for the infection of Vicia faba by TMV-B is discussed.
Four kinds of mitochondrial plasmid-like DNAs, designated pC1, pC2, pC3 and pC4, were detected in Cucumis sativus Jinyan No. 4. Electron microscopy showed that pC4 had a linear conformation. The complete sequence of pC4 was cloned into pUC19 with E. coli JM109 as the host. Sequence analysis revealed that pC4 was 370 bp long, the shortest among all reported mitochondrial plasmid-like DNAs. pC4 was AT rich. It contained a terminal direct repeat sequence (35 bp in length) as well as many short direct and inverted repeats. ORFs in pC4 were short. pC4 was found to be homologous to nuclear DNAs but to lack homology with the main mitochondrial and chloroplast DNAs. A pC4-homologous sequence also occurred in the nuclear genome of Jinyan No. 7, which contained no mitochondrial plasmid-like DNAs. The hybridization pattern of Jinyan No. 7 was slightly different from that of Jinyan No. 4. This suggested that pC4 occurred at the forepart of Cucumis sativus species divergence and integrated into the nuclear genome,
Funding (mumps virus SP genome): Public Benefit Grant of the Ministry of Health, China (200802035); Natural Science Foundation of Yunnan Province (2008CD153).
Funding (measles virus IMB-1 genome): Public Benefit Grant of the Ministry of Health, P.R. China (200802035); Basic Research Foundation (General Program) of Yunnan Province (2008CD153).
Funding (rice NBS-LRR alternative splicing): This work was supported by the Natural Sciences Foundation of Guangdong Province (No. 0409078) and the Natural Sciences Foundation from the Education Department of Guangdong Province (No. z02051).
Abstract: The energy of interaction between complementary nucleotides in promoter sequences of E. coli was calculated and visualized, and a graphical method for presenting the energy properties of promoter sequences was developed. The data indicate that the energy distribution along the promoter sequence yields a profile with minima at the –35, –8 and +7 regions, which correspond to areas with elevated AT (adenine-thymine) content. The most important difference from random sequences is found in the –8 region. Four promoter groups and their energy properties were identified. Promoters with minimal and maximal energies of interaction between complementary nucleotides have low strengths, whereas the strongest promoters belong to promoter clusters characterized by intermediate energy values.
Abstract: Here we report the adaptation and optimization of an efficient, accurate and inexpensive assay that employs custom-designed silicon-based optical thin-film biosensor chips to detect unique transgenes in genetically modi-
Funding: This work was supported by the Provincial Natural Science Foundation of Guangdong (Contract 990944) and the National Natural Science Foundation of China (Contracts 20205003 and 29975033).
Abstract: Using the continuous wavelet transform as the analytical tool, we studied the fractal characteristics of nucleotide sequences. The fractal dimensions of exon and intron sequences from different species were calculated. We used the Mexican hat wavelet function as the mother wavelet and the Hurst exponent to describe the long-range correlation. The Hurst exponent of the intron sequence was found to be larger than that of the exon sequence of the same gene.
Abstract: A new version of DNA walks is introduced in which nucleotides are regarded as unequal in their contribution to a walk, which allows a thorough study of the “fine structure” of nucleotide sequences. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. We consider each codon position independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, and it reflects the “strength” of each branch. A general method for identifying a DNA sequence “by triander”, which can be treated as a unique “genogram” (or “gene passport”), is proposed. Two- and three-dimensional trianders are considered. The difference in sequence fine structure between genes and the intergenic space is shown. A clear triplet signal was found in coding sequences; it is absent in the intergenic space and is independent of sequence length. This paper presents a topological classification of trianders, which may allow a detailed characterization of the signatures of functionally different genomic regions.
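The idea of one walk per codon position can be illustrated with a minimal sketch. The abstract does not give the determinative-degree weights, angles or lengths, so the step rule below is an assumption: it uses the classical purine/pyrimidine DNA walk (+1 for A or G, -1 for C or T) as a stand-in. The function name `codon_position_walks` and the toy sequence are hypothetical.

```python
from itertools import accumulate

# Assumed step rule (classical DNA walk): purine = +1, pyrimidine = -1.
# This is NOT the "determinative degree" of the abstract, whose values are not given.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def codon_position_walks(seq):
    """Return three cumulative walks, one for each codon position (1st, 2nd, 3rd)."""
    seq = seq.upper()
    walks = []
    for pos in range(3):
        # Every third base starting at this codon position; unknown symbols are skipped.
        steps = [STEP[base] for base in seq[pos::3] if base in STEP]
        walks.append(list(accumulate(steps)))
    return walks


# Toy coding sequence (hypothetical): codons ATG GCC ATT GTA.
walks = codon_position_walks("ATGGCCATTGTA")
for position, walk in enumerate(walks, start=1):
    print(f"codon position {position}: {walk}")
```

In this toy case the three positions drift in visibly different directions, which is the kind of position-specific signal that a three-branch representation is meant to expose; real weights would change the geometry of each branch.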
Funding: Project supported by the National Natural Science Foundation of China (No. 20574052), the Program for New Century Excellent Talents in University, and the Natural Science Foundation of Zhejiang Province (Nos. R404047 and Y405011), China.
Abstract: In this paper we study the scaling behavior of nucleotide clusters in the 11 chromosomes of the Encephalitozoon cuniculi genome. The statistical distribution of nucleotide clusters in the 11 chromosomes is characterized by the scaling behavior $P(S) \propto e^{-\alpha S}$, where $S$ represents the nucleotide cluster size. The cluster-size distribution $P(S_1+S_2)$, where $S_1+S_2$ is the total size of a sequential C-G cluster and A-T cluster, was also studied; $P(S_1+S_2)$ follows an exponential decay. We found no cases of a large C-G cluster following a large A-T cluster, or of a large A-T cluster following a large C-G cluster. We also discuss the relatively random walk length function $L(n)$ and the local compositional complexity of nucleotide sequences based on a new model. These investigations may provide some insight into the nucleotide clusters of DNA sequences.
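A minimal sketch of how a cluster-size distribution of the form $P(S) \propto e^{-\alpha S}$ could be measured is given below. It assumes that a cluster is a maximal run of consecutive C/G bases or of consecutive A/T bases, and that $\alpha$ is estimated by an ordinary least-squares fit of $\ln P(S)$ against $S$; the abstract does not state the fitting procedure, so both choices, and the names `cluster_sizes` and `fit_alpha`, are assumptions.

```python
import math
from collections import Counter
from itertools import groupby


def cluster_sizes(seq):
    """Sizes of maximal runs of C/G bases and of A/T bases, in sequence order."""
    # Assumption: ambiguous symbols (e.g. N) are simply dropped before counting runs.
    bases = [b for b in seq.upper() if b in "ACGT"]
    cg, at = [], []
    for is_cg, run in groupby(bases, key=lambda b: b in "CG"):
        (cg if is_cg else at).append(len(list(run)))
    return cg, at


def fit_alpha(sizes):
    """Estimate alpha in P(S) ~ exp(-alpha * S) by a least-squares fit of ln P(S) on S."""
    counts = Counter(sizes)
    total = len(sizes)
    xs = sorted(counts)
    if len(xs) < 2:
        raise ValueError("need at least two distinct cluster sizes to fit a slope")
    ys = [math.log(counts[s] / total) for s in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


cg_runs, at_runs = cluster_sizes("GGCATTACG")
print(cg_runs, at_runs)
```

On a real chromosome one would call `fit_alpha(cg_runs)` and `fit_alpha(at_runs)` separately to compare the C-G and A-T exponents; a weighted or maximum-likelihood fit may be preferable when large clusters are rare.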
Funding: Project supported by the National Natural Science Foundation of China (Nos. 20174036 and 20274040) and the Natural Science Foundation of Zhejiang Province (Nos. R404047 and 10102), China.
Abstract: Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we examined the distribution functions for the amounts of C or G and of A or T in consecutive, non-overlapping blocks of m bases. The distribution $P(S)$ of the size of consecutive C-G or A-T clusters conforms to the relation $P(S) \propto e^{-\alpha S}$. The values of the scaling exponent $\alpha_{CG}$ are much larger than those of $\alpha_{AT}$; $\alpha_{AT}$ hardly changes across the 14 chromosomes, whereas $\alpha_{CG}$ fluctuates considerably. We found that the maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. Our study of the width function $\xi(m)$ of the C-G content of clusters showed that it follows a good power law, $\xi(m) \propto m^{-\gamma}$. The average $\gamma$ for the 14 chromosomes is 0.931. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
Funding: Supported by the National Natural Science Foundation of China (No. 813019), the National Key Scientific Instrument Special Program of China (No. 2013YQ030923), the Natural Science Foundation of Hubei Province (No. 2013CFB138), and the Scientific Research Project of Health and Family Planning of Hubei Province (Nos. WJ2015Q009 and JX5B37).
Abstract: Pain perception is influenced by multiple factors. Single nucleotide polymorphisms (SNPs) of some genes have been found to be associated with pain perception. This study aimed to examine the association of the genotypes of ABCB1 C3435T, OPRM1 A118G and COMT V108/158M (valine 108/158 methionine) with pain perception in cancer patients. We genotyped 146 cancer pain patients and 139 cancer patients without pain for ABCB1 C3435T (rs1045642), OPRM1 A118G (rs1799971) and COMT V108/158M (rs4680) by the fluorescent dye-terminator cycle sequencing method, and compared the genotype distribution between groups with different pain intensities by the chi-square test and the pain scores between groups with different genotypes by a non-parametric test. The results showed that in these cancer patients, the frequency of the variant T allele of ABCB1 C3435T was 40.5%, that of the G allele of OPRM1 A118G was 38.5%, and that of the A allele of COMT V108/158M was 23.3%. No significant difference in the genotype distribution of ABCB1 C3435T (rs1045642) or OPRM1 A118G (rs1799971) was observed between the cancer pain group and the control group (P=0.364 and 0.578, respectively); however, a significant difference occurred in the genotype distribution of COMT V108/158M (rs4680) between the two groups (P=0.001), and this difference could not be explained by any other confounding factors. Moreover, we found that the genotypes of COMT V108/158M and ABCB1 C3435T were associated with the intensity of pain in cancer patients. In conclusion, our results indicate that the SNPs COMT V108/158M and ABCB1 C3435T significantly influence pain perception in Chinese cancer patients.
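The genotype comparison described above relies on a chi-square test of a contingency table (groups by genotypes). The sketch below shows the Pearson statistic in plain Python. The counts are hypothetical and purely illustrative; they are not the study's data, which the abstract does not report. The closed-form p-value used here holds only for 2 degrees of freedom, which is the case for two groups and three genotypes.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_df2(stat):
    """Upper-tail p-value of the chi-square distribution, valid ONLY for 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical genotype counts (rows: two patient groups; columns: three genotypes).
example_table = [[10, 20, 30],
                 [20, 20, 20]]
stat, dof = chi_square(example_table)
p = p_value_df2(stat)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

For tables with other dimensions the p-value requires the general chi-square survival function, which a statistics library would normally provide.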
Funding: Supported by the Next-generation Genome-InfraNET for the advancement of genome research and service (Grant No. 2019M3C9A5069653) and the Construction of biological data station (Grant No. 2020M3A9I6A01036057) grants from the National Research Foundation of Korea.
Abstract: During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.
Funding: Supported in part by funds obtained through an Italian law that allows taxpayers to allocate a 0.5 percent share of their income tax contribution to a research institution of their choice.
Abstract: Background: Androgen insensitivity syndrome (AIS), a disorder of sexual development in 46,XY individuals, is caused by loss-of-function mutations in the androgen receptor (AR) gene. A variety of tumors have been reported in association with AIS, but no cases with colorectal cancer (CRC) have been described. Case presentation: Here, we present a male patient with AIS who developed multiple early-onset CRCs, together with his pedigree. His first cousin was diagnosed with AIS and harbored the same AR gene mutation, but showed no signs of CRC. The difference in clinical management between the two patients was that testosterone treatment was given to the proband for a much longer time than to the cousin. The CRC family history was negative, and no germline mutations in well-known CRC-related genes were identified. A single nucleotide polymorphism array revealed a microduplication on chromosome 22q11.22 that encompassed a microRNA potentially related to CRC pathogenesis. In the proband, whole exome sequencing identified a polymorphism in an oncogene and 13 rare loss-of-function variants, of which two were in CRC-related genes and four were in genes associated with other human cancers. Conclusions: By pathway analysis, all inherited germline genetic events were connected in a unique network whose alteration in the proband, together with continuous testosterone stimulation, may have played a role in CRC pathogenesis.
Funding: Supported by the National Natural Science Foundation of China, the Jiangsu Provincial Hi-Tech Research Project, and the Jiangsu Provincial Graduate Innovation Project.
Abstract: Using assembled expressed sequence tags (ESTs) from 14 different cDNA libraries containing 84 132 sequence reads, 556 Populus candidate single nucleotide polymorphisms (SNPs) were identified. Because traces were not available from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/index.html), stringent filters were used to identify reliable candidate SNPs. Sequence analysis indicated that the main types of substitution among the candidate SNPs were A/G and T/C transitions, which accounted for 22.0% and 30.8%, respectively. One hundred and ten candidate SNPs were tested, and 38 of them were confirmed by direct sequencing of PCR products amplified from six different individuals. Thirteen new SNPs were found in intron regions, and multiple SNPs were found in both intron and exon regions of four contigs. Heterozygosity was found at all 47 candidate sites, and five SNP sites were heterozygous in all six samples. This is the first report of SNP identification in a tree species, and it shows that assembled ESTs from multiple libraries in public databases may provide a rich source of comparative sequences for SNP searches in the poplar genome.
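The distinction between transitions (A/G, T/C) and transversions mentioned above can be sketched as a small classifier. The list of allele pairs is hypothetical and only illustrates the tally; it is not taken from the poplar data set, and the function name `substitution_class` is an assumption.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(a, b):
    """Classify a two-allele SNP as a 'transition' or a 'transversion'."""
    a, b = a.upper(), b.upper()
    if a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid nucleotide substitution: {a}/{b}")
    # Transition: both alleles are purines, or both are pyrimidines.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Hypothetical candidate SNPs, given as allele pairs.
example_snps = [("A", "G"), ("T", "C"), ("A", "C"), ("G", "T"), ("C", "T")]
tally = Counter(substitution_class(a, b) for a, b in example_snps)
print(tally)
```

Dividing each tally entry by the number of candidate SNPs gives percentages comparable to those quoted in the abstract.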
Funding: Supported by the "Cooperative Research Program for Agriculture Science & Technology Development (Project No. 200908FHT020609001)", Rural Development Administration (RDA), Republic of Korea.
Abstract: Mungbean (Vigna radiata (L.) Wilczek) is notable for its ability to fix atmospheric nitrogen, its early maturity, and its relatively good drought resistance. We used 454 sequencing technology for transcriptome sequencing. A total of 150 159 and 142 993 reads produced 5 254 and 6 374 large contigs (≥500 bp) with average lengths of 833 and 853 bp for Sunhwa and Jangan, respectively. Functional annotation against known sequences yielded 41.34% and 41.74% unigenes for Jangan and Sunhwa, respectively. A higher number of simple sequence repeat (SSR) motifs was identified in Jangan (1 630) than in Sunhwa (1 334), and a similar SSR distribution pattern was observed in both varieties. A total of 8 249 single nucleotide polymorphisms (SNPs) and indels, including 2 098 high-confidence candidates, were identified in the two mungbean varieties. The average distance between individual SNPs was ~860 bp. Our report demonstrates the utility of transcriptomic data for functional annotation and the development of genetic markers. We also provide a large sequence resource for mungbean improvement programs.
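SSR motifs such as those counted above are typically located by scanning contigs for short tandem repeats. The following is a minimal regex-based sketch, not the pipeline used in the study: the motif lengths (2 to 6 bp) and the minimum repeat count (5) are assumed thresholds, and the names `find_ssrs` and `is_primitive` are hypothetical.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=5, max_motif_len=6):
    """Return (start, motif, repeat_count) for tandem repeats of 2..max_motif_len bp."""
    seq = seq.upper()
    hits = []
    for k in range(2, max_motif_len + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy contig (hypothetical) containing one (AT)5 repeat.
toy_contig = "GGC" + "AT" * 5 + "GCC"
print(find_ssrs(toy_contig))
```

Dedicated SSR tools usually apply a different minimum repeat count for each motif length; the single threshold here keeps the sketch short.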
Abstract: We have cloned the replicative form of the Periplaneta fuliginosa densonucleosis virus (PfDNV) genome and determined its complete sequence. The sequence has 5 454 nucleotides (nt), and the genome consists of an internal unique sequence flanked by inverted terminal repeats (201 nt). The first 122 nt at the 5′ end and the terminal 122 nt at the 3′ end of both plus and minus strands can fold into a typical hairpin structure. The genome contains seven major open reading frames (ORFs). The plus strand has four ORFs occupying the 5′ half of the plus strand, whereas the others span the 5′ half of the minus strand. Two potential promoters were found at map units (m.u.) 3 and 97. Computer analysis of sequence homologies with other parvoviruses suggests that the plus strand of PfDNV very likely encodes the nonstructural proteins and that the minus strand probably encodes the structural proteins.
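An inverted terminal repeat means that the 3′ terminus of a strand is the reverse complement of its 5′ terminus. A minimal sketch of that check follows; the toy sequence is hypothetical, the comparison requires a perfect match, and mismatches or the hairpin folding itself are not modeled. The names `reverse_complement` and `has_inverted_terminal_repeat` are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_inverted_terminal_repeat(seq, length):
    """True if the last `length` bases are the reverse complement of the first `length`."""
    seq = seq.upper()
    if length <= 0 or 2 * length > len(seq):
        return False
    return seq[-length:] == reverse_complement(seq[:length])


# Toy genome (hypothetical): a 5-nt terminal repeat flanking a short unique region.
toy_genome = "AAGGC" + "TTTT" + "GCCTT"
print(has_inverted_terminal_repeat(toy_genome, 5))
```

For a real genome the `length` argument would be the reported repeat length (201 nt in this abstract), and an alignment-based comparison would tolerate the occasional mismatch.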
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 39470481) and the High-Tech. Program of China. The nucleotide sequence data reported in this paper have been submitted to the DDBJ/GenBank/EMBL database with the accession number AF0
Abstract: Wheat yellow mosaic virus (WYMV) isolate HC was used for viral cDNA synthesis and sequencing. The results show that the viral RNA1 is 7 629 nucleotides long and encodes a polyprotein of 2 407 amino acids, from which seven putative proteins, in addition to the viral coat protein, may be produced by autolytic cleavage. RNA2 is 3 639 nucleotides long and codes for a polyprotein of 903 amino acids, which may contain two putative non-structural proteins. Although WYMV is similar in genetic organization to wheat spindle streak mosaic virus (WSSMV), the identities of their nucleotide sequences and deduced amino acid sequences are as low as 70% and 75%, respectively. Based on this result, it is confirmed that WYMV and WSSMV are different species within Bymovirus.
Abstract: Based on the reported TMV-U1 sequence, primers were designed and fragments covering the entire genome of the TMV broad bean strain (TMV-B) were obtained by RT-PCR. These fragments were cloned and sequenced, and the 5′ and 3′ end sequences of the genome were confirmed by RACE. The complete sequence of TMV-B comprises 6 395 nucleotides (nt) and four open reading frames, which correspond to proteins of 126 kDa (1 116 amino acids), 183 kDa (1 616 amino acids), 30 kDa (268 amino acids) and 17.5 kDa (159 amino acids). The complete nucleotide sequence of TMV-B is 99.4% identical to that of TMV-U1. The two virus isolates share the same 5′ and 3′ non-coding regions and 17.5 K ORF, and 6, 1 and 3 amino acid changes are found in the 126 K protein, 54 K protein and 30 K protein, respectively. The possible mechanism of TMV-B infection of Vicia faba is discussed.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 39770414).
Abstract: Four kinds of mitochondrial plasmid-like DNAs, designated pC1, pC2, pC3 and pC4, were detected in Cucumis sativus Jinyan No. 4. Electron microscopy showed that pC4 had a linear conformation. The complete sequence of pC4 was cloned into pUC19 with E. coli JM109 as the host. Sequence analysis revealed that pC4 was 370 bp long, the shortest of all reported mitochondrial plasmid-like DNAs. pC4 was AT rich. It contained a terminal direct repeat sequence (35 bp in length) as well as many short direct and inverted repeats. The ORFs in pC4 were short. pC4 was found to be homologous to nuclear DNAs, but lacked homology with the main mitochondrial and chloroplast DNAs. A pC4-homologous sequence also occurred in the nuclear genome of Jinyan No. 7, which contained no mitochondrial plasmid-like DNAs. The hybridization pattern of Jinyan No. 7 was slightly different from that of Jinyan No. 4. This suggested that pC4 arose early in the divergence of the Cucumis sativus species and integrated into the nuclear genome,