Objective: Autosomal recessive bestrophinopathy (ARB), a retinal degenerative disease, is characterized by central visual loss, yellowish multifocal diffuse subretinal deposits, and a dramatic decrease in the light peak on electrooculogram. The potential pathogenic mechanism involves mutations in the BEST1 gene, which encodes Ca2+-activated Cl− channels in the retinal pigment epithelium (RPE), resulting in degeneration of the RPE and photoreceptors. In this study, the complete clinical characteristics of two Chinese ARB families were summarized. Methods: Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing was performed on the probands to screen for disease-causing gene mutations, and Sanger sequencing was applied to validate variants in the patients and their family members. Results: Two novel mutations, c.202T>C (chr11:61722628, p.Y68H) and c.867+97G>A, in the BEST1 gene were identified in the two Chinese ARB families. The novel missense mutation BEST1 c.202T>C (p.Y68H) resulted in the substitution of tyrosine with histidine in the N-terminal region of transmembrane domain 2 of bestrophin-1. Another novel variant, BEST1 c.867+97G>A (chr11:61725867), located in intron 7, might be considered a regulatory variant that changes allele-specific binding affinity based on motifs of important transcriptional regulators. Conclusion: Our findings represent the first use of third-generation sequencing (TGS) to identify novel BEST1 mutations in patients with ARB, indicating that TGS can be a more accurate and efficient tool for identifying mutations in specific genes. The novel variants identified further broaden the mutation spectrum of BEST1 in the Chinese population.
BACKGROUND Infectious diseases are still one of the greatest threats to human health, and the etiology of 20% of cases of clinical fever is unknown; therefore, rapid identification of pathogens is highly important. Traditional culture methods can detect only a limited number of pathogens and are time-consuming; serologic detection has window periods and false-positive and false-negative problems; and nucleic acid molecular detection methods can detect only a few known pathogens per test. Third-generation nanopore sequencing technology provides new options for identifying pathogens. CASE SUMMARY Case 1: The patient was admitted to the hospital with abdominal pain for three days and cessation of defecation for five days, accompanied by cough and sputum. Nanopore sequencing of the drainage fluid revealed the presence of oral-like bacteria, leading to a clinical diagnosis of bronchopleural fistula. Cefoperazone sodium sulbactam treatment was effective. Case 2: The patient was admitted to the hospital with fever and headache, and CT revealed lung inflammation. Antibiotic treatment for Streptococcus pneumoniae, identified through nanopore sequencing of cerebrospinal fluid, was effective. Case 3: The patient was admitted to our hospital with intermittent fever and an enlarged neck mass that had persisted for more than six months. Despite antibacterial treatment, her symptoms worsened. Nanopore sequencing identified Aspergillus brookii, and voriconazole treatment was effective. The patient was diagnosed with mixed cell type classical Hodgkin's lymphoma with infection. CONCLUSION Third-generation nanopore sequencing technology allows for rapid and accurate detection of pathogens in human infectious diseases.
Background Breed identification is useful in a variety of biological contexts. Breed identification usually involves two stages, i.e., detection of breed-informative SNPs and breed assignment. For both stages, several methods have been proposed. However, the optimal combination of these methods remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared the combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment with respect to different reference population sizes and different numbers of most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results We found that all combinations performed quite well, with identification accuracies over 95% in all scenarios. However, no single combination performed best and was robust across all scenarios. We proposed to integrate the three breed-informative detection methods, named DFI, and to integrate three of the machine learning methods, KNN, SVM, and RF, named KSR. We found that the combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies from using SNP chip data were only slightly lower than those from using sequence data in most cases. Conclusions The current study showed that the combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases. However, the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.
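As a rough illustration of the first stage described in the abstract above, the sketch below ranks SNPs by the Delta statistic, taken here to be the absolute difference in allele frequency between two breeds. The function names, the 0/1/2 genotype coding, and the restriction to a pairwise comparison are assumptions made for this example; the study itself compared 13 breeds and may define and apply Delta differently.

```python
def allele_freq(genotypes):
    """Alternate-allele frequency from genotypes coded 0, 1, or 2 (assumed coding)."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(geno_a, geno_b):
    """Delta for one SNP: absolute allele-frequency difference between two breeds."""
    return abs(allele_freq(geno_a) - allele_freq(geno_b))


def top_informative(snps_a, snps_b, n):
    """Return the n SNP names with the largest Delta (hypothetical helper)."""
    scores = {snp: delta(snps_a[snp], snps_b[snp]) for snp in snps_a}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy genotypes for four animals per breed (invented values, not study data).
breed_a = {"snp1": [2, 2, 2, 2], "snp2": [1, 1, 0, 2], "snp3": [0, 0, 0, 1]}
breed_b = {"snp1": [0, 0, 0, 0], "snp2": [1, 0, 1, 2], "snp3": [2, 2, 1, 2]}
print(top_informative(breed_a, breed_b, 2))
```

The selected SNPs would then be passed to a classifier such as KNN, SVM, or RF for the breed assignment stage.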
Objective: To monitor emerging variants by nanopore technology-based genome sequencing in different COVID-19 waves in Sri Lanka and to examine their association with sample characteristics and vaccination status. Methods: The study analyzed 207 RNA-positive swab samples received by the sequencing laboratory during different waves. An N gene cycle threshold (Ct) value of less than 30 was the major inclusion criterion. Viral RNA was extracted, and the eluates were subjected to nanopore sequencing. All the sequencing data were uploaded to the publicly accessible database GISAID. Results: The Omicron, Delta and Alpha variants accounted for 58%, 22% and 4% of the variants throughout the period. Less than 1% were the Kappa variant, and 16% of the study samples remained unassigned. The Omicron variant circulated among all age groups and in all provinces. A variant was assigned for 100% of samples with Ct values of 10-15, but for only 45% of samples with Ct values over 25. Conclusions: The present study examined the emergence, prevalence, and distribution of SARS-CoV-2 variants locally and has shown that nanopore technology-based genome sequencing enables whole genome sequencing in a low-resource country.
Tumor tissues contain both tumor and non-tumor cells, which include infiltrated immune cells and stromal cells, collectively called the tumor microenvironment (TME). Single-cell RNA sequencing (scRNAseq) enables the examination of heterogeneity of tumor cells and the TME. In this review, we examined scRNAseq datasets for multiple cancer types and evaluated the heterogeneity of major cell type composition in different cancer types. We further showed that endothelial cells and fibroblasts/myofibroblasts in different cancer types can be classified into common subtypes, and the subtype composition is clearly associated with cancer characteristics and therapy response.
DNA barcodes, short and unique DNA sequences, play a crucial role in sample identification when processing many samples simultaneously, which helps reduce experimental costs. Nevertheless, the low quality of long-read sequencing makes it difficult to identify barcodes accurately, which poses significant challenges for the design of barcodes for large numbers of samples in a single sequencing run. Here, we present a comprehensive study of the generation of barcodes and develop a tool, PRO, that can be used for selecting optimal barcode sets and demultiplexing. We formulate the barcode design problem as a combinatorial problem and prove that finding the optimal largest barcode set in a given DNA sequence space in which all sequences have the same length is theoretically NP-complete. For practical applications, we developed the novel method PRO by introducing the probability divergence between two DNA sequences to expand the capacity of barcode kits while ensuring demultiplexing accuracy. Specifically, the maximum size of the barcode kits designed by PRO is 2,292, which keeps the length of barcodes the same as that of the official ones used by Oxford Nanopore Technologies (ONT). We validated the performance of PRO on a simulated nanopore dataset with high error rates. The demultiplexing accuracy of PRO reached 98.29% for a barcode kit of size 2,922, 4.31% higher than that of Guppy, the official demultiplexing tool. When the size of the barcode kit generated by PRO is the same as the official size provided by ONT, both tools show high and comparable demultiplexing accuracy.
Gene sequencing is a powerful way to interpret life, and high-throughput sequencing technology is a revolutionary innovation in gene sequencing research. This technology is characterized by low cost and high-throughput data. Currently, high-throughput sequencing technology is widely applied in multi-level research on genomics, transcriptomics and epigenomics. It has fundamentally changed the way we approach problems in basic and translational research and has created many new possibilities. This paper presents a general description of high-throughput sequencing technology and a comprehensive review of its applications in plain, concise and precise language, in order to help researchers work faster and better and to help science enthusiasts understand the technology more easily.
Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding due to its beneficial feature of combining information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the usage of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. Thus, we extended ssGBLUP, incorporated genomic annotation information into the model, and evaluated the resulting models using a yellow-feathered chicken population as an example. The chicken population consisted of 1 338 birds with 23 traits, where imputed WGS data including 5 127 612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation were evaluated. Based on the genomic annotation (GRCg6a) of chickens, 3 155 524 and 94 837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed other models with respect to predictive ability in 15 out of 23 traits, and their advantages ranged from 2.5 to 6.1% compared with original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated the genotyping strategies of the reference population on ssGBLUP in the chicken population. Comparing two strategies of individual selection for genotyping in the reference population, the strategy of even selection by family (SBF) performed slightly better than random selection in most situations. Overall, we extended genomic prediction models that can comprehensively utilize WGS data and genomic annotation information in the framework of ssGBLUP, and validated the idea that properly handling the genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, the genotyping strategy of maximizing the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results from this study shed light on the comprehensive usage of genomic annotation information in WGS-based single-step genomic prediction.
[Objective] The research aimed to study the influence of automatic station data on the sequence continuity of historical meteorological data. [Method] Based on temperature data measured by the automatic meteorological station and the corresponding manual observation data from January to December 2001, the monthly average, maximum and minimum temperatures from the automatic station were compared with the corresponding manually observed temperatures in the parallel observation period, using the comparison difference and the standard deviation of the difference. The differences between the automatic station data and the manual data, and their variation characteristics, were thereby characterized. Meanwhile, significance tests and an analysis of annual average values were carried out on the data sequence for 1990-2009. The influence of replacing manual observation with the automatic station on the sequence continuity of historical temperature data was discussed. [Result] Although the two temperature datasets in the parallel observation period differed to some extent, the difference was on average within the permitted range for automatic station differences. The difference in individual months exceeded the permitted range. The significance test showed that the annual average temperature and the annual average minimum temperature observed at the automatic station differed from the historical data. This had some influence on the annual temperature sequence, but the difference was not significant as a whole. When automatic observations are used together with manual observations, the sequence needs to undergo homogeneity testing and correction. [Conclusion] The research plays an important role in guaranteeing the single-track operation of the automatic station, optimizing the surface meteorological observation system, and improving the continuity of climate sequences of meteorological elements and the reliability of climate statistics.
This paper presents a data fusion method with distributed sequence detection based on hypothesis testing theory, including the data fusion algorithm for sequence detection under the least error probability rule, the decision rule, the formula for calculating the number of detections, and simulation results of system performance.
The recognition and correlation of bed sets within parasequences is difficult in high-resolution sequence stratigraphy of terrestrial basins. This study puts forward new methods for identifying and correlating the boundaries of bed sets on the basis of multiple types of logging data. The formation of calcareous interbeds, shale resistivity differences and the relation of reservoir resistivity to altitude are considered on the basis of log curve morphological characteristics, core observation, cast thin sections, X-ray diffraction and scanning electron microscopy. The results show that the thickness of calcareous interbeds is between 0.5 m and 2 m, increasing on weathering crusts and faults. Calcareous interbeds occur at the bottom of a distributary channel and the top of a distributary mouth bar. Lower resistivity shale (4-5 Ω·m) and higher resistivity shale (> 10 Ω·m) reflect differences in sediment source or sedimentary microfacies. Reservoir resistivity increases with altitude. Calcareous interbeds may serve as markers for recognizing the boundaries of bed sets and for isochronous correlation of bed sets, and shale resistivity differences may confirm the stacking relation and connectivity of bed sets. Based on this, a high-resolution chronostratigraphic framework of the Xi-1 segment in the Shinan area, Junggar basin, is presented, and the connectivity of bed sets and the oil-water contact are confirmed. In this chronostratigraphic framework, the growth order, stacking pattern and spatial geometry of bed sets are qualitatively and quantitatively described.
Massively parallel sequencing (MPS), alias next-generation sequencing, is making its way from research laboratories into applied sciences and clinics. MPS is a framework of experimental procedures which offer possibilities for genome research and genetics that could only be dreamed of until around 2005, when these technologies became available. Sequencing of a transcriptome, an exome, or even an entire genome is now possible within a time frame and with a precision that we could only hope for 10 years ago. Linking other experimental procedures with MPS enables researchers to study secondary DNA modifications across the entire genome, and protein binding sites, to name a few applications. How the advancements of sequencing technologies can contribute to transplantation science is the subject of this discussion: immediate applications are in graft matching via human leukocyte antigen sequencing, as part of systems biology approaches which shed light on gene expression processes during immune response, as biomarkers of graft rejection, and to explore changes of microbiomes as a result of transplantation. Of considerable importance is the socio-ethical aspect of data ownership, privacy, informed consent, and reporting of results to the study participant. While the technology is advancing rapidly, legislation is lagging behind due to the globalisation of data requisition, banking and sharing.
miRNAs are non-coding RNAs that play a regulatory role in the expression of genes and are associated with diseases. Quantitatively measuring expression levels of miRNAs can help in understanding the mechanisms of human diseases and discovering new drug targets. Three major methods have been used to measure the expression levels of miRNAs: quantitative real-time reverse transcription PCR (qRT-PCR), microarray, and the newly introduced next-generation sequencing (NGS). NGS is not only suitable for profiling known miRNAs, as qRT-PCR and microarray are, but is also able to detect unknown miRNAs, which the other two methods cannot do. Profiling of miRNAs by NGS has progressed rapidly and is a promising field for applications in drug development. This paper reviews the technical advancement of NGS for profiling miRNAs, including comparative analyses between different platforms and software packages for analyzing NGS data. Examples and future perspectives of applications of NGS miRNA profiling in drug development are discussed.
An unequal time interval sequence or a sequence with blanks is usually completed with average generation in grey system theory. This paper finds that there are obvious errors when average generation is used to generate internal points between non-consecutive neighbours. The average generation and the preference generation of the sequence are discussed; the concavity and convexity of the sequence indicate its local status, which suggests using that status as a criterion for choosing the generation coefficient. In comparison with the general average method for one-dimensional data sequences, the two-dimensional data sequence is defined and its average generation is discussed, and a method for deciding the coefficient for preference generation is presented.
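To make the two generation schemes in the abstract above concrete, the following is a minimal sketch that fills a blank from its two known neighbours. It assumes the common grey-system form x(k) = alpha * x(k+1) + (1 - alpha) * x(k-1), where alpha = 0.5 gives average generation and any other alpha in [0, 1] gives preference generation. The function names and the handling of consecutive blanks are choices made for this example, not the paper's method, and the paper's criterion for picking alpha from concavity or convexity is not reproduced.

```python
def generate_point(prev_value, next_value, alpha=0.5):
    """Generate a blank from its neighbours; alpha is the generation coefficient."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * next_value + (1.0 - alpha) * prev_value


def fill_blanks(seq, alpha=0.5):
    """Fill isolated None entries that have known values on both sides.

    Runs of consecutive blanks are left untouched, since their neighbours
    are not both known (a simplification assumed for this sketch).
    """
    out = list(seq)
    for k in range(1, len(seq) - 1):
        if seq[k] is None and seq[k - 1] is not None and seq[k + 1] is not None:
            out[k] = generate_point(seq[k - 1], seq[k + 1], alpha)
    return out


print(fill_blanks([10.0, None, 14.0]))        # average generation
print(fill_blanks([10.0, None, 14.0], 0.75))  # preference toward the newer value
```

With alpha above 0.5 the generated point leans toward the later observation, which is one way a convex or concave local trend could be reflected in the filled value.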
Online sensing can provide useful information in monitoring applications, for example, machine health monitoring, structural condition monitoring, environmental monitoring, and many more. Missing data is generally a significant issue in the sensory data that is collected online by sensing systems, which may affect the goals of monitoring programs. In this paper, a sequence-to-sequence learning model based on a recurrent neural network (RNN) architecture is presented. In the proposed method, multivariate time series of the monitored parameters are embedded into the neural network through layer-by-layer encoders where the hidden features of the inputs are adaptively extracted. Afterwards, predictions of the missing data are generated by network decoders, which are one-step-ahead predictive data sequences of the monitored parameters. The prediction performance of the proposed model is validated based on a real-world sensory dataset. The experimental results demonstrate the performance of the proposed RNN-encoder-decoder model with its capability in sequence-to-sequence learning for online imputation of sensory data.
Funding (nanopore pathogen detection case report): Supported by Research and Development Funding for Medical and Health Institutions, No. 2021YL007.
Funding (cattle breed identification study): Funded by the National Key Research and Development Program of China (2021YFD1200404), the Yangzhou University Interdisciplinary Research Foundation for Animal Science Discipline of Targeted Support (yzuxk202016), and the Project of Genetic Improvement for Agricultural Species (Dairy Cattle) of Shandong Province (2019LZGC011).
Funding (tumor microenvironment scRNAseq review): Partially supported by NIH grants (Grant Nos. R01CA249175 and U19AI118610).
Funding: Supported by the National Natural Science Foundation of China (31272186, 31301791).
Abstract: Gene sequencing is a powerful way to interpret life, and high-throughput sequencing is a revolutionary innovation in gene sequencing research. The technology is characterized by low cost and high data throughput. It is now widely applied in multi-level research on genomics, transcriptomics, and epigenomics. It has fundamentally changed the way problems in basic and translational research are approached and has created many new possibilities. This paper gives a general description of high-throughput sequencing technology and a comprehensive review of its applications in plain, concise, and precise language, in order to help researchers complete their work faster and better and to help science amateurs understand the technology more easily.
Funding: Supported by the National Natural Science Foundation of China (32022078) and the Local Innovative and Research Teams Project of Guangdong Province, China (2019BT02N630), with support from the National Supercomputer Center in Guangzhou, China.
Abstract: Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. Thus, we extended ssGBLUP, incorporated genomic annotation information into the model, and evaluated the resulting models using a yellow-feathered chicken population as an example. The chicken population consisted of 1,338 birds with 23 traits, where imputed WGS data including 5,127,612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation were evaluated. Based on the genomic annotation (GRCg6a) of chickens, 3,155,524 and 94,837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed the other models with respect to predictive ability in 15 out of 23 traits, with advantages ranging from 2.5 to 6.1% compared with the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated how the genotyping strategy for the reference population affects ssGBLUP in the chicken population. Of the two strategies compared for selecting individuals to genotype in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations.
Overall, we extended genomic prediction models that can comprehensively utilize WGS data and genomic annotation information in the framework of ssGBLUP, and validated the idea that properly handling the genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results of this study shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
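ssGBLUP relies on a genomic relationship matrix for the genotyped animals. The sketch below shows one common construction, usually attributed to VanRaden, in plain Python; the abstract does not state which construction the authors used, so the formula choice and the function name are assumptions for illustration only.

```python
def genomic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Compute G = Z Z' / (2 * sum(p * (1 - p))).

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]
```

Under this sketch, restricting the SNP columns to those annotated as genic or exonic before calling the function would be one simple way to bring annotation information into the relationship matrix.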
Abstract: [Objective] This research studied the influence of automatic station data on the sequence continuity of historical meteorological data. [Method] Using temperature data measured by the automatic meteorological station and the corresponding manual observations from January to December 2001, the monthly average, maximum, and minimum temperatures from the automatic station were compared with the manual observations for the parallel observation period, using the contrast difference and the standard deviation of the difference. This characterized the difference between the automatic and manual data and how it varies. A significance test and analysis of the annual averages were also carried out on the data sequence for 1990-2009, and the influence of replacing manual observation with the automatic station on the sequence continuity of historical temperature data was discussed. [Result] Although the two temperature datasets differed somewhat in the parallel observation period, the difference was on average within the permitted range for automatic stations; in individual months it exceeded that range. The significance test showed that the annual average temperature and the annual average minimum temperature observed at the automatic station differed from the historical data. This had some influence on the annual temperature sequence, but the difference was not significant overall. When automatic and manual observations are used together, the sequence needs to undergo a homogeneity test and correction.
[Conclusion] The research plays an important role in guaranteeing the single-track operation of automatic stations, optimizing the surface meteorological observation system, and improving the continuity of climate sequences of meteorological elements and the reliability of climate statistics.
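The comparison statistics named in the method can be sketched as follows. The sketch assumes that the contrast difference is the automatic reading minus the manual reading at the same observation time and that the sample standard deviation is intended; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def contrast_difference(automatic: list[float], manual: list[float]) -> tuple[float, float]:
    """Return (mean difference, sample standard deviation of the differences)."""
    if len(automatic) != len(manual):
        raise ValueError("series must cover the same observation times")
    # Assumed sign convention: automatic minus manual.
    diffs = [a - m for a, m in zip(automatic, manual)]
    return mean(diffs), stdev(diffs)


# Made-up paired temperature readings (degrees Celsius), for illustration only.
avg_diff, sd_diff = contrast_difference([10.2, 11.0, 9.8], [10.0, 11.1, 9.9])
```

Comparing `avg_diff` and `sd_diff` against the permitted range for automatic stations, month by month, would reproduce the kind of check the abstract describes.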
Abstract: This paper presents a data fusion method with distributed sequence detection based on hypothesis testing theory, including the data fusion algorithm for sequence detection under the minimum error probability rule, the decision rule, the formula for calculating the number of detections, and simulation results for system performance.
Funding: This paper is supported by the Main Project of the National Tenth Five-Year Plan.
Abstract: Recognizing and correlating bed sets within a parasequence is difficult in high-resolution sequence stratigraphy of terrestrial basins. This study puts forward new methods for identifying the boundaries of bed sets and correlating them on the basis of multiple logging data. The formation of calcareous interbeds, shale resistivity differences, and the relation of reservoir resistivity to altitude are considered on the basis of log curve morphology, core observation, cast thin sections, X-ray diffraction, and scanning electron microscopy. The results show that calcareous interbeds are between 0.5 m and 2 m thick and thicken on weathering crusts and at faults. Calcareous interbeds occur at the bottom of a distributary channel and the top of a distributary mouth bar. Lower-resistivity shale (4-5 Ω·m) and higher-resistivity shale (> 10 Ω·m) reflect differences in sediment source or sedimentary microfacies. Reservoir resistivity increases with altitude. Calcareous interbeds may serve as a marker for recognizing the boundaries of bed sets and for isochronous correlation of bed sets, and shale resistivity differences may confirm the stacking relation and connectivity of bed sets. Based on this, a high-resolution chronostratigraphic framework of the Xi-1 segment in the Shinan area, Junggar Basin, is presented, and the connectivity of bed sets and the oil-water contact are confirmed. In this chronostratigraphic framework, the growth order, stacking mode, and spatial shape of bed sets are described qualitatively and quantitatively.
Abstract: Massively parallel sequencing (MPS), also known as next-generation sequencing, is making its way from research laboratories into applied sciences and clinics. MPS is a framework of experimental procedures that offers possibilities for genome research and genetics which could only be dreamed of until around 2005, when these technologies became available. Sequencing of a transcriptome, an exome, or even entire genomes is now possible within a time frame and with a precision that we could only hope for 10 years ago. Linking other experimental procedures with MPS enables researchers to study secondary DNA modifications across the entire genome and protein binding sites, to name a few applications. How the advancement of sequencing technologies can contribute to transplantation science is the subject of this discussion. Immediate applications are in graft matching via human leukocyte antigen sequencing, as part of systems biology approaches that shed light on gene expression processes during the immune response, as biomarkers of graft rejection, and in exploring changes in microbiomes as a result of transplantation. Of considerable importance are the socio-ethical aspects of data ownership, privacy, informed consent, and reporting of results to the study participant. While the technology is advancing rapidly, legislation is lagging behind owing to the globalisation of data requisition, banking, and sharing.
Abstract: miRNAs are non-coding RNAs that play a regulatory role in gene expression and are associated with diseases. Quantitatively measuring the expression levels of miRNAs can help in understanding the mechanisms of human diseases and in discovering new drug targets. Three major methods have been used to measure miRNA expression levels: real-time reverse transcription PCR (qRT-PCR), microarray, and the newly introduced next-generation sequencing (NGS). NGS is suitable for profiling known miRNAs, as qRT-PCR and microarray are, and it can also detect unknown miRNAs, which the other two methods cannot. Profiling of miRNAs by NGS has progressed rapidly and is a promising field for applications in drug development. This paper reviews the technical advancement of NGS for profiling miRNAs, including comparative analyses of different platforms and of software packages for analyzing NGS data. Examples and future perspectives of applications of NGS-based miRNA profiling in drug development are discussed.
Abstract: In grey system theory, a sequence with unequal time intervals or with blanks is usually completed by average generation. This paper finds that obvious errors arise when average generation is used to generate internal points between non-consecutive neighbours. The average generation and the preference generation of the sequence are discussed. The concave and convex properties of the sequence indicate its local status, which suggests a new idea: using that status to build criteria for choosing the generation coefficient. By comparison with the general average method for a one-dimensional data sequence, the two-dimensional data sequence is defined and its average generation is discussed, and a method for deciding the coefficient of the preference generation is presented.
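A minimal sketch of filling a blank from its two neighbours may help make the distinction concrete. It assumes the blank is filled as a weighted combination of the neighbouring values, with the coefficient `alpha` placed on the later neighbour; `alpha = 0.5` corresponds to average generation and any other value to a preference generation. The function name and this weighting convention are assumptions, not the paper's notation.

```python
from typing import Optional


def fill_blanks(seq: list[Optional[float]], alpha: float = 0.5) -> list[float]:
    """Fill isolated interior blanks (None) from their two neighbours.

    alpha is the weight on the later neighbour; alpha = 0.5 gives the mean.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    filled = list(seq)
    for k, value in enumerate(seq):
        if value is not None:
            continue
        if k == 0 or k == len(seq) - 1 or seq[k - 1] is None or seq[k + 1] is None:
            raise ValueError("only isolated interior blanks are handled")
        filled[k] = alpha * seq[k + 1] + (1 - alpha) * seq[k - 1]
    return filled
```

For a convex sequence such as 1, 4, 9 with the middle value missing, `alpha = 0.5` yields 5 rather than 4, whereas a coefficient below 0.5 (here 0.375) recovers 4. This is one way to see why the local concave or convex status of the sequence could guide the choice of coefficient.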
Abstract: Online sensing can provide useful information in monitoring applications such as machine health monitoring, structural condition monitoring, and environmental monitoring. Missing data is a significant and common issue in sensory data collected online by sensing systems, and it may affect the goals of monitoring programs. In this paper, a sequence-to-sequence learning model based on a recurrent neural network (RNN) architecture is presented. In the proposed method, multivariate time series of the monitored parameters are embedded into the neural network through layer-by-layer encoders, where the hidden features of the inputs are adaptively extracted. Predictions of the missing data are then generated by the network decoders as one-step-ahead predictive data sequences of the monitored parameters. The prediction performance of the proposed model is validated on a real-world sensory dataset. The experimental results demonstrate the performance of the proposed RNN encoder-decoder model and its capability in sequence-to-sequence learning for online imputation of sensory data.