Funding: Supported by the National Key Research and Development Program of China (No. 2016YFC1000307) and the National Natural Science Foundation of China (No. 61571024, No. 61971021).
Abstract: With the rapid development of genomic sequencing technology, the cost of obtaining and effectively analyzing personal genomic data has gradually fallen. As the analysis and use of genomic data have entered public view, leakage of private genomic data has attracted the attention of researchers. The security of genomic data concerns not only the protection of personal privacy but also the biological information security of the country. However, there is still no effective genomic data privacy protection scheme that uses the Shangyong Mima (SM) algorithms. In this paper, we analyze the widely used genomic data file formats and design an encryption scheme for large genomic data files based on the SM algorithms. First, we design a key agreement protocol based on SM2 asymmetric cryptography and use the SM3 hash function to guarantee the correctness of the key. Second, we use SM4 symmetric cryptography to encrypt the genomic data by optimizing the packet processing of files, and improve usability by assisting the computing platform with key management. A software implementation demonstrates that the scheme can be applied to securely transmit genomic data in a network environment and provides an SM-based encryption method for protecting the privacy of genomic data.
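The key-correctness step in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the protocol messages, so everything here is an assumption made for illustration: the function names `key_confirmation_tag` and `keys_match`, the `label` value, and the tag construction are hypothetical, and SHA-256 from the Python standard library stands in for SM3, which the scheme itself uses.

```python
import hashlib
import hmac
import secrets


def key_confirmation_tag(session_key: bytes, label: bytes = b"key-confirm") -> bytes:
    """Hash the agreed key together with a fixed label.

    SHA-256 is a stand-in for SM3 here (assumption for illustration only).
    """
    return hashlib.sha256(label + session_key).digest()


def keys_match(local_key: bytes, peer_tag: bytes) -> bool:
    """Check that the peer derived the same key, without revealing the key itself."""
    # compare_digest runs in constant time, which avoids leaking where tags differ.
    return hmac.compare_digest(key_confirmation_tag(local_key), peer_tag)


if __name__ == "__main__":
    # Hypothetical 128-bit session key, as if produced by a key agreement step.
    shared = secrets.token_bytes(16)
    tag_from_peer = key_confirmation_tag(shared)
    print(keys_match(shared, tag_from_peer))                   # same key on both sides
    print(keys_match(secrets.token_bytes(16), tag_from_peer))  # mismatched key
```

Each party sends only the tag, so a mismatch is detected before any genomic data is encrypted under a wrong key.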
Funding: Financially supported by the Dirección de Investigación Sede Bogotá of the Universidad Nacional de Colombia (Grant No. 201010016738).
Abstract: Recent advances in genomic and post-genomic technologies have provided the opportunity to generate a previously unimaginable amount of information. However, biological knowledge is still needed to improve the understanding of complex mechanisms such as plant immune responses. Better knowledge of this process could improve crop production and management. Here, we used holistic analysis to combine our own microarray and RNA-seq data with public genomic data from Arabidopsis and cassava in order to acquire biological knowledge about the relationships between proteins encoded by immunity-related genes (IRGs) and other genes. This approach was based on a kernel method adapted for the construction of gene networks. The results allowed us to propose a list of new IRGs, and a putative function in the immunity pathway was predicted for each. The analysis of networks revealed that our predicted IRGs are either well documented or recognized in previous co-expression studies. In addition to robust relationships between IRGs, there is evidence suggesting that other cellular processes may also be strongly related to immunity.
Funding: Funded by the National Key Research and Development Program of China (2021YFD1200404), the Yangzhou University Interdisciplinary Research Foundation for Animal Science Discipline of Targeted Support (yzuxk202016), and the Project of Genetic Improvement for Agricultural Species (Dairy Cattle) of Shandong Province (2019LZGC011).
Abstract: Background: Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for both stages, but which combination of these methods is optimal remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment, with respect to different reference population sizes and different numbers of most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results: All combinations performed quite well, with identification accuracies over 95% in all scenarios. However, no combination performed best and robustly across all scenarios. We proposed integrating the three breed-informative SNP detection methods, named DFI, and integrating three machine learning methods, KNN, SVM, and RF, named KSR. The combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies from using SNP chip data were only slightly lower than those from using sequence data in most cases. Conclusions: The current study showed that the combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases, but the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.
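To make the SNP detection stage concrete, here is a minimal sketch of two of the three named criteria for a single biallelic SNP in two populations. The abstract does not give the estimators the authors used, so this assumes the textbook forms: Delta as the absolute allele frequency difference, and Wright's FST as (H_T - H_S) / H_T with equal population weights. The function names and the example frequencies are hypothetical, and the In statistic and the multi-breed case are not shown.

```python
from typing import Callable, Dict, List, Tuple


def delta(p1: float, p2: float) -> float:
    """Absolute difference in allele frequency between two populations."""
    return abs(p1 - p2)


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP, two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity in the pooled population
    if h_t == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


def rank_snps(
    freqs: Dict[str, Tuple[float, float]],
    score: Callable[[float, float], float] = delta,
    top_n: int = 10,
) -> List[str]:
    """Return the IDs of the top_n SNPs, highest score first."""
    ranked = sorted(freqs, key=lambda snp: score(*freqs[snp]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical allele frequencies: SNP ID -> (population 1, population 2).
    example = {"snp_a": (0.9, 0.1), "snp_b": (0.5, 0.5), "snp_c": (0.7, 0.4)}
    print(rank_snps(example, score=delta, top_n=2))
    print(rank_snps(example, score=wright_fst, top_n=2))
```

The SNPs selected this way would then be passed as features to the breed assignment classifiers.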
Funding: Supported by the National Natural Science Foundation of China (31971865), the Zhejiang Natural Science Foundation (LZ17C130001), the Innovation Method Project of China (2018IM0301002), and the Jiangsu Collaborative Innovation Center for Modern Crop Production.
Abstract: Rice is a cereal crop and a model species for monocots. Since the release of the first draft rice genome sequences in 2002, considerable progress has been achieved in rice genomic research, thanks to the rapid development and efficient use of bioinformatics methods and tools. In this review, we summarize the progress of studies of rice genome sequencing and other omics and introduce the well-maintained bioinformatics databases and tools developed for rice genome resources and breeding. After reviewing the history of rice bioinformatics, we use single-cell sequencing and machine learning as examples to show how bioinformatics integrates emerging technologies and how it continues to develop for future rice research.
Abstract: Objective: To surveil emerging variants by nanopore technology-based genome sequencing in different COVID-19 waves in Sri Lanka and to examine their association with sample characteristics and vaccination status. Methods: The study analyzed 207 RNA-positive swab samples received by the sequencing laboratory during different waves. An N gene Ct value of less than 30 was the major inclusion criterion. Viral RNA was extracted, and the eluates were subjected to nanopore sequencing. All sequencing data were uploaded to the publicly accessible database GISAID. Results: The Omicron, Delta, and Alpha variants accounted for 58%, 22%, and 4% of the variants, respectively, throughout the period. Less than 1% were the Kappa variant, and 16% of the study samples remained unassigned. The Omicron variant circulated among all age groups and in all provinces. A variant was assigned to 100% of samples with Ct values of 10-15, but to only 45% of samples with Ct values over 25. Conclusions: The present study examined the emergence, prevalence, and distribution of SARS-CoV-2 variants locally and has shown that nanopore technology-based sequencing enables whole genome sequencing in a low-resource country.
Abstract: Precision medicine is transforming psychiatric treatment by tailoring personalized healthcare interventions based on clinical, genetic, environmental, and lifestyle factors to optimize medication management. This study investigates how artificial intelligence (AI) and machine learning (ML) can address key challenges in integrating pharmacogenomics (PGx) into psychiatric care. In this integration, AI analyzes vast genomic datasets to identify genetic markers linked to psychiatric conditions. AI-driven models integrating genomic, clinical, and demographic data demonstrated high accuracy in predicting treatment outcomes for major depressive disorder and bipolar disorder. This study also examines the pressing challenges and provides strategic directions for integrating AI and ML in genomic psychiatry, highlighting the importance of ethical considerations and the need for personalized treatment. Effective implementation of AI-driven clinical decision support systems within electronic health records is crucial for translating PGx into routine psychiatric care. Future research should focus on developing enhanced AI-driven predictive models, privacy-preserving data exchange, and robust informatics systems to optimize patient outcomes and advance precision medicine in psychiatry.
Funding: Supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), grant 449,564/2014–2; supported by a fellowship from CNPq; NBS received a post-doctoral fellowship from CAPES/PNPD; supported by São Paulo Research Foundation (FAPESP) fellowships (2015/08939-0 and 2013/13972-0).
Abstract: Background: Beef cattle breeding programs in Brazil have placed greater emphasis on the genomic study of reproductive traits of males and females due to their economic importance. In this study, genome-wide associations were assessed for scrotal circumference at 210 d of age, scrotal circumference at 420 d of age, age at first calving, and age at second calving in Canchim beef cattle. Data quality control resulted in 672,778 SNPs and 392 animals. Results: Associated SNPs were observed for scrotal circumference at 420 d of age (435 SNPs), followed by scrotal circumference at 210 d of age (12 SNPs), age at first calving (six SNPs), and age at second calving (four SNPs). We investigated whether significant SNPs were within genic or surrounding regions. Biological processes of the genes were associated with immune system, multicellular organismal process, response to stimulus, apoptotic process, cellular component organization or biogenesis, biological adhesion, and reproduction. Conclusions: Few associations were observed for scrotal circumference at 210 d of age, age at first calving, and age at second calving, reinforcing their polygenic inheritance and the complexity of understanding the genetic architecture of reproductive traits. Finding many associations for scrotal circumference at 420 d of age in various regions of the Canchim genome also reveals the difficulty of targeting specific candidate genes that could act on fertility; nonetheless, the high linkage disequilibrium between loci estimated here could help to overcome this issue. Therefore, all relevant information about genomic regions influencing reproductive traits may help to target candidate genes for further investigation of causal mutations and aid future genomic studies in Canchim cattle to improve the breeding program.
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Stem Cell and Regenerative Medicine Research (Grant No. XDA01040405), the National High-tech R&D Program of China (863 Program, 2012AA022502), and the National "Twelfth Five-Year" Plan for Science & Technology Support of China (2013BAI01B09), awarded to XF, and the National Natural Science Foundation of China (Grant No. 31471236), awarded to YL.
Abstract: Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community.
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38030201, XDB38030400, XDB38050300) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2019104).
Abstract: Genome data of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are essential for virus diagnosis, vaccine development, and variant surveillance. To archive and integrate worldwide SARS-CoV-2 genome data, a series of resources have been constructed, serving as a fundamental infrastructure for SARS-CoV-2 research, pandemic prevention and control, and coronavirus disease 2019 (COVID-19) therapy. Here we present an overview of extant SARS-CoV-2 resources that are devoted to genome data deposition and integration. We review deposition resources in terms of data accessibility, metadata standardization, and data curation and annotation, and we review integrative resources in terms of data source, de-redundancy processing, data curation and quality assessment, and variant annotation. Moreover, we address issues that impede SARS-CoV-2 genome data integration, including low complexity, inconsistency and absence of isolate names, sequence inconsistency, asynchronous updates of genome data, and mismatched metadata. Finally, we provide insights into data standardization consensus and data submission guidelines to promote SARS-CoV-2 genome data sharing and integration.
Funding: Supported by the National Natural Science Foundation of China (31570155 and 31370199); the "Young Top-notch Talents" of the Guangdong Province Special Support Program (2014); the Excellent Young Teacher Training Plan of Guangdong Province (Yq2013039); the Guangzhou Healthcare Collaborative Innovation Major Project (201400000002); funded by the China Scholarship Council (CSC No. 201508440056) as a Visiting Scholar (2015-2016); supported by a summer research grant to D.S. from the Office of the Vice President for Research at George Mason University.
Abstract: In the past, restriction endonuclease analysis (REA), or restriction fragment length polymorphism (RFLP), was useful for identifying and determining the relatedness and putative identities of microbial strains (Tang et al., 1997) and for characterizing and discriminating large numbers of samples inexpensively.