5 articles found
Identifying viruses from metagenomic data using deep learning (Cited by: 7)
1
Authors: Jie Ren, Kai Song, Chao Deng, Nathan A. Ahlgren, Jed A. Fuhrman, Yi Li, Xiaohui Xie, Ryan Poplin, Fengzhu Sun. Quantitative Biology (CAS, CSCD), 2020, Issue 1, pp. 64-77 (14 pages)
Background: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes, including viral genomes, without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data. Methods: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning. Results: Trained on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences, respectively. Enlarging the training data with additional millions of purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found to be associated with the cancer status, suggesting viruses may play important roles in CRC. Conclusions: Powered by deep learning and high-throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
Keywords: metagenome; deep learning; virus identification; machine learning
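The abstract above reports performance as AUROC at several contig lengths. As a reading aid, here is a minimal sketch of how an AUROC value can be computed from classifier scores and true labels, using the rank-based (Mann-Whitney) formulation. The function name `auroc` and the toy scores are assumptions made for illustration; this is not DeepVirFinder's evaluation code.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic.

    scores: predicted scores (higher means more likely viral).
    labels: 1 for positive (viral), 0 for negative.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block [i, j) of tied scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy example (hypothetical scores, not data from the paper).
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of viral from non-viral sequences.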
Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data (Cited by: 1)
2
Authors: Yilin Gao, Zifan Zhu, Fengzhu Sun. Synthetic and Systems Biotechnology (SCIE), 2022, Issue 1, pp. 574-585 (12 pages)
Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.
Keywords: microbiome; colorectal cancer; metagenomic shotgun sequencing; random forests
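The abstract above feeds microbial relative abundance profiles into a classifier. The sketch below shows, under simple assumptions, how per-sample read counts might be normalized into relative abundances and aligned into a feature matrix that a classifier such as random forests could consume. The function names and taxon labels are hypothetical, and the code does not reproduce the output formats of Centrifuge, MetaPhlAn2, or Bracken.

```python
def relative_abundance(counts):
    """Normalize raw per-taxon counts for one sample so they sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: count / total for taxon, count in counts.items()}


def build_feature_matrix(samples):
    """Align samples on a shared, sorted taxon list.

    samples: list of dicts mapping taxon name to raw count.
    Returns (taxa, rows), where rows[i][j] is the relative abundance of
    taxa[j] in sample i; taxa absent from a sample get 0.0.
    """
    taxa = sorted({taxon for sample in samples for taxon in sample})
    rows = []
    for sample in samples:
        profile = relative_abundance(sample)
        rows.append([profile.get(taxon, 0.0) for taxon in taxa])
    return taxa, rows


# Hypothetical counts for two samples (taxon names are placeholders).
taxa, rows = build_feature_matrix([
    {"taxon_A": 30, "taxon_B": 10},
    {"taxon_B": 5, "taxon_C": 15},
])
```

Each row of `rows` can then serve as one sample's feature vector, with the disease status as the label.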
Confidence intervals for Markov chain transition probabilities based on next generation sequencing reads data
3
Authors: Lin Wan, Xin Kang, Jie Ren, Fengzhu Sun. Quantitative Biology (CAS, CSCD), 2020, Issue 2, pp. 143-154 (12 pages)
Background: Markov chains (MC) have been widely used to model molecular sequences. The estimation of the MC transition matrix and of confidence intervals for the transition probabilities from long sequence data has been intensively studied in the past decades. In next generation sequencing (NGS), a large number of short reads are generated. These short reads can overlap, and some regions of the genome may not be sequenced, resulting in a new type of data. Based on NGS data, the transition probabilities of MC can be estimated by moment estimators. However, the classical asymptotic distribution theory for MC transition probability estimators based on long sequences is no longer valid. Methods: In this study, we present the asymptotic distributions of several statistics related to MC based on NGS data. We show that, after scaling by the effective coverage d defined in a previous study by the authors, these statistics based on NGS data have approximately the same distributions as the corresponding statistics for long sequences. Results: We apply the asymptotic properties of these statistics to find the theoretical confidence regions for MC transition probabilities based on NGS short reads data. We validate our theoretical confidence intervals using both simulated data and real data sets, and compare the results with those from the parametric bootstrap method. Conclusions: We find that the asymptotic distributions of these statistics and the theoretical confidence intervals of transition probabilities based on NGS data given in this study are highly accurate, providing a powerful tool for NGS data analysis.
Keywords: Markov chains; next generation sequencing; transition probabilities; confidence intervals
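The abstract above mentions estimating Markov chain transition probabilities from short reads by moment estimators. A minimal sketch of one such count-based point estimate follows, assuming a first-order chain over the alphabet A, C, G, T: adjacent-letter pairs are counted within each read and each row is normalized. The function name is hypothetical. The sketch does not compute the effective coverage d or the confidence intervals developed in the paper, which the abstract does not specify in enough detail to reproduce.

```python
ALPHABET = "ACGT"


def transition_estimates(reads):
    """Estimate first-order transition probabilities from short reads.

    Counts each adjacent pair (a, b) inside every read, then divides by
    the number of pairs starting with a. Pairs containing letters outside
    ALPHABET (for example N) are skipped. Rows with no observations are
    returned as all zeros.
    """
    counts = {a: {b: 0 for b in ALPHABET} for a in ALPHABET}
    for read in reads:
        read = read.upper()
        for a, b in zip(read, read[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1

    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        probs[a] = {
            b: (counts[a][b] / row_total if row_total else 0.0)
            for b in ALPHABET
        }
    return probs


# Toy reads (hypothetical, not data from the paper).
estimates = transition_estimates(["ACGT", "ACGA", "AAC"])
```

Because overlapping reads count the same genomic positions more than once, the variance of these estimates differs from the long-sequence case, which is the issue the paper's scaling by effective coverage addresses.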
International Workshop on Applications of Probability and Statistics to Biology, July 11-13, 2019: In Honor of Professor Minping Qian's 80th Birthday
4
Authors: Minghua Deng, Jianfeng Feng, Hong Qian, Lin Wan, Fengzhu Sun. Quantitative Biology (CAS, CSCD), 2020, Issue 2, pp. 177-186 (10 pages)
The International Workshop on Applications of Probability and Statistics to Biology (APSB) was successfully held in Shanghai, China, July 11-13, 2019. The workshop was hosted by the Institute of Science and Technology for Brain-inspired Intelligence (ISTBI) at Fudan University, in honor of the 80th birthday of Prof. Minping Qian of Peking University. Most of the twenty-eight speakers were former students or close collaborators of Prof. Qian, and there were over eighty participants from all over China and the United States.
Keywords: probability; speakers; ping
Meeting report on RECOMB 2013 (the 17th Annual International Conference on Research in Computational Molecular Biology)
5
Authors: Xuegong Zhang, Fengzhu Sun. Frontiers of Electrical and Electronic Engineering in China, 2013, Issue 2, pp. 175-181 (7 pages)
RECOMB 2013 was successfully held at Tsinghua University, Beijing, China, on April 7-10, 2013, hosted by the Bioinformatics Division and Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology (TNLIST). About 500 professionals from academia and industry, representing 29 countries and regions, attended the conference and the RECOMB-Seq satellite workshop that followed it.