15 articles found
Breed identification using breed‑informative SNPs and machine learning based on whole genome sequence data and SNP chip data
1
Authors: Changheng Zhao, Dan Wang, Jun Teng, Cheng Yang, Xinyi Zhang, Xianming Wei, Qin Zhang. Journal of Animal Science and Biotechnology (SCIE, CAS, CSCD), 2023, No. 5, pp. 1941-1953 (13 pages)
Background: Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for both stages, but which combination of these methods is optimal remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment with respect to different reference population sizes and different numbers of the most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results: We found that all combinations performed quite well, with identification accuracies over 95% in all scenarios. However, no single combination performed best and was robust across all scenarios. We proposed to integrate the three breed-informative SNP detection methods, named DFI, and to integrate three of the machine learning methods, KNN, SVM, and RF, named KSR. We found that the combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies from using SNP chip data were only slightly lower than those from using sequence data in most cases. Conclusions: The current study showed that the combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases. However, the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.
Keywords: Breed identification; Breed-informative SNPs; Genomic breed composition; Machine learning; Whole genome sequence data
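The two stages named in the abstract above (ranking SNPs by how breed-informative they are, then assigning a breed) can be illustrated with a minimal sketch. It assumes Delta is the absolute allele-frequency difference between two breeds, averaged over breed pairs, and it assumes the integrated classifiers are combined by simple majority vote; the abstract does not state either detail, and the function names and toy genotypes below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_freq(genotypes):
    """Frequency of the counted allele from 0/1/2 genotype codes."""
    return sum(genotypes) / (2 * len(genotypes))


def delta_scores(geno_by_breed):
    """Mean pairwise |p_i - p_j| per SNP across breeds (assumed Delta definition).

    geno_by_breed maps breed -> list of animals, each a list of 0/1/2 codes.
    """
    n_snps = len(next(iter(geno_by_breed.values()))[0])
    freqs = {
        breed: [allele_freq([animal[s] for animal in animals]) for s in range(n_snps)]
        for breed, animals in geno_by_breed.items()
    }
    pairs = list(combinations(freqs, 2))
    return [
        sum(abs(freqs[a][s] - freqs[b][s]) for a, b in pairs) / len(pairs)
        for s in range(n_snps)
    ]


def top_snps(scores, k):
    """Indices of the k highest-scoring SNPs."""
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k]


def majority_vote(predictions):
    """Most common label among classifier outputs (assumed integration rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: two breeds, two animals each, three SNPs (illustrative only).
toy = {
    "A": [[2, 0, 1], [2, 1, 1]],
    "B": [[0, 0, 1], [0, 1, 1]],
}
scores = delta_scores(toy)
best = top_snps(scores, 1)
vote = majority_vote(["A", "B", "A"])
```

In the toy data only the first SNP differs in frequency between the breeds, so it is the one selected; the labels passed to `majority_vote` stand in for the outputs of three separately trained classifiers.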
Incorporating genomic annotation into single-step genomic prediction with imputed whole-genome sequence data (Cited by: 2)
2
Authors: TENG Jin-yan, YE Shao-pan, GAO Ning, CHEN Zi-tao, DIAO Shu-qi, LI Xiu-jin, YUAN Xiao-long, ZHANG Hao, LI Jia-qi, ZHANG Xi-quan, ZHANG Zhe. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2022, No. 4, pp. 1126-1136 (11 pages)
Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. Thus, we extended ssGBLUP, incorporated genomic annotation information into the model, and evaluated the extended models using a yellow-feathered chicken population as an example. The chicken population consisted of 1,338 birds with 23 traits, where imputed WGS data including 5,127,612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation were evaluated. Based on the genomic annotation (GRCg6a) of chickens, 3,155,524 and 94,837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed other models with respect to predictive ability in 15 out of 23 traits, and their advantages ranged from 2.5 to 6.1% compared with the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated the effect of genotyping strategies for the reference population on ssGBLUP in the chicken population. Comparing two strategies of selecting individuals for genotyping in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations. Overall, we extended genomic prediction models that can comprehensively utilize WGS data and genomic annotation information in the framework of ssGBLUP, and validated the idea that properly handling the genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, the genotyping strategy of maximizing the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results from this study shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
Keywords: genomic selection; prior information; sequencing data; genotype imputation; haplotype
Analysis on the Influence of Automatic Station Temperature Data on the Sequence Continuity of Historical Meteorological Data (Cited by: 1)
3
Authors: CHEN Ming (1), GAI Xiao-bo (2), FAN Xin-yu (1), SONG Min (1). Affiliations: 1. Jinzhou Meteorology Bureau in Liaoning Province, Jinzhou 121001, China; 2. Dalian Meteorological Bureau in Liaoning Province, Dalian 116001, China. Meteorological and Environmental Research (CAS), 2011, No. 4, pp. 12-14, 17 (4 pages)
[Objective] The research aimed to study the influence of automatic station data on the sequence continuity of historical meteorological data. [Method] Based on the temperature data measured by the automatic meteorological station and the corresponding artificial observation data during January-December 2001, the monthly average, maximum and minimum temperatures from the automatic station were compared with the corresponding artificial observation temperatures in the parallel observation period by using the contrast difference and the standard deviation of the difference values, in order to characterize the difference between the automatic station and artificial data and how it varies. Meanwhile, a significance test and an analysis of annual average values were carried out on the data sequence for 1990-2009. The influence of replacing artificial observation with the automatic station on the sequence continuity of historical temperature data was discussed. [Result] Although the two sets of temperature data in the parallel observation period differed to some extent, the difference was on average within the permitted range of the automatic station difference value. The difference in individual months exceeded the permitted range. The significance test showed that the annual average temperature and the annual average minimum temperature observed at the automatic station differed from the historical data. This had some influence on the annual temperature sequence, but the difference was not significant as a whole. When automatic observations are used together with artificial observations, the sequence needs to undergo a homogeneity test and correction. [Conclusion] The research is important for guaranteeing the single-track operation of automatic stations, optimizing the surface meteorological observation system, and improving the continuity of climate sequences of meteorological elements and the reliability of climate statistics.
Keywords: Automatic observation; Artificial observation; Data sequence; Analysis; China
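The comparison described in the abstract above (contrast difference and standard deviation of the difference values over a parallel observation period) can be sketched as follows. This is only an illustration under stated assumptions: the contrast difference is taken as automatic minus artificial, the sample standard deviation is used, and the `tolerance` parameter and all temperature values are made up rather than taken from the paper.

```python
from statistics import mean, stdev


def contrast_stats(automatic, manual, tolerance):
    """Compare paired automatic and artificial (manual) readings.

    Returns (mean difference, sample standard deviation of the differences,
    indices whose absolute difference exceeds the hypothetical tolerance).
    """
    if len(automatic) != len(manual):
        raise ValueError("series must have the same length")
    diffs = [a - m for a, m in zip(automatic, manual)]
    flagged = [i for i, d in enumerate(diffs) if abs(d) > tolerance]
    return mean(diffs), stdev(diffs), flagged


# Made-up monthly mean temperatures (degrees C), for illustration only.
auto_obs = [1.0, 2.5, 4.0]
manual_obs = [0.5, 2.5, 3.5]
mean_diff, sd_diff, flagged = contrast_stats(auto_obs, manual_obs, tolerance=0.2)
```

The `flagged` list plays the role of the individual months whose difference exceeds the permitted range; the actual permitted range used in the study is not given in the abstract.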
Logging Data High-Resolution Sequence Stratigraphy
4
Authors: 李洪奇, 谢寅符, 孙中春, 罗兴平. Journal of China University of Geosciences (SCIE, CSCD), 2006, No. 2, pp. 173-180 (8 pages)
The recognition and contrast of bed sets in parasequences is difficult in high-resolution sequence stratigraphy of terrestrial basins. This study puts forward new methods for the boundary identification and contrast of bed sets on the basis of multiple types of logging data. The formation of calcareous interbeds, shale resistivity differences and the relation of reservoir resistivity to altitude are considered on the basis of log curve morphological characteristics, core observation, cast thin sections, X-ray diffraction and scanning electron microscopy. The results show that the thickness of calcareous interbeds is between 0.5 m and 2 m, increasing on weathering crusts and faults. Calcareous interbeds occur at the bottom of a distributary channel and the top of a distributary mouth bar. Lower resistivity shale (4-5 Ω·m) and higher resistivity shale (> 10 Ω·m) reflect differences in sediment source or sediment microfacies. Reservoir resistivity increases with altitude. Calcareous interbeds may serve as a marker for recognizing the boundaries of bed sets and for isochronous contrast of bed sets, and shale resistivity differences may confirm the stacking relation and connectivity of bed sets. Based on this, a high-resolution chronostratigraphic framework of the Xi-1 segment in the Shinan area, Junggar basin, is presented, and the connectivity of bed sets and the oil-water contact are confirmed. In this chronostratigraphic framework, the growth order, stacking mode and spatial shape of bed sets are qualitatively and quantitatively described.
Keywords: Junggar basin; logging data; sequence stratigraphy; calcareous interbeds; shale resistivity; relationship of resistivity to altitude; reservoir connectivity
Examining heterogeneity of stromal cells in tumor microenvironment based on pan-cancer single-cell RNA sequencing data
5
Authors: Wenhui Wang, Li Wang, Junjun She, Jun Zhu. Cancer Biology & Medicine (SCIE, CAS, CSCD), 2022, No. 1, pp. 30-42 (13 pages)
Tumor tissues contain both tumor and non-tumor cells, which include infiltrated immune cells and stromal cells, collectively called the tumor microenvironment (TME). Single-cell RNA sequencing (scRNAseq) enables the examination of heterogeneity of tumor cells and the TME. In this review, we examined scRNAseq datasets for multiple cancer types and evaluated the heterogeneity of major cell type composition in different cancer types. We further showed that endothelial cells and fibroblasts/myofibroblasts in different cancer types can be classified into common subtypes, and that the subtype composition is clearly associated with cancer characteristics and therapy response.
Keywords: Stromal cells; tumor microenvironment; pan-cancer; single-cell RNA sequencing data
Comparing the transmission potential from sequence and surveillance data of 2009 North American influenza pandemic waves
6
Authors: Venkata R. Duvvuri, Joseph T. Hicks, Lambodhar Damodaran, Martin Grunnill, Thomas Braukmann, Jianhong Wu, Jonathan B. Gubbay, Samir N. Patel, Justin Bahl. Infectious Disease Modelling (CSCD), 2023, No. 1, pp. 240-252 (13 pages)
Technological advancements in phylodynamic modeling coupled with the accessibility of real-time pathogen genetic data are increasingly important for understanding infectious disease transmission dynamics. In this study, we compare the transmission potentials of North American influenza A(H1N1)pdm09 derived from sequence data to those derived from surveillance data. The impact of the choice of tree priors, informative epidemiological priors, and evolutionary parameters on the transmission potential estimation is evaluated. North American influenza A(H1N1)pdm09 hemagglutinin (HA) gene sequences are analyzed using the coalescent and birth-death tree prior models to estimate the basic reproduction number (R0). Epidemiological priors gathered from published literature are used to simulate the birth-death skyline models. Path-sampling marginal likelihood estimation is conducted to assess model fit. A bibliographic search to gather surveillance-based R0 values were consistently lower (mean ≤ 1.2) when estimated by coalescent models than by the birth-death models with informative priors on the duration of infectiousness (mean ≥ 1.3 to ≤ 2.88 days). The user-defined informative priors for use in the birth-death model shift the directionality of epidemiological and evolutionary parameters compared to non-informative estimates. While there was no certain impact of clock rate and tree height on the R0 estimation, an opposite relationship was observed between coalescent and birth-death tree priors. There was no significant difference (p = 0.46) between the birth-death model and surveillance R0 estimates. This study concludes that tree-prior methodological differences may have a substantial impact on the transmission potential estimation as well as the evolutionary parameters. The study also reports a consensus between the sequence-based R0 estimation and surveillance-based R0 estimates. Altogether, these outcomes shed light on the potential role of phylodynamic modeling to augment existing surveillance and epidemiological activities to better assess and respond to emerging infectious diseases.
Keywords: Phylodynamics; Pandemic 2009 H1N1; Reproduction number; Coalescent growth models; Birth-death models; Pathogen sequence data; Public health
Transferable Features from 1D-Convolutional Network for Industrial Malware Classification
7
Authors: Liwei Wang, Jiankun Sun, Xiong Luo, Xi Yang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, No. 2, pp. 1003-1016 (14 pages)
With the development of information technology, malware threats to industrial systems have become an emergent issue, since various industrial infrastructures have been deeply integrated into modern work and life. To identify and classify new malware variants, different types of deep learning models have been widely explored recently. Generally, sufficient data is required to obtain a well-trained deep learning classifier with satisfactory generalization ability. However, in current practical applications, an ample supply of data is absent in most specific industrial malware detection scenarios. Transfer learning is an effective approach that can be used to alleviate the influence of the small sample size problem. In addition, it can reuse the knowledge from pretrained models, which benefits the real-time requirement in industrial malware detection. In this paper, we investigate the transferable features learned by a 1D-convolutional network and evaluate our proposed methods on 6 transfer learning tasks. The experimental results show that the 1D-convolutional architecture is effective at learning transferable features for malware classification, and indicate that transferring the first 2 layers of our proposed 1D-convolutional network is the most efficient way to reuse the learned features.
Keywords: Transfer learning; malware classification; sequence data modeling; convolutional network
Towards Sensor-free Academic Emotion Prediction in Programming Environment
8
Authors: Tao Lin, Zhiming Wu, Juan Zheng, Shenggen Ju, Yu Fu. 计算机教育 (Computer Education), 2020, No. 12, pp. 77-84 (8 pages)
The transition from traditional learning to practice-oriented programming learning brings learners discomfort. The discomfort quickly breeds negative emotions when learners encounter programming difficulties, which leads them to lose interest in programming or even give up. Emotion plays a crucial role in learning. Educational psychology research shows that positive emotion can promote learning performance, increase learning interest and cultivate creative thinking. Accurate recognition and interpretation of programming learners’ emotions makes it possible to give them timely feedback and to adjust teaching strategies accurately and individually, which is of considerable significance for improving the effects of programming learning and education. The existing methods of sensor-free emotion prediction are based on keyboard dynamics, mouse interaction data and interaction logs, respectively. However, none of the three considered the temporal characteristics of emotion, resulting in low recognition accuracy. For the first time, this paper proposes an emotion prediction model based on time series and context information. We then establish a bi-directional recurrent neural network, obtain the time sequence characteristics of the data automatically, and explore the application of deep learning in the field of academic emotion prediction. The results show that the classification ability of this model is much better than that of the original LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit) and RNN (Recurrent Neural Network), and that the model has better generalization ability.
Keywords: emotion prediction; emotional state; programming behavior data; Bi-directional Recurrent Neural Network; interaction sequence data
Study on generation in grey system theory (Cited by: 3)
9
Authors: Ping Xueliang, Zhou Rurong, Liu Shenglan. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2005, No. 2, pp. 325-329 (5 pages)
Grey sequence generation can draw out and develop implied rules of the original data. Different kinds of generation methods are summarized and classified into two types: partial generation and whole generation. Average generation and stepwise ratio generation are discussed; preference generation is regarded as a special case of proportional division based on analytic geometry, and an idea is proposed of using the concave or convex status of discrete data to determine the generation coefficient. Based on stepwise and smooth ratio generation, a tendency average generation is proposed and compared using the data provided in the papers listed in the references. The comparison shows that the new generation is better than the other two generations and that errors are clearly reduced.
Keywords: data sequence; generation; grey system; tendency average generation
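Two of the generation operators named in the abstract above can be sketched in a few lines. The sketch assumes the common textbook forms: mean generation z(k) = alpha * x(k) + (1 - alpha) * x(k-1), with alpha = 0.5 giving average generation, and the stepwise ratio sigma(k) = x(k) / x(k-1). Conventions for the ratio differ between authors, and the paper's tendency average generation is not reproduced here; the function names are hypothetical.

```python
def mean_generation(x, alpha=0.5):
    """z(k) = alpha * x(k) + (1 - alpha) * x(k-1) for each adjacent pair."""
    return [alpha * x[k] + (1 - alpha) * x[k - 1] for k in range(1, len(x))]


def stepwise_ratios(x):
    """sigma(k) = x(k) / x(k-1); assumes the series has no zero entries."""
    return [x[k] / x[k - 1] for k in range(1, len(x))]


def extend_by_ratio(x):
    """Generate the next value by reusing the last stepwise ratio."""
    return x[-1] * (x[-1] / x[-2])


series = [2.0, 4.0, 8.0]
averaged = mean_generation(series)
ratios = stepwise_ratios(series)
next_value = extend_by_ratio(series)
```

Here `alpha` corresponds to the generation coefficient mentioned in the abstract; choosing it from the concave or convex status of the data, as the authors propose, would replace the fixed default of 0.5.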
High-throughput Sequencing Technology and Its Application (Cited by: 9)
10
Authors: Zhu Qiang-long, Liu Shi, Gao Peng, Luan Fei-shi. Journal of Northeast Agricultural University (English Edition) (CAS), 2014, No. 3, pp. 84-96 (13 pages)
Gene sequencing is a great way to interpret life, and high-throughput sequencing technology is a revolutionary technological innovation in gene sequencing research. This technology is characterized by low cost and high-throughput data. Currently, high-throughput sequencing technology is widely applied in multi-level research on genomics, transcriptomics and epigenomics. It has fundamentally changed the way we approach problems in basic and translational research and created many new possibilities. This paper presents a general description of high-throughput sequencing technology and a comprehensive review of its applications in plain, concise and precise language, in order to help researchers finish their work faster and better and to help science amateurs understand the technology more easily.
Keywords: high-throughput sequencing; data analysis; genome sequence; transcriptome sequence; bioinformatics
GSA: Genome Sequence Archive (Cited by: 16)
11
Authors: Yanqing Wang, Fuhai Song, Junwei Zhu, Sisi Zhang, Yadong Yang, Tingting Chen, Bixia Tang, Lili Dong, Nan Ding, Qian Zhang, Zhouxian Bai, Xunong Dong, Huanxin Chen, Mingyuan Sun, Shuang Zhai, Yubin Sun, Lei Yu, Li Lan, Jingfa Xiao, Xiangdong Fang, Hongxing Lei, Zhang Zhang, Wenming Zhao. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2017, No. 1, pp. 14-18 (5 pages)
With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are generated at an unprecedentedly explosive rate. To provide an efficient and easy-to-use platform for managing huge sequence data, here we present Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members by alleviating the increasing burdens of handling the sequence data deluge, but also takes significant responsibility for global big data archiving and provides free unrestricted access to all publicly available data in support of research activities throughout the world.
Keywords: Genome Sequence Archive; GSA; Big data; Raw sequence data; INSDC
Comparative visual analytics for assessing medical records with sequence embedding (Cited by: 1)
12
Authors: Rongchen Guo, Takanori Fujiwara, Yiran Li, Kelly M. Lima, Soman Sen, Nam K. Tran, Kwan-Liu Ma. Visual Informatics (EI), 2020, No. 2, pp. 72-85 (14 pages)
Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.
Keywords: Electronic medical records; Event sequence data; Autoencoder; Self-attention; Sequence similarity; Visual analytics
eTumorMetastasis: A Network-based Algorithm Predicts Clinical Outcomes Using Whole-exome Sequencing Data of Cancer Patients
13
Authors: Jean-Sébastien Milanese, Chabane Tibiche, Naif Zaman, Jinfeng Zou, Pengyong Han, Zhigang Meng, Andre Nantel, Arnaud Droit, Edwin Wang. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2021, No. 6, pp. 973-985 (13 pages)
Continual reduction in sequencing cost is expanding the accessibility of genome sequencing data for routine clinical applications. However, the lack of methods to construct machine learning-based predictive models using these datasets has become a crucial bottleneck for the application of sequencing technology in clinics. Here, we develop a new algorithm, eTumorMetastasis, which transforms tumor functional mutations into network-based profiles and identifies network operational gene (NOG) signatures. NOG signatures model the tipping point at which a tumor cell shifts from a state that doesn’t favor recurrence to one that does. We show that NOG signatures derived from genomic mutations of tumor founding clones (i.e., the ‘most recent common ancestor’ of the cells within a tumor) significantly distinguish the recurred and non-recurred breast tumors and outperform the most popular genomic test (i.e., Oncotype DX). These results imply that mutations of the tumor founding clones are associated with tumor recurrence and can be used to predict clinical outcomes. As such, predictive tools could be used in clinics to guide treatment routes. Finally, the concepts underlying eTumorMetastasis pave the way for the application of genome sequencing in predictions for other complex genetic diseases. eTumorMetastasis pseudocode and related data used in this study are available at https://github.com/WangEdwinLab/eTumorMetastasis.
Keywords: Breast cancer; Sequencing data; Predictive model; Systems biology; Machine learning
Cytokine storm promoting T cell exhaustion in severe COVID-19 revealed by single cell sequencing data analysis
14
Authors: Minglei Yang, Chenghao Lin, Yanni Wang, Kang Chen, Yutong Han, Haiyue Zhang, Weizhong Li. Precision Clinical Medicine, 2022, No. 2, pp. 87-99 (13 pages)
Background: Evidence has suggested that cytokine storms may be associated with T cell exhaustion (TEX) in COVID-19. However, the interaction mechanism between cytokine storms and TEX remains unclear. Methods: With the aim of dissecting the molecular relationship of cytokine storms and TEX through single-cell RNA sequencing data analysis, we identified 14 cell types from bronchoalveolar lavage fluid of COVID-19 patients and healthy people. We observed a novel subset of severely exhausted CD8 T cells (Exh T_CD8) that co-expressed multiple inhibitory receptors, and two macrophage subclasses that were the main source of cytokine storms in bronchoalveolar. Results: Correlation analysis between cytokine storm level and TEX level suggested that cytokine storms likely promoted TEX in severe COVID-19. Cell–cell communication analysis indicated that cytokines (e.g. CXCL10, CXCL11, CXCL2, CCL2, and CCL3) released by macrophages acted as ligands and significantly interacted with inhibitory receptors (e.g. CXCR3, DPP4, CCR1, CCR2, and CCR5) expressed by Exh T_CD8. These interactions formed the cytokine–receptor axes, which were also verified to be significantly correlated with cytokine storms and TEX in lung squamous cell carcinoma. Conclusions: Cytokine storms may promote TEX through cytokine–receptor axes and be associated with poor prognosis in COVID-19. Blocking cytokine–receptor axes may reverse TEX. Our finding provides novel insights into TEX in COVID-19 and new clues for cytokine-targeted immunotherapy development.
Keywords: COVID-19; immune exhaustion; cytokine storm; single-cell sequencing data analysis; T cell; immune checkpoint
Comprehensive simulation of metagenomic sequencing data with non-uniform sampling distribution
15
Authors: Shansong Liu, Kui Hua, Sijie Chen, Xuegong Zhang. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2018, No. 2, pp. 175-185 (11 pages)
Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Having metagenome data with known genome compositions is essential both for benchmarking bioinformatics software and for investigating the influence of various factors on the data. Compared to data from real microbiome samples or from defined microbial mock communities, simulated data from proper computational models are better suited to the purpose, as they provide more flexibility for controlling multiple factors. Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings. Results: We generated 9 comprehensive metagenomic datasets with different composition complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system. Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulation data that better reflect the specific properties of their scenarios.
Keywords: simulation; metagenomic sequencing data; non-uniform sampling; nuMetaSim