Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Natural Science Foundation of China (Grant Nos. 31671372, 61702406, 31701739, and 31970317), the National Science and Technology Major Project of China (Grant No. 2018ZX10302205), the "World-Class Universities and the Characteristic Development Guidance Funds for the Central Universities", and the General Financial Grant from the China Postdoctoral Science Foundation (Grant Nos. 2017M623178 and 2017M623188).
Abstract: Microsatellite instability (MSI) is a key biomarker for cancer therapy and prognosis. Traditional experimental assays are laborious and time-consuming, and next-generation sequencing-based computational methods do not work on leukemia samples, paraffin-embedded samples, or patient-derived xenografts/organoids, because they require matched normal samples. Herein, we developed MSIsensor-pro, an open-source single-sample MSI scoring method for research and clinical applications. MSIsensor-pro introduces a multinomial distribution model to quantify polymerase slippages for each tumor sample and a discriminative site selection method to enable MSI detection without matched normal samples. We demonstrate that MSIsensor-pro is an ultrafast, accurate, and robust MSI calling method. Using samples with various sequencing depths and tumor purities, MSIsensor-pro significantly outperformed the current leading methods in both accuracy and computational cost. MSIsensor-pro is available at https://github.com/xjtu-omics/msisensor-pro and is free for non-commercial use, while a commercial license is provided upon request.
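To make the idea of single-sample MSI scoring concrete, the following is a deliberately simplified sketch and not the MSIsensor-pro model. It assumes that each microsatellite site is summarized by its reference repeat length and a histogram of repeat lengths observed in tumor reads, and it replaces the multinomial slippage model with a plain per-unit deletion rate. The function names, the 0.05 threshold, and the example data are all hypothetical.

```python
from collections import Counter


def deletion_slippage_rate(ref_len, length_counts):
    """Fraction of repeat units lost, averaged over all reads at one site.

    ref_len: reference repeat length (number of repeat units).
    length_counts: mapping of observed repeat length -> number of reads.
    This is a toy statistic, not the multinomial model from the paper.
    """
    total_reads = sum(length_counts.values())
    if total_reads == 0 or ref_len <= 0:
        return 0.0
    # Count every repeat unit missing from reads shorter than the reference.
    lost_units = sum(
        (ref_len - observed) * n
        for observed, n in length_counts.items()
        if observed < ref_len
    )
    return lost_units / (ref_len * total_reads)


def msi_score(sites, threshold=0.05):
    """Percentage of sites whose slippage rate exceeds a (hypothetical) threshold."""
    if not sites:
        return 0.0
    unstable = sum(
        1
        for ref_len, counts in sites
        if deletion_slippage_rate(ref_len, counts) > threshold
    )
    return 100.0 * unstable / len(sites)


# Hypothetical example data: (reference length, observed length histogram).
sites = [
    (10, Counter({10: 48, 9: 2})),           # nearly stable
    (12, Counter({12: 20, 10: 15, 9: 15})),  # many shortened reads
    (15, Counter({15: 50})),                 # fully stable
    (8, Counter({8: 30, 6: 20})),            # many shortened reads
]
score = msi_score(sites)
print(f"MSI score: {score:.1f}% of sites unstable")
```

The point of the sketch is only that a tumor-only score can be built from per-site read-length distributions; how sites are selected and how slippage is modeled in practice is described in the paper and the tool's documentation.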
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Natural Science Foundation of China (Grant Nos. 31671372, 61702406, and 31701739), the Fundamental Research Funds for the Central Universities, the World-Class Universities (Disciplines) and the Characteristic Development Guidance Funds for the Central Universities, and the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01).
Abstract: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are regarded as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspections, are around 70%, and the median experimental and computational breakpoint shifts are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
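The graph view of breakpoint connections described in the abstract can be illustrated with a small sketch. This is not Mako's pattern growth algorithm: it swaps in a plain breadth-first search for connected components, and it assumes breakpoints are integer positions on a single chromosome with edges standing for connections supported by read pairs. The function name, the `min_breakpoints` parameter, and the example edges are hypothetical.

```python
from collections import defaultdict, deque


def candidate_csv_subgraphs(edges, min_breakpoints=3):
    """Group connected breakpoints and keep groups with more than two breakpoints.

    edges: iterable of (breakpoint_a, breakpoint_b) pairs.
    Returns a list of sorted breakpoint lists, one per candidate subgraph.
    Uses connected components via BFS as a stand-in for pattern growth.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    candidates = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every breakpoint reachable from start.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        # A CSV has more than two breakpoints, so smaller groups are dropped.
        if len(component) >= min_breakpoints:
            candidates.append(sorted(component))
    return candidates


# Hypothetical connections: one four-breakpoint cluster and one simple pair.
edges = [(100, 500), (500, 900), (900, 1300), (5000, 5400)]
candidates = candidate_csv_subgraphs(edges)
print(candidates)
```

In this toy input the two-breakpoint pair is filtered out as a simple variant, while the four connected breakpoints are kept as a single candidate subgraph.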