An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases w...An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases where some pedigree founders are not genotyped, provided that for each nuclear family at least one parent is genotyped and each non-genotyped founder appears in exactly one nuclear family. The importance of this generalization lies in that such cases frequently happen in real data, because some founders may have passed away and their genotype data can no longer be collected. The algorithm runs in O(m^3n^3) time, where m is the number of single nucleotide polymorphism (SNP) loci under consideration and n is the number of genotyped members in the pedigree. This zero-recombination haplotyping algorithm is extended to a maximum parsimoniously haplotyping algorithm in one whole genome scan to minimize the total number of breakpoint sites, or equivalently, the number of maximal zero-recombination chromosomal regions. We show that such a whole genome scan haplotyping algorithm can be implemented in O(m^3n^3) time in a novel incremental fashion, here m denotes the total number of SNP loci along the chromosome.展开更多
We investigate the maximum happy vertices(MHV)problem and its complement,the minimum unhappy vertices(MUHV)problem.In order to design better approximation algorithms,we introduce the supermodular and submodular multi-...We investigate the maximum happy vertices(MHV)problem and its complement,the minimum unhappy vertices(MUHV)problem.In order to design better approximation algorithms,we introduce the supermodular and submodular multi-labeling(SUP-ML and SUB-ML)problems and show that MHV and MUHV are special cases of SUP-ML and SUB-ML,respectively,by rewriting the objective functions as set functions.The convex relaxation on the I ovasz extension,originally presented for the submodular multi-partitioning problem,can be extended for the SUB-ML problem,thereby proving that SUB-ML(SUP-ML,respectively)can be approximated within a factorof2-2/k(2/k,respectively),where k is the number of labels.These general results imply that MHV and MUHV can also be approximated within factors of 2/k and 2-2/k,respectively,using the same approximation algorithms.For the MUHV problem,we also show that it is approximation-equivalent to the hypergraph multiway cut problem;thus,MUHV is Unique Games-hard to achieve a(2-2/k-e)-approximation,for anyε>0.For the MHV problem,the 2/k-approximation improves the previous best approximation ratio max{1/k,1/(△+1/g(△)},where△is the maximum vertex degree of the input graph and g(△)=(√△+√△+1)2△>4△2.We also show that an existing LP relaxation for MHV is the same as the concave relaxation on the Lovasz extension for SUP-ML;we then prove an upper bound of 2/k on the integrality gap of this LP relaxation,which suggests that the 2/k-approximation is the best possible based on this LP relaxation.Lastly,we prove that it is Unique Games-hard to approximate the MHV problem within a factor of S2(log2 k/k).展开更多
基金supported in part by AARI,AICML,ALIDF,iCORE,and NSERC
文摘An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases where some pedigree founders are not genotyped, provided that for each nuclear family at least one parent is genotyped and each non-genotyped founder appears in exactly one nuclear family. The importance of this generalization lies in that such cases frequently happen in real data, because some founders may have passed away and their genotype data can no longer be collected. The algorithm runs in O(m^3n^3) time, where m is the number of single nucleotide polymorphism (SNP) loci under consideration and n is the number of genotyped members in the pedigree. This zero-recombination haplotyping algorithm is extended to a maximum parsimoniously haplotyping algorithm in one whole genome scan to minimize the total number of breakpoint sites, or equivalently, the number of maximal zero-recombination chromosomal regions. We show that such a whole genome scan haplotyping algorithm can be implemented in O(m^3n^3) time in a novel incremental fashion, here m denotes the total number of SNP loci along the chromosome.
基金the National Natural Science Foundation of China(Nos.11771114,11571252,and 61672323)the China Scholarship Council(No.201508330054)+1 种基金the Natural Science Foundation of Shandong Province(No.ZR2016AM28)the Natural Sciences and Engineering Research Council of Canada.
文摘We investigate the maximum happy vertices(MHV)problem and its complement,the minimum unhappy vertices(MUHV)problem.In order to design better approximation algorithms,we introduce the supermodular and submodular multi-labeling(SUP-ML and SUB-ML)problems and show that MHV and MUHV are special cases of SUP-ML and SUB-ML,respectively,by rewriting the objective functions as set functions.The convex relaxation on the I ovasz extension,originally presented for the submodular multi-partitioning problem,can be extended for the SUB-ML problem,thereby proving that SUB-ML(SUP-ML,respectively)can be approximated within a factorof2-2/k(2/k,respectively),where k is the number of labels.These general results imply that MHV and MUHV can also be approximated within factors of 2/k and 2-2/k,respectively,using the same approximation algorithms.For the MUHV problem,we also show that it is approximation-equivalent to the hypergraph multiway cut problem;thus,MUHV is Unique Games-hard to achieve a(2-2/k-e)-approximation,for anyε>0.For the MHV problem,the 2/k-approximation improves the previous best approximation ratio max{1/k,1/(△+1/g(△)},where△is the maximum vertex degree of the input graph and g(△)=(√△+√△+1)2△>4△2.We also show that an existing LP relaxation for MHV is the same as the concave relaxation on the Lovasz extension for SUP-ML;we then prove an upper bound of 2/k on the integrality gap of this LP relaxation,which suggests that the 2/k-approximation is the best possible based on this LP relaxation.Lastly,we prove that it is Unique Games-hard to approximate the MHV problem within a factor of S2(log2 k/k).