Funding: Supported by the National Natural Science Foundation of China (NSFC) (31771479 to Z.Z.); the NSFC Projects of International Cooperation and Exchanges (61661146004 to Z.Z.); the Local Innovative and Research Teams Project of Guangdong Pearl River Talents Program (2017BT01S131 to Z.Z.); and the Guangdong-Hong Kong-Macao-Joint Labs Program from Guangdong Science and Technology (2019B121205005 to X.Y.).
Abstract: The sequence upstream of the antibody variable region (antibody upstream sequence [AUS]) consists of a 5′ untranslated region (5′ UTR) and a preceding leader region. Sequence variations in AUSs affect antibody engineering and PCR-based antibody quantification, and may also be implicated in mRNA transcription and translation. However, the diversity of AUSs remains elusive. Using 5′ rapid amplification of cDNA ends and high-throughput antibody repertoire sequencing, we acquired full-length AUSs for human, rhesus macaque, cynomolgus macaque, mouse, and rat. We designed a bioinformatics pipeline and identified 3307 unique AUSs, corresponding to 3026 and 1457 unique sequences for the 5′ UTR and leader region, respectively. Comparative analysis indicated that 928 (63.69%) leader sequences are novel relative to those recorded in the international ImMunoGeneTics information system. Evolutionarily, leader sequences are more conserved than 5′ UTRs and seem to coevolve with their downstream V genes. In addition, single-nucleotide polymorphisms are position dependent for leader regions and may contribute to the functional reversal of the downstream V genes. Finally, the AUGs in AUSs were found to have little impact on gene expression. Taken together, our findings can facilitate primer design for capturing antibodies efficiently and provide a valuable resource for antibody engineering and molecule-level antibody studies.
Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant No. 31771479); the Science Fund for Creative Research Groups of the NSFC (Grant No. 81521003); the Projects of International Cooperation and Exchanges of NSFC (Grant No. 61661146004); the Municipal Planning Projects of Scientific Technology of Guangdong (Grant No. 201804020083); the Science and Technology Program of Guangzhou (Grant No. 201400000004); the Natural Science Foundation of Guangdong (Grant No. 2015B050501006); the Team Program of Natural Science Foundation of Guangdong (Grant No. 2014A030312002); and the 1000 Talents Program of China.
Abstract: Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in MCRs via HTS technologies.