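Abstract: XAGE-1b (X antigen family member 1B) belongs to the XAGE subfamily and is a cancer/testis antigen (CTA) expressed in normal human testis and in many types of tumor cells. Our laboratory previously found that this gene is highly expressed in a highly metastatic salivary adenoid cystic carcinoma cell line. To further study the genes regulated downstream of XAGE-1b, we used ChIP-Seq to screen for DNA fragments that may be bound by the XAGE-1b protein. The downstream genes were enriched in cell division (P-value = 7.95e-04), cell cycle regulation (cell cycle, P-value = 5.532e-03), and cancer-related genes (GSEA/MSigDB module_11, P-value = 2.010e-06). In addition, the products of several downstream genes (NCBI/interactions 22827, P-value = 4.678e-06) can interact at the protein level with PUF60, a repressor protein of the promoter of the proto-oncogene c-Myc; this was verified by qPCR. These findings are important for clarifying the role of XAGE-1b in tumor cell proliferation and metastasis.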
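Enrichment P-values like those quoted in the abstract above are commonly obtained from an over-representation test. The following is a minimal sketch, assuming a one-sided hypergeometric test; the abstract does not state which test was actually used, and the function name and the gene counts in the usage line are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_annotated, n_targets, n_overlap):
    """One-sided hypergeometric P-value, P(X >= n_overlap).

    n_universe  : number of genes in the background (assumed value)
    n_annotated : background genes carrying the annotation
    n_targets   : genes in the target list (e.g. genes near ChIP-Seq peaks)
    n_overlap   : target genes that carry the annotation
    """
    upper = min(n_annotated, n_targets)
    total = comb(n_universe, n_targets)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(hypergeom_enrichment_p(20000, 300, 150, 12))
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for large gene sets a statistics library would be the more practical choice.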
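Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a powerful tool for studying DNA-protein interactions and is widely used to precisely map RNA polymerases, transcription factors, and histone modifications on the genome. In recent years, building on ChIP-seq, researchers have proposed a series of methods for studying DNA-protein interactions that improve sequencing resolution and reduce experimental cost, greatly advancing epigenomics. This review summarizes the principles and application scenarios of several techniques for studying DNA-protein interactions, introduces approaches for studying these interactions at the single-cell level, and discusses future directions.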
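Abstract: Peanut (Arachis hypogaea L.) is an important oil and cash crop whose growth, development, yield, and quality are affected by drought. To better understand the drought-resistance mechanism of peanut, this study used ChIP-seq to analyze the DNA sequences enriched by the histone deacetylase AhHDA1 and the transcription factor AhGLK1, revealing the network of downstream target genes regulated by the two. Alignment analysis showed that GLK-IP yielded 65.71 million clean reads, HDA-IP 63.90 million, and Input 70.06 million, with unique mapping rates of 74.97%, 76.81%, and 76.75%, respectively. GLK-IP yielded 714 peaks and HDA-IP yielded 543 peaks. Peaks were distributed across exons, introns, upstream and downstream regions, and intergenic regions. GO enrichment showed that, for AhGLK1-IP and AhHDA1-IP respectively, peak-associated genes accounted for 35.1% and 32.8% in molecular function, 39.3% and 44.2% in biological process, and 25.5% and 22.8% in cellular component. KEGG pathway enrichment showed that AhGLK1-IP-associated genes were significantly enriched in "metabolic pathways", "biosynthesis of antibiotics", "glyoxylate and dicarboxylate metabolism", "microbial metabolism in diverse environments", "carbon metabolism", "biosynthesis of secondary metabolites", and "biosynthesis of amino acids", whereas AhHDA1-IP-associated genes were significantly enriched in "N-glycan biosynthesis", "arginine and proline metabolism", and "phenylalanine metabolism". AhGLK1-IP and AhHDA1-IP shared 4 enriched peaks, and the motifs specifically enriched in AhGLK1-IP and in AhHDA1-IP contain a common conserved sequence, AGAA/T. These results provide a reference for understanding the functions of AhGLK1 and AhHDA1 and the regulatory mechanisms of peanut in response to drought stress and during post-drought recovery growth.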
Funding: the National 973 Project of China (2011CBA01101) and the National Natural Science Foundation of China (30871343 and 31130051).
Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is increasingly used for genome-wide profiling of transcriptional regulation, as the technique enables dissection of gene regulatory networks. With input as a control, a variety of statistical methods have been proposed for identifying enriched regions in the genome, i.e., transcription factor binding sites and chromatin modifications. However, whether peak calling remains reliable when no control is available still awaits systematic evaluation. To address this question, we used a Bayesian framework to show the effectiveness of peak calling without controls (PCWC). Using several different types of ChIP-seq data, we demonstrated the relatively high accuracy of PCWC, with a false discovery rate (FDR) below 5%. Compared with previously published methods, e.g., model-based analysis of ChIP-seq (MACS), PCWC is reliable and has a lower FDR. Furthermore, to interpret the biological significance of the called peaks, we combined them with microarray gene expression data, gene ontology annotation, and subsequent motif discovery; the results indicate that PCWC is highly efficient. Additionally, only a small number of peaks were identified in in silico data, suggesting a very low FDR for PCWC.
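To make the idea of calling peaks without a control concrete, here is a minimal sketch. It is not the Bayesian PCWC method described above; it swaps in a simpler idea: score fixed-width windows against a Poisson background estimated from the ChIP sample itself, then control the FDR with the Benjamini-Hochberg procedure. All function names, the window counts, and the 5% threshold default are illustrative assumptions.

```python
from math import exp, lgamma, log

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # P(X = k), computed in log space to avoid overflow in k!.
    term = exp(-lam + k * log(lam) - lgamma(k + 1))
    total, i = 0.0, k
    # Keep adding terms until they stop contributing to the sum.
    while term > 0.0 and (i <= lam or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_peaks_without_control(window_counts, fdr=0.05):
    """Indices of windows enriched over the sample-wide mean read count."""
    lam = sum(window_counts) / len(window_counts)  # background rate (assumption)
    pvals = [poisson_sf(c, lam) for c in window_counts]
    qvals = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(qvals) if q < fdr]

if __name__ == "__main__":
    counts = [3, 2, 4, 1, 2, 3, 50, 2, 3, 1]  # hypothetical reads per window
    print(call_peaks_without_control(counts))  # prints [6]
```

Estimating the background from the sample mean is the crudest possible choice, since true peaks inflate it; it is used here only to keep the sketch short.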
Funding: the U.S. National Institutes of Health (NIH) grants R35GM133712 to C.Z. and R01AI121080 and R01AI139874 to W.P.
Abstract: Background: Histone modifications are major factors that define chromatin states and regulate gene expression in eukaryotic cells. Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has been widely used for profiling the genome-wide distribution of chromatin-associated protein factors. Some histone modifications, such as H3K27me3 and H3K9me3, usually mark broad domains in the genome ranging from kilobases (kb) to megabases (Mb) in length, resulting in diffuse patterns in the ChIP-seq data that make signal separation challenging. While most existing ChIP-seq peak-calling algorithms are based on local statistical models that do not account for multi-scale features, a principled method to identify scale-free broad domains has been lacking. Methods: Here we present RECOGNICER (Recursive coarse-graining identification for ChIP-seq enriched regions), a computational method for identifying ChIP-seq enriched domains over a large range of scales. The algorithm is based on a coarse-graining approach, which uses recursive block transformations to determine spatial clustering of local enriched elements across multiple length scales. Results: We apply RECOGNICER to call H3K27me3 domains from ChIP-seq data and validate the results based on the association of H3K27me3 with repressed gene expression. We show that RECOGNICER outperforms existing ChIP-seq broad domain calling tools by identifying more whole domains rather than separated pieces. Conclusion: RECOGNICER can be a useful bioinformatics tool for next-generation sequencing data analysis in epigenomics research.
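The notion of clustering local enriched elements across multiple length scales can be illustrated with a toy example. The sketch below is not the RECOGNICER algorithm; it is a much simpler stand-in that merges neighboring enriched islands when the gap between them is within a tolerance, and doubles that tolerance at each level. The function names, the doubling rule, and the input flags are all assumptions made for illustration.

```python
def find_islands(flags):
    """Return [start, end) index pairs for runs of truthy window flags."""
    islands, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            islands.append((start, i))
            start = None
    if start is not None:
        islands.append((start, len(flags)))
    return islands

def coarse_grain_domains(flags, max_levels=4):
    """Merge enriched islands over growing gap tolerances (1, 2, 4, ...)."""
    islands = find_islands(flags)
    gap = 1  # gap tolerance, in windows, at the finest level
    for _ in range(max_levels):
        merged = []
        for start, end in islands:
            # Join this island to the previous one if the gap is small enough.
            if merged and start - merged[-1][1] <= gap:
                merged[-1] = (merged[-1][0], end)
            else:
                merged.append((start, end))
        islands = merged
        gap *= 2  # move to the next, coarser length scale
    return islands

if __name__ == "__main__":
    # Hypothetical per-window enrichment calls (True = locally enriched).
    flags = [True, True, False, True] + [False] * 12 + [True, True]
    print(coarse_grain_domains(flags, max_levels=4))  # [(0, 4), (16, 18)]
    print(coarse_grain_domains(flags, max_levels=5))  # [(0, 18)]
```

The two calls show the trade-off such a scheme faces: a few levels keep distant islands apart, while one more level fuses them into a single broad domain.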
Funding: supported by the National Natural Science Foundation of China (Nos. 31540033 and 91131002), the Precision Medicine Research Program of the Chinese Academy of Sciences (KJZD-EW-L14), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12020343), the National Basic Research Program of China (2013CB911001 and 2012CB518302), and the National Excellent Youth Science Foundation of China (No. 31222030).
Abstract: Histone methylation is an important epigenetic modification that occurs on lysine or arginine residues of histone tails (Zhang and Reinberg, 2001). It takes part in multiple biological processes, including gene expression, genomic stability, stem cell maturity, genetic imprinting, mitosis, and development (Fischle et al., 2005).
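Abstract: A technical platform has been established in which small-molecule compounds induce goat (Capra hircus) ear fibroblasts (the control group) to transdifferentiate into induced mammary epithelial cells (named CiMECs or 5i8d; 5i8d is used hereafter to denote the treatment condition). In this study, chromatin immunoprecipitation (ChIP) followed by sequencing was performed on cells before and after transdifferentiation (named Control and 5i8d, respectively) for two histone modifications, trimethylation of lysine 4 on histone H3 (H3K4me3) and trimethylation of lysine 27 on histone H3 (H3K27me3), to investigate how the patterns of H3K4me3 and H3K27me3 change between the two cell states. GO enrichment showed that genes associated with differential H3K4me3 peaks were enriched in pathways related to mammary gland development and differentiation, including epithelial morphogenesis, regulation of the mitogen-activated protein kinase (MAPK) cascade, the Wnt (Wingless-type MMTV integration site family) receptor pathway, and the β-catenin receptor pathway, suggesting that the mammary epithelial cell fate had been activated. Genes associated with differential H3K27me3 peaks were instead enriched in GO terms such as intrinsic component of membrane, intrinsic component of plasma membrane, and actin filament, suggesting that pathways maintaining fibroblast membrane and cytoskeletal properties were repressed and that the fibroblast lineage was disrupted. KEGG enrichment showed that genes associated with differential H3K4me3 peaks were enriched in the MAPK signaling pathway, insulin and secretion signaling pathways, the phosphatidylinositol 3-kinase-protein kinase B (PI3K-Akt) signaling pathway, and the estrogen signaling pathway, all of which are closely related to mammary gland development and differentiation, suggesting that the mammary epithelial lineage was triggered. Genes associated with differential H3K27me3 peaks were mainly enriched in KEGG pathways such as focal adhesion, extracellular matrix (ECM)-receptor interaction, and cell adhesion molecules, suggesting that pathways related to the cell-cell junction properties of fibroblasts were repressed. These results indicate that small-molecule compounds successfully induced goat ear fibroblasts to transdifferentiate into mammary epithelial cells with lactation function. This study provides theoretical support for research on mammary epithelial cell fate determination, development and differentiation, and lactation.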
Funding: supported by the National Natural Science Foundation of China (30871393) and the National High-Tech Research & Development Project of China (2006AA020702).
Abstract: The combination of chromatin immunoprecipitation with sequencing (ChIP-Seq) is an effective method for obtaining an in vivo genome-wide profile of the interaction of a protein with DNA. With the dramatic development of high-throughput short-read sequencing technologies, several new algorithms have been developed to process ChIP-Seq data. However, the reported analytical tools for ChIP-Seq, which are based on size selection of immunoprecipitated (IPed) DNA fragments, have mainly been adopted on the Solexa system. Although the SOLiD system is a sequencer with the highest throughput, few ChIP-Seq studies based on it have been reported. The main difference between the SOLiD and Solexa systems lies in the length of the DNA fragments used when preparing sequencing libraries. The SOLiD system has relatively short DNA fragments if the IPed DNA fragments undergo a further sonication to meet the fragment length requirement of emulsion PCR (ePCR). This work aims to investigate the influence of DNA fragment length on ChIP-Seq data analysis. Previous studies show that typical bimodal peaks can be observed in Solexa ChIP-Seq data, but in our analysis of real SOLiD ChIP-Seq data we found no double peaks with an apparent read shift in locally enriched regions, and the local read distribution of peaks was tested against a normal distribution. Using real and simulated ChIP-Seq data, three main ChIP-Seq algorithms (CisGenome, SISSRs and MACS) were investigated. We found that algorithms developed for ChIP-Seq data generated with the Solexa library protocol cannot efficiently capture the features of ChIP-Seq data from SOLiD libraries. Misuse of those analytical tools is a possible reason for the failure of ChIP-Seq on the SOLiD system. Therefore, a new ChIP-Seq analytical strategy for libraries prepared with an extra sonication of IPed DNA fragments needs to be developed.
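The bimodal pattern mentioned above arises when forward-strand and reverse-strand read starts pile up on opposite sides of a binding site, offset by roughly the fragment length. A minimal sketch of how such a shift can be estimated follows; it uses a simple strand cross-correlation, which is a generic technique and not necessarily what CisGenome, SISSRs, or MACS do internally. The function name, the search range, and the read positions are hypothetical.

```python
from collections import Counter

def estimate_strand_shift(forward_starts, reverse_starts, max_shift=300):
    """Return the shift d in [0, max_shift] that best aligns the strands.

    The score for d counts pairs of a forward 5' end at position x and a
    reverse 5' end at position x + d; ties go to the smallest d.
    """
    fwd = Counter(forward_starts)
    rev = Counter(reverse_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(n * rev.get(x + d, 0) for x, n in fwd.items())
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift

if __name__ == "__main__":
    # Hypothetical read starts around two binding sites, offset by 150 bp.
    forward = [100, 101, 102, 500, 501]
    reverse = [250, 251, 252, 650, 651]
    print(estimate_strand_shift(forward, reverse))  # prints 150
```

When no shift gives a clearly higher score than the others, the data lack a pronounced strand offset, which is the situation the abstract reports for the SOLiD library.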