Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome data sparseness, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim generates high-fidelity data by comparing single-cell clustering and the detection of chromosomal high-order structures on simulated data with the same analyses on raw data. Furthermore, scHi-CSim allows the sequencing depth and the number of simulated replicates to be adjusted flexibly. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
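The idea of sampling interactions within distance strata can be illustrated with a short sketch. This is not scHi-CSim's implementation. It assumes that one chromosome's contacts are already binned into a symmetric count matrix and that each matrix diagonal (one genomic distance) is one stratum. The function name `stratified_resample` and the toy matrix are hypothetical. The merging of neighboring cells and the restriction-fragment step are not covered.

```python
import numpy as np

def stratified_resample(contacts, depth, rng):
    """Resample a symmetric contact matrix to roughly `depth` contacts,
    one genomic-distance stratum (matrix diagonal) at a time."""
    n = contacts.shape[0]
    total = np.triu(contacts).sum()          # each contact counted once
    out = np.zeros_like(contacts)
    for d in range(n):
        diag = np.diagonal(contacts, offset=d).astype(float)
        stratum = diag.sum()
        if stratum == 0:
            continue                         # empty stratum stays empty
        # Keep the stratum's share of all contacts (assumed design choice).
        k = int(round(depth * stratum / total))
        draws = rng.multinomial(k, diag / stratum)
        idx = np.arange(n - d)
        out[idx, idx + d] = draws            # upper triangle
        out[idx + d, idx] = draws            # mirror to keep symmetry
    return out

# Hypothetical 4-bin chromosome with 26 observed contacts.
toy = np.array([[4, 2, 0, 1],
                [2, 6, 3, 0],
                [0, 3, 5, 2],
                [1, 0, 2, 3]])
sim = stratified_resample(toy, depth=52, rng=np.random.default_rng(0))
```

Because every stratum keeps its share of the total, the simulated matrix preserves the distance-decay profile of the input while the overall depth is set by the caller. Calling the function repeatedly with different random generators yields replicates.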
The honeybee (Apis mellifera) is a social insect with strong sensory capacity and a diverse behavioral repertoire, and is recognized as a good model organism for studying the neurobiological basis of learning and memory. In this study, we analyzed the changes in microRNA (miRNA) and messenger RNA (mRNA) following maze-based visual learning using next-generation small RNA sequencing and Solexa/Illumina Digital Gene Expression tag profiling (DGE). For small RNA sequencing, we obtained 13 367 770 and 13 132 655 clean tags from the maze and control groups, respectively. A total of 40 differentially expressed known miRNAs were detected between these two samples, and all of them were up-regulated in the maze group compared to the control group. For DGE, 5 681 320 and 5 939 855 clean tags were detected from the maze and control groups, respectively. A total of 388 genes were differentially expressed between these two samples, with 45 genes up-regulated and 343 genes down-regulated in the maze group compared to the control group. Additionally, the expression levels of 10 differentially expressed genes were examined by quantitative reverse transcription polymerase chain reaction (qRT-PCR); the expression trends of eight of them were consistent with the DGE result, although the changes were smaller in amplitude. The integrative analysis of miRNA and mRNA expression showed that, among the 40 differentially expressed known miRNAs and 388 differentially expressed genes, 60 miRNA/mRNA pairs were co-expressed in the present study. These results suggest that both miRNA and mRNA may play a pivotal role in the process of learning and memory in honeybees. Our sequencing data provide comprehensive miRNA and gene expression information for maze-based visual learning, which will facilitate understanding of the molecular mechanisms of honeybee learning and memory.
The sequencing revolution driven by high-throughput technologies has generated a huge number of marine microbial sequences, which hide the interaction patterns among microbial species and environmental factors. Exploring these patterns is helpful for exploiting marine resources. In this paper, we use the complex network approach to mine and analyze the interaction patterns of marine taxa and environments in the spring, summer, fall and winter seasons. Using 16S rRNA pyrosequencing data from 76 time points sampled monthly over 6 years, we first use our MtHc clustering algorithm to generate the operational taxonomic units (OTUs). We then employ the k-means method to divide the 76 time-point samples into four seasonal groups, and use mutual information (MI) to construct four correlation networks among microbial species and environmental factors. Finally, we adopt the symmetrical non-negative matrix factorization method to detect the interaction patterns and analyze the relationship between marine species and environmental factors. The results show that the four seasonal microbial interaction networks have the characteristics of complex networks, and that the interaction patterns are related to seasonal variability; the same environmental factor influences different species in the four seasons; and the four environmental factors of day length, photosynthetically active radiation, NO2 + NO3 and silicate may have stronger influences on microbes than other environmental factors.
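The mutual-information step can be illustrated with a minimal sketch. The abstract does not say which MI estimator was used, so the equal-width histogram binning and the `bins` parameter below are assumptions. The function name `mutual_information` is hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Estimate mutual information (in bits) between two series,
    e.g. an OTU abundance and an environmental factor across samples,
    from an equal-width 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y (row vector)
    nz = pxy > 0                              # skip empty cells: 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Identical series share all their information; with 4 equally filled bins
# this is log2(4) = 2 bits.
x = np.arange(8.0)
mi_same = mutual_information(x, x, bins=4)

# Two series whose bins co-occur uniformly share no information.
a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
mi_indep = mutual_information(a, b, bins=2)
```

A correlation network could then be built by computing this score for every species/factor pair within a seasonal group and keeping the pairs above a chosen threshold; how that threshold is chosen is not stated in the abstract.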
A convenient and efficient approach to difluoroalkyl-containing γ-butyrolactones via the radical addition reaction of iododifluoromethyl ketones with 4-pentenoic acids, initiated by AIBN in CH3CN at 60 °C, is reported. Various difluoroalkyl-containing γ-valerolactones were also synthesized under these reaction conditions.
Identification of cancer driver genes plays an important role in precision oncology research and helps in understanding cancer initiation and progression. However, most existing computational methods either rely mainly on protein–protein interaction (PPI) networks or treat directed gene regulatory networks (GRNs) as undirected gene–gene association networks when identifying cancer driver genes. This discards the unique structural regulatory information in directed GRNs and in turn affects the outcome of cancer driver gene identification. Here, based on multi-omics pan-cancer data (i.e., gene expression, mutation, copy number variation, and DNA methylation), we propose a novel method (called DGMP) to identify cancer driver genes by combining a directed graph convolutional network (DGCN) and a multilayer perceptron (MLP). DGMP learns the multi-omics features of genes as well as the topological structure features of the GRN with the DGCN model, and uses the MLP to put more weight on gene features, mitigating the bias toward graph topological features in the DGCN learning process. The results on three GRNs show that DGMP outperforms other existing state-of-the-art methods. The ablation experimental results on the DawnNet network indicate that introducing the MLP into the DGCN can offset the performance degradation of the DGCN, and that combining the MLP and DGCN can effectively improve the performance of identifying cancer driver genes. DGMP can identify not only highly mutated cancer driver genes but also driver genes harboring other kinds of alterations (e.g., differential expression and aberrant DNA methylation) or genes involved in GRNs with other cancer genes. The source code of DGMP can be freely downloaded from https://github.com/NWPU-903PR/DGMP.
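The general idea of pairing a direction-aware graph convolution with a feature-only MLP branch can be sketched as a single forward pass. This is a generic illustration and not DGMP's architecture, which is defined in the repository linked above. The function `directed_gcn_mlp`, the weight names, the use of separate in-neighbor and out-neighbor averaging, and the additive combination of the two branches are all assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def directed_gcn_mlp(A, X, W_in, W_out, W_mlp, w_score):
    """One forward pass of a toy model (hypothetical, not DGMP itself).
    A[i, j] = 1 means gene i regulates gene j; X holds per-gene features."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                                  # add self-loops
    P_out = A_hat / A_hat.sum(axis=1, keepdims=True)       # average over targets
    P_in = A_hat.T / A_hat.T.sum(axis=1, keepdims=True)    # average over regulators
    # Graph branch: separate weights per edge direction keep direction information.
    h_graph = relu(P_in @ X @ W_in + P_out @ X @ W_out)
    # MLP branch: uses gene features only, ignoring the network.
    h_mlp = relu(X @ W_mlp)
    logits = (h_graph + h_mlp) @ w_score
    return 1.0 / (1.0 + np.exp(-logits))                   # score in [0, 1] per gene

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # toy chain 0 -> 1 -> 2
X = rng.random((3, 4))                                        # 3 genes, 4 features each
W_in, W_out, W_mlp = (rng.normal(size=(4, 5)) for _ in range(3))
w_score = rng.normal(size=5)
p = directed_gcn_mlp(A, X, W_in, W_out, W_mlp, w_score)
p_reversed = directed_gcn_mlp(A.T, X, W_in, W_out, W_mlp, w_score)
```

Reversing every edge changes the scores, which is exactly the information an undirected treatment of the GRN would throw away. The MLP branch is unaffected by the reversal because it sees only the gene features.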
Background: One of the challenges in personalized medicine is to determine specific drugs and their dosages for individual patients who share a common disease. Cell lines provide a safe way to capture the drug responses of individual patients to specific drugs at varied dosages. However, it is still costly to determine dose-dependent drug responses in cells by biological assays. Computational methods provide a promising screening approach to infer possible drug responses in the cells of individual patients on a large scale. Nevertheless, existing computational approaches are insufficient to interpret the underlying reason for drug responses. Methods: In this work, we propose an interpretable model for analyzing and predicting drug responses across cell lines. The proposed model bridges drug features (e.g., chemical structure fingerprints), cell features (e.g., gene expression profiles), and drug responses across cells (measured by IC50) through a triple matrix factorization (TMF), so that the underlying reason for drug responses in specific cells can potentially be interpreted. Results: The comparison with state-of-the-art computational approaches demonstrates the superiority of our TMF. More importantly, a case study of drug responses in lung-related cell lines shows its ability to identify frequently occurring drug substructures, crucial mutated genes, and significant pairs of substructures and mutated genes in terms of drug sensitivity and resistance. Conclusion: TMF is an effective and interpretable approach for predicting cell line responses to drugs, and can uncover crucial pairs of chemical substructures and genes, which reveals the underlying reason for drug responses in specific cells.
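One way to read "bridging" three matrices is a model of the form R ≈ D S Cᵀ, where D holds drug features, C holds cell features, R holds responses, and S scores every pairing of a drug feature with a cell feature. The abstract does not give the TMF objective, so this form, the unregularized least-squares fit, and the name `fit_interaction` are assumptions for illustration only.

```python
import numpy as np

def fit_interaction(D, C, R):
    """Least-squares S such that R is approximated by D @ S @ C.T.
    D: drugs x drug-features, C: cells x cell-features, R: drugs x cells.
    Uses pseudo-inverses; no regularization (assumed simplification)."""
    return np.linalg.pinv(D) @ R @ np.linalg.pinv(C.T)

rng = np.random.default_rng(1)
D = rng.random((6, 3))            # 6 drugs, 3 substructure features (toy data)
C = rng.random((5, 2))            # 5 cell lines, 2 gene features (toy data)
S_true = rng.normal(size=(3, 2))  # hidden feature-pair weights
R = D @ S_true @ C.T              # noise-free synthetic responses

S_hat = fit_interaction(D, C, R)
R_pred = D @ S_hat @ C.T          # predicted response for every drug/cell pair
```

In this reading, interpretability comes from S: an entry with large magnitude flags a substructure/gene pair that strongly shifts the predicted response. Whether a large value indicates sensitivity or resistance depends on how the response is scaled.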
Funding (scHi-CSim study): Supported by the National Natural Science Foundation of China (61873198 and 62132015 to L.G., 62002275 to Y.Y., and 61621003 to S.Z.), the National Key Research and Development Program of China (2019YFA0709501), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16021400 and XDPB17 to S.Z.), and the Key-Area Research and Development of Guangdong Province (2020B1111190001).
Acknowledgments (honeybee study): This work was supported by the Earmarked Fund for the China Agricultural Research System (No. CARS-45-KXJ12) and the National Natural Science Foundation of China (No. 31260524). The deep sequencing and bioinformatic analysis were carried out at the Beijing Genome Institute (http://www.genomics.cn/index.php). We thank Hong Zhu for invaluable guidance and assistance in the maze experiments and dissection of samples, Dr. Aung Si and Dr. Andrew B. Barron for helpful suggestions that improved the manuscript, Fei Zhang and Zhen-Xiu Zeng for help with beekeeping, and Xu Han and Shu-Yun Li for their help in the maze experiments.
Acknowledgements (marine microbial network study): This paper was supported by the National Natural Science Foundation of China (Nos. 91430111, 61473232 and 61170134).
Funding (difluoroalkyl lactone study): Financial support from the National Natural Science Foundation of China (Nos. 21472126, 21172148, 21302128).
Funding (DGMP study): Supported in part by the National Natural Science Foundation of China (Grant Nos. 62173271 and 61873202 to SWZ).
Funding (TMF study): Supported by the National Natural Science Foundation of China (Nos. 61872297 and 61873202) and by the Shaanxi Provincial Key R&D Program, China (No. 2020KW-063).