Funding: Supported by the National Natural Science Foundation of China (No. 91130009) and the Science and Technology Planning Project of Guangdong Province of China (No. 2003A3080503).
Abstract: Systems biology has become an effective approach for understanding the molecular mechanisms underlying the development of lung cancer. In this study, the sequences of 100 non-small cell lung cancer (NSCLC)-related proteins were downloaded from the National Center for Biotechnology Information (NCBI) databases. The Theory of Coevolution was then used to build a protein-protein interaction (PPI) network of NSCLC. Adopting a reverse thinking approach, we analyzed the NSCLC proteins one at a time. Fifteen key proteins were identified and grouped into a special protein family F(K): Cyclin D1 (CCND1); E-cadherin (CDH1); cyclin-dependent kinase inhibitor 2A (CDKN2A); chemokine (C-X-C motif) ligand 12 (CXCL12); epidermal growth factor (EGF); epidermal growth factor receptor (EGFR); TNF receptor superfamily, member 6 (FAS); FK506 binding protein 12-rapamycin associated protein 1 (FRAP1); O-6-methylguanine-DNA methyltransferase (MGMT); parkinson protein 2, E3 ubiquitin protein ligase (PARK2); phosphatase and tensin homolog (PTEN); calcium channel voltage-dependent alpha 2/delta subunit 2 (CACNA2D2); tubulin beta class I (TUBB); SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily a, member 2 (SMARCA2); and wingless-type MMTV integration site family, member 7A (WNT7A). Seven key nodes of the sub-network were identified: PARK2, WNT7A, SMARCA2, FRAP1, CDKN2A, CCND1, and EGFR. The predicted interactions EGFR-EGF, PARK2-FAS, PTEN-FAS, and CACNA2D2-CDH1 were supported by experimental evidence retrieved from the Biological General Repository for Interaction Datasets (BioGRID) and PubMed databases. We propose that these 7 proteins could serve as potential diagnostic molecular markers for NSCLC. Following the developmental model of lung cancer established by Sekine et al., we hypothesize that the occurrence and development of lung cancer are linked not only to gene loss in the 3p region (WNT7A, 3p25) and genetic mutations in the 9p region but also to similar events in the regions 1p36.2 (FRAP1), 6q25.2-q27 (PARK2), and 11q13 (CCND1). Invasion or metastasis of the lung cancer occurs last.
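To make the idea of "key nodes" in a PPI network concrete, the sketch below counts interaction partners (node degree) over the four interactions the abstract reports as supported by BioGRID and PubMed. Degree is only one common way to rank nodes; the abstract does not state which criterion the authors used, so the function name `degree_table` and the ranking rule are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict

# The four predicted interactions the abstract lists as supported by
# BioGRID/PubMed. This is NOT the full NSCLC network from the study.
confirmed_edges = [
    ("EGFR", "EGF"),
    ("PARK2", "FAS"),
    ("PTEN", "FAS"),
    ("CACNA2D2", "CDH1"),
]


def degree_table(edges):
    """Return {protein: number of distinct interaction partners}.

    Hypothetical helper: ranking by degree is an assumption made for
    illustration, not the criterion described in the abstract.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


degrees = degree_table(confirmed_edges)
# Sort by descending degree, breaking ties alphabetically.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
print(ranked[0])  # ('FAS', 2)
```

On this tiny edge list FAS is the only protein with two partners; on a full network the same table would be the starting point for a hub analysis.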
Funding: National Natural Science Foundations of China (Nos. 61503116, 61402007); Foundation for Young Talents in the Colleges of Anhui Province Committee, China (No. 2013SQRL097ZD); Natural Science Foundation of Anhui Educational Committee, China (No. KJ2014A198); Natural Science Foundation of Anhui Province, China (No. 1408085QF108).
Abstract: Domain-based prediction of protein-protein interactions (PPIs) has drawn the attention of many researchers in recent years and has been studied with many computational approaches from different perspectives. Existing domain-based methods for predicting PPIs typically infer domain interactions from sets of proteins known to interact. However, these methods are costly and complex to implement. In this paper, a simple and effective prediction model is proposed. In this model, an improved multi-instance learning (MIL) algorithm (MilCaA) is designed that constructs MIL bags without taking domain interactions into consideration. The pseudo-amino acid composition (PseAAC) transformation is then used to encode the instances in a multi-instance bag, and principal component analysis (PCA) is used to reduce the feature dimension. Finally, several traditional machine learning and MIL methods are used to verify the proposed model. Experimental results demonstrate that MilCaA performs better than state-of-the-art techniques, including the traditional machine learning methods widely used in PPI prediction.
Abstract: In this work, a hybrid method is proposed to overcome the limitations of traditional protein-protein interaction (PPI) extraction methods such as pattern learning and machine learning. For each sentence from the biomedical literature that contains a protein pair, the method predicts whether the sentence describes a PPI by first learning syntax patterns typical of PPIs from a training corpus and then using their presence as features, along with bag-of-words features, in a maximum entropy model. Tested on the BioCreAtIve corpus, the method achieved a precision of 64% and a recall of 60%, improving the F1 value by 11% over its component pure pattern-based and bag-of-words methods. The results on this test set were also compared with three other extraction methods, and the proposed method performed markedly better.
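The F1 value mentioned above is the harmonic mean of precision and recall. As a quick worked check, the sketch below computes it from the two figures the abstract reports (64% and 60%); the function name `f1_score` is just an illustrative choice, and the abstract does not say whether the 11% improvement is absolute or relative, so that figure is not reproduced here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.64, 0.60)
print(round(f1, 3))  # 0.619
```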
Funding: Supported by the National Natural Science Foundation of China, No. 81870975 (to SZ).
Abstract: Exosomes exhibit complex biological functions and mediate a variety of biological processes, such as promoting axonal regeneration and functional recovery after injury. Long non-coding RNAs (lncRNAs) have been reported to play a crucial role in axonal regeneration. However, the role of the lncRNA-microRNA-messenger RNA (mRNA)-competitive endogenous RNA (ceRNA) network in exosome-mediated axonal regeneration remains unclear. In this study, we performed RNA transcriptome sequencing analysis to assess mRNA expression patterns in exosomes produced by cultured fibroblasts (FC-EXOs) and Schwann cells (SC-EXOs). Differential gene expression analysis, Gene Ontology analysis, Kyoto Encyclopedia of Genes and Genomes analysis, and protein-protein interaction network analysis were used to explore the functions and related pathways of RNAs isolated from FC-EXOs and SC-EXOs. We found that the ribosome-related central gene Rps5 was enriched in FC-EXOs and SC-EXOs, which suggests that it may promote axonal regeneration. In addition, using the miRWalk and Starbase prediction databases, we constructed a regulatory network of ceRNAs targeting Rps5, including 27 microRNAs and five lncRNAs. The ceRNA regulatory network, which included Ftx and Miat, revealed that exosome-derived Rps5 inhibits scar formation and promotes axonal regeneration and functional recovery after nerve injury. Our findings suggest that exosomes derived from fibroblasts and Schwann cells could be used to treat injuries of the peripheral nervous system.
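Abstract: To address the poor time performance of ant colony clustering when detecting functional modules in protein-protein interaction (PPI) networks, a fast ant colony clustering method for functional module detection (FACC-FMD) is proposed. The algorithm computes the similarity between each protein and the core-group proteins and clusters the proteins according to the pick-up/drop model. Because the functional modules in the resulting initial clustering have very low similarity to one another, the merging and filtering operations of the original ant colony clustering algorithm can be omitted, which shortens the solving time. The algorithm also uses the essentiality of proteins to place stricter constraints on the pick-up and drop operations, which reduces the number of pick-ups and drops and accelerates clustering. Experiments on several PPI networks show that, compared with the original ant colony clustering method, FACC-FMD greatly improves time performance while achieving good detection quality, and that it also has some advantage on several performance metrics over a number of classic algorithms from recent years.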
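For readers unfamiliar with the pick-up/drop model used in ant colony clustering, the sketch below shows the classic probability rules from the basic model of Deneubourg et al.: an ant is likely to pick up an item that is dissimilar to its surroundings and likely to drop it where similarity is high. The constants `k1` and `k2` and the function names are illustrative assumptions; the abstract does not give FACC-FMD's exact formulas or its essentiality-based constraints, so this is the textbook rule, not the paper's variant.

```python
def pick_probability(similarity, k1=0.1):
    """Probability that an unladen ant picks up an item.

    `similarity` is the local similarity f >= 0 between the item and its
    neighborhood; `k1` is an assumed threshold constant.
    """
    return (k1 / (k1 + similarity)) ** 2


def drop_probability(similarity, k2=0.15):
    """Probability that a laden ant drops its item at the current site.

    `k2` is an assumed threshold constant.
    """
    return (similarity / (k2 + similarity)) ** 2


# An isolated item (low similarity) is picked up readily and rarely dropped;
# an item among similar neighbors behaves the opposite way.
for f in (0.05, 0.5, 0.95):
    print(f, round(pick_probability(f), 3), round(drop_probability(f), 3))
```

Tightening the conditions under which these two operations fire, as the abstract describes, reduces how many pick-ups and drops are needed before the clustering settles.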
Funding: Project supported by the Gansu Province Industrial Support Plan (Grant No. 2023CYZC-25), the Natural Science Foundation of Gansu Province (Grant No. 23JRRA770), and the National Natural Science Foundation of China (Grant No. 62162040).
Abstract: Essential proteins are indispensable for cell growth and survival, and studying them is important for understanding cellular functions and biological mechanisms. Various computational methods have therefore been proposed to identify essential proteins. Unfortunately, most methods based on network topology consider only the interactions between a protein and its neighboring proteins, not its interactions with proteins at higher-order distances. In this paper, we propose the DSEP algorithm, which integrates network topology properties and subcellular localization information in protein-protein interaction (PPI) networks based on four-order distances and then uses random walks to identify the essential proteins. We also propose a method to calculate the finite-order distance of the network, which can greatly reduce the time complexity of our algorithm. We conducted a comprehensive comparison of the DSEP algorithm with 11 existing classical algorithms for identifying essential proteins, using multiple evaluation methods. The results show that DSEP is superior to these 11 methods.
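One simple way to look beyond direct neighbors is a breadth-first search that stops at a fixed depth. The sketch below collects every protein within a given number of hops of a source node, using a cutoff of four to echo the abstract's "four-order distances". It is a generic bounded BFS under that reading; the abstract does not describe DSEP's own finite-order distance method, and the graph, function name, and `max_order` parameter here are hypothetical.

```python
from collections import deque


def distances_within(graph, source, max_order=4):
    """Return {node: hop distance} for nodes within `max_order` hops of `source`.

    `graph` is an adjacency mapping {node: iterable of neighbors}.
    The search never expands nodes already at the cutoff depth, so the
    cost depends on the size of the bounded neighborhood, not the whole graph.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_order:
            continue  # do not look past the cutoff
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[source]  # report only other nodes
    return dist


# Toy chain A-B-C-D-E-F (hypothetical protein names).
toy_graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
print(distances_within(toy_graph, "A"))  # {'B': 1, 'C': 2, 'D': 3, 'E': 4}
```

Node F is five hops from A and is therefore left out, which is the point of bounding the order.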
Funding: This work was supported in part by the National Natural Science Foundation of China (61772493), the CAAI-Huawei MindSpore Open Fund (CAAIXSJLJJ-2020-004B), the Natural Science Foundation of Chongqing (China) (cstc2019jcyjjqX0013), the Chongqing Research Program of Technology Innovation and Application (cstc2019jscx-fxydX0024, cstc2019jscx-fxydX0027, cstc2018jszx-cyzdX0041), the Guangdong Province Universities and College Pearl River Scholar Funded Scheme (2019), the Pioneer Hundred Talents Program of Chinese Academy of Sciences, and the Deanship of Scientific Research (DSR) at King Abdulaziz University (G-21-135-38).
Abstract: Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive amounts of protein-protein interaction (PPI) data have been generated, making them very difficult to analyze efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of CoFex's limitations is conducted from the perspectives of efficiency and memory consumption when it is applied to large-scale PPI data analysis and prediction. Solutions are then devised to overcome each of these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. The procedure is then modified to follow the MapReduce framework so that the prediction task is carried out in a distributed manner. A series of extensive experiments has been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
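To illustrate the MapReduce pattern the framework relies on, the sketch below runs a map, shuffle, and reduce pass in plain Python that counts length-3 subsequences across protein sequences. It only shows the programming model in a single process: counting subsequences is an assumed stand-in task, and none of the function names or steps are taken from CoFex or from the paper's implementation.

```python
from collections import defaultdict


def map_phase(protein_id, sequence, k=3):
    """Map step: emit (subsequence, 1) for every length-k window."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def run_job(records, k=3):
    """Run map -> shuffle -> reduce sequentially over (id, sequence) records."""
    emitted = [pair for pid, seq in records for pair in map_phase(pid, seq, k)]
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


# Hypothetical toy sequences.
records = [("P1", "MKVMKV"), ("P2", "KVMK")]
print(run_job(records))  # {'MKV': 2, 'KVM': 2, 'VMK': 2}
```

Because each map call touches only one record and each reduce call only one key, both phases can be spread across machines, which is where a real MapReduce deployment gets its speed-up.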
Funding: Supported by grants from the National Natural Science Foundation of China (No. 91130009) and the Science and Technology Planning Project of Guangdong Province of China (No. 2003A3080503).
Abstract: Smoking is the primary cause of lung cancer and is linked to 85% of lung cancer cases. However, how lung cancer develops in patients with a smoking history remains unclear. Systems approaches that combine human protein-protein interaction (PPI) networks and gene expression data are superior to traditional methods. We applied such a systems approach to determine the role that smoking plays in lung cancer development and used a support vector machine (SVM) model to predict PPIs. By defining expression variance (EV), we found 520 dynamic proteins (EV>0.4) using data from the Human Protein Reference Database and the Gene Expression Omnibus database, and built 7 dynamic PPI subnetworks of lung cancer in patients with a smoking history. We also determined the primary functions of each subnetwork: signal transduction, apoptosis, and cell migration and adhesion for subnetwork A; cell-sustained angiogenesis for subnetwork B; apoptosis for subnetwork C; and, finally, signal transduction and cell replication and proliferation for subnetworks D-G. The degree distributions of the dynamic and static proteins differed, clearly showing that the dynamic proteins were not the core proteins that are widely connected to their neighbor proteins. There were high correlations among the dynamic proteins, suggesting that they tend to form specific dynamic modules. We also found that, when cancer occurred, the dynamic proteins were correlated with the expression of only selected neighbor proteins rather than all of them.
Funding: Supported by the National Basic Research Program of China (Grant No. 2006CB500702), the Shanghai Leading Academic Discipline Project (Grant No. J50103), and Shanghai University Systems Biology Research Funding (Grant No. SBR08001).
Abstract: Alpha-synuclein plays an important role in Parkinson's disease (PD). Current studies of alpha-synuclein concentrate mainly on the gene level, yet study at the protein level has special significance. Meanwhile, free resources are available on the Internet, such as databases and algorithms for protein-protein interactions (PPIs). In this paper, a novel method is proposed that integrates distributed heterogeneous data sources and algorithms to predict PPIs for alpha-synuclein in silico. The PPIs generated by the method take advantage of various experimental data and indicate new information about PPIs for alpha-synuclein. The results presented at the end of the paper illustrate that the method is practical. It is hoped that the predictions obtained by this method can guide biological experiments on PPIs for alpha-synuclein and help reveal possible mechanisms of PD.
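Abstract: A protein-protein interaction network, or PPI network, is a graph model used in bioinformatics to represent the interactions between proteins. Aligning the PPI networks of different species is biologically important, so a good PPI network alignment algorithm is especially valuable. To address this problem, the LOBM (Local Optimization based on Bipartite graph Matching) algorithm is proposed for the first time. LOBM locally optimizes an existing alignment result, using the classic graph-theoretic model of bipartite graph matching to improve the alignment quality of existing alignment algorithms. Experiments show that LOBM yields considerably better alignment results than several existing alignment algorithms.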
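As background on the bipartite graph matching model that LOBM builds on, the sketch below finds a maximum matching between two sets of proteins with the standard augmenting-path method (Kuhn's algorithm). It is an unweighted textbook version under assumed inputs: the abstract does not describe LOBM's objective, scoring, or local-optimization loop, and the protein labels and function name are hypothetical.

```python
def maximum_bipartite_matching(candidates):
    """Return a maximum matching {left_node: right_node}.

    `candidates` maps each left node (e.g. a protein in species A) to the
    right nodes (proteins in species B) it is allowed to be aligned with.
    """
    match_of_right = {}  # right node -> left node currently matched to it

    def try_assign(left, visited):
        # Try each allowed partner; if it is taken, try to re-route its owner.
        for right in candidates.get(left, ()):
            if right in visited:
                continue
            visited.add(right)
            if right not in match_of_right or try_assign(match_of_right[right], visited):
                match_of_right[right] = left
                return True
        return False

    for left in candidates:
        try_assign(left, set())
    return {left: right for right, left in match_of_right.items()}


# Hypothetical candidate pairs between two small networks.
candidates = {"a1": ["b1", "b2"], "a2": ["b1"], "a3": ["b2", "b3"]}
matching = maximum_bipartite_matching(candidates)
print(sorted(matching.items()))  # [('a1', 'b2'), ('a2', 'b1'), ('a3', 'b3')]
```

Here a1 first takes b1, then is re-routed to b2 so that a2, which has no other option, can be matched. That re-routing step is what lets a matching-based pass improve on a greedy initial alignment.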