Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 61873089 and 62032007, the Key Project of the Education Department of Hunan Province under Grant 20A087, and the Innovation Platform Open Fund Project of Hunan Provincial Education Department under Grant 20K025.
Abstract: Identifying associations between microRNAs (miRNAs) and diseases is important for understanding the occurrence and development of human diseases. However, existing methods suffer from the following limitations. First, some disease-related miRNAs are obtained from miRNA functional similarity networks built from heterogeneous data sources, i.e., disease similarity, protein interaction networks, and gene expression. Second, few approaches infer disease-related miRNAs from network topological features without the functional similarity of miRNAs. In this paper, we develop a novel model, Integrating Network Topology Similarity and MicroRNA Function Similarity (INTS-MFS). The integrated miRNA similarities are calculated from miRNA functional similarity and network topological characteristics. INTS-MFS obtained an AUC of 0.872 in five-fold cross-validation and was applied to three common human diseases in case studies. As a result, 30 of the top 30 predicted Prostatic Neoplasm-related miRNAs were included in the two databases dbDEMC and PhenomiR2.0. 29 of the top 30 predicted Lung Neoplasm-related miRNAs and Breast Neoplasm-related miRNAs were included in dbDEMC, PhenomiR2.0, and experimental reports. Moreover, INTS-MFS found previously unreported associations of hsa-mir-371a with breast cancer and lung cancer, which provides biologists with new clues for diagnosing these two cancers.
Funding: Supported by the National Key Research and Development Program (No. 2017YFC1703502 and 2017YFC1703506).
Abstract: Objective: To derive the Chinese medicine (CM) syndrome classification and subgroup syndrome characteristics of ischemic stroke patients. Methods: By extracting the CM clinical electronic medical records (EMRs) of 7,170 hospitalized patients with ischemic stroke from 2016 to 2018 at Weifang Hospital of Traditional Chinese Medicine, Shandong Province, China, a patient similarity network (PSN) was constructed based on the symptomatic phenotype of the patients. Thereafter, the efficient community detection method BGLL was used to identify subgroups of patients. Finally, subgroups with a large number of cases were selected to analyze the specific manifestations of clinical symptoms and CM syndromes in each subgroup. Results: Seven main subgroups of patients with specific symptom characteristics were identified: M3, M2, M1, M5, M0, M29, and M4. The M3 and M0 subgroups had prominent posterior circulatory symptoms, while M3 was associated with autonomic disorders, and M4 manifested as anxiety; M2 and M4 had motor and motor coordination disorders; M1 had sensory disorders; M5 had more obvious lung infections; M29 had a disorder of consciousness. The specificity of CM syndromes in each subgroup was as follows. M3, M2, M1, M0, M29, and M4 all shared the wind phlegm pattern; M3 and M0 both showed the hyperactivity of Gan (Liver) yang pattern; M2 and M29 had similar syndromes, which corresponded to the intertwined phlegm and blood stasis pattern and the phlegm-stasis obstructing meridians pattern, respectively. The manifestations of CM syndromes often appeared as a combination of 2 or more syndrome elements. The most common combination in these 7 subgroups was wind-phlegm. The CM syndrome elements of the 7 subgroups were specifically manifested as pathogenic wind, pathogenic phlegm, and deficiency pathogens. Conclusions: There were 7 main symptom similarity-based subgroups in ischemic stroke patients, each with distinct characteristics. The main syndromes were the wind phlegm pattern and the hyperactivity of Gan yang pattern.
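The network-construction step described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cut-off was used, so the Jaccard index, the `threshold` parameter, the function names, and the toy patient records below are all assumptions made for illustration only. BGLL is the community detection method of Blondel, Guillaume, Lambiotte, and Lefebvre (commonly called the Louvain method); it would be run on a network such as the one produced here and is not implemented in this sketch.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two symptom sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_psn(patients, threshold):
    """Return weighted edges {(p, q): similarity} for patient pairs whose
    symptom similarity is at least `threshold` (hypothetical cut-off)."""
    edges = {}
    for p, q in combinations(sorted(patients), 2):
        weight = jaccard(patients[p], patients[q])
        if weight >= threshold:
            edges[(p, q)] = weight
    return edges


# Toy, made-up symptom records; not data from the study.
patients = {
    "P1": {"dizziness", "vomiting", "vertigo"},
    "P2": {"dizziness", "vertigo"},
    "P3": {"hemiplegia", "ataxia"},
    "P4": {"hemiplegia", "ataxia", "dysarthria"},
}
psn = build_psn(patients, threshold=0.5)
```

In this toy example only the pairs P1/P2 and P3/P4 share enough symptoms to be linked, so a community detection pass would be expected to separate them into two subgroups.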
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 10105007 and 10334020).
Abstract: We have studied sharp peak landscapes of the Eigen model from a new perspective: how the quasispecies are distributed in the sequence space. To analyse the distribution more carefully, we bring in two tools. One tool is the variance of the Hamming distance of the sequences at a given generation. It not only offers a different avenue for accurately locating the error threshold and illustrates how the configuration of the distribution varies with copying fidelity q in the sequence space, but also divides the copying fidelity into three distinct regimes. The other tool is the similarity network of a certain Hamming distance d0, by which we can gain a visual and in-depth picture of how the sequences are distributed. We find that there are several local similarity optima around the centre (global similarity optimum) in the distribution of the sequences reproduced near the threshold. Furthermore, it is interesting that the distribution of the clustering coefficient C(k) follows a lognormal distribution and that the curve of the clustering coefficient C of the network versus d0 appears to be linear near the threshold.
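The two tools named in the abstract above can be sketched in a few lines. The abstract does not spell out the exact definitions, so this sketch makes two labeled assumptions: the Hamming distance is measured from each sequence to a reference (master) sequence, and two sequences are linked in the similarity network when their Hamming distance is at most d0. All function names and the toy binary sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_variance(population, master):
    """Population variance of the Hamming distance to `master`
    (assumed reference point; the abstract does not specify it)."""
    dists = [hamming(s, master) for s in population]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)


def similarity_network(sequences, d0):
    """Adjacency sets linking sequences whose distance is <= d0 (assumption)."""
    adj = {i: set() for i in range(len(sequences))}
    for i, j in combinations(range(len(sequences)), 2):
        if hamming(sequences[i], sequences[j]) <= d0:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Toy population of binary sequences; not data from the paper.
population = ["0000", "0001", "0011", "1111"]
variance = hamming_variance(population, master="0000")
network = similarity_network(population, d0=2)
c2 = clustering_coefficient(network, 2)
```

Repeating the network construction for a range of d0 values and averaging `clustering_coefficient` over all nodes would give a curve of C versus d0 of the kind the abstract refers to.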
Funding: Supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Abstract: Information networks that can be extracted from many domains have been widely studied recently. Different functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies are on homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks can better model real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concept of meta-paths, each defined as a path over the graph of the network schema, is proposed to systematically capture numerous semantic relationships across multiple types of objects. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with the user's guidance or feedback, we can select the best meta-path or a weighted combination of meta-paths for a specific mining task.
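As an illustration of how a meta-path can drive similarity search, the following minimal sketch counts path instances along a meta-path by multiplying the relation matrices it traverses, then scores a pair of objects with PathSim, a published meta-path-based similarity measure. The abstract itself does not name a specific measure, so the choice of PathSim, the Author-Venue-Author meta-path, the helper names, and the toy publication counts are assumptions for illustration.

```python
def matmul(a, b):
    """Plain-Python product of two dense matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def transpose(m):
    """Transpose a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def metapath_counts(relations):
    """Multiply relation matrices in order; entry [x][y] is the number of
    path instances from x to y along the meta-path."""
    result = relations[0]
    for relation in relations[1:]:
        result = matmul(result, relation)
    return result


def pathsim(counts, x, y):
    """PathSim for a symmetric meta-path:
    2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = counts[x][x] + counts[y][y]
    return 2 * counts[x][y] / denom if denom else 0.0


# Toy data: author_venue[i][j] = papers author i published in venue j.
author_venue = [
    [2, 0],
    [1, 1],
    [0, 3],
]
# Meta-path Author -> Venue -> Author.
ava = metapath_counts([author_venue, transpose(author_venue)])
score_01 = pathsim(ava, 0, 1)
score_02 = pathsim(ava, 0, 2)
```

Authors 0 and 2 never publish in the same venue, so their score under this meta-path is zero, while a different meta-path (for example through co-authored papers) could rank the same pair differently. This is the sense in which the choice of meta-path determines the semantics of the similarity.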
Funding: Supported by the National Natural Science Foundation of China (Grant No. U1435222) and the Program of International Sci-Tech Cooperation, China (Grant No. 2014DFB30020).
Abstract: The accumulation of various types of drug informatics data and computational approaches for drug repositioning can accelerate pharmaceutical research and development. However, the integration of multi-dimensional drug data for precision repositioning remains a pressing challenge. Here, we propose a systematic framework named PIMD to predict drug therapeutic properties by integrating multi-dimensional data for drug repositioning. In PIMD, drug similarity networks (DSNs) based on chemical, pharmacological, and clinical data are fused into an integrated DSN (iDSN) composed of many clusters. Rather than simple fusion, PIMD offers a systematic way to annotate clusters. Unexpected drugs within clusters and drug pairs with a high iDSN similarity score are then identified to predict novel therapeutic uses. PIMD provides new insights into the universality, individuality, and complementarity of different drug properties by evaluating the contribution of each type of property data. To test the performance of PIMD, we use chemical, pharmacological, and clinical properties to generate an iDSN. Analyses of the contributions of each drug property indicate that this iDSN is driven by all data types and performs better than other DSNs. Within the top 20 recommended drug pairs, 7 drugs have been reported to be repurposed. The source code for PIMD is available at https://github.com/Sepstar/PIMD/.
Funding: The National Natural Science Foundation of China (Nos. 10105007, 10334020, 90103035, and 10574088).
Abstract: A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution, and clustering coefficient were studied. The behavior of the network shifts from a fluctuating distribution to a scale-free distribution as the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of a local cluster are highly connected to other local clusters, and a probable reason is given. Moreover, a network constructed from the same number of random tRNA sequences was used for comparison. The relationships between the properties of the tRNA similarity network and the characteristics of tRNA evolutionary history are discussed.
Funding: The National Natural Science Foundation of China (No. 61170192) and the Natural Science Foundations of Municipality of Chongqing (No. CSTC2012JJB40012).
Abstract: With the development of social media and the Internet, discovering latent information from massive amounts of data is becoming particularly relevant to improving user experience. Research based on preferences and relationships between users has attracted increasing attention. Predictive problems, such as inferring friend relationships and co-author relationships between users, have been explored. However, many such methods analyze either node features or network structure separately; few have tried to tackle both at the same time. In this paper, in order to discover latent co-interest relationships, we consider not only users' attributes but also network information. In addition, we propose an Interest-based Factor Graph Model (I-FGM) to incorporate these factors. Experiments on two data sets (a bookmarking network and a music network) demonstrate that this predictive method can achieve better results than three other methods (ANN, NB, and SVM).