A heterogeneous information network, which is composed of various types of nodes and edges, has a complex structure and rich information content, and is widely used in social networks, academic networks, e-commerce, and other fields. Link prediction, a key task for revealing unobserved relationships in a network, is of great significance in heterogeneous information networks. This paper reviews the application of representation learning methods to link prediction in heterogeneous information networks. It introduces the basic concepts of heterogeneous information networks and the theoretical basis of representation learning, and discusses in detail the specific application of deep learning models to node embedding learning and link prediction. The effectiveness and superiority of these methods on multiple real datasets are demonstrated by experimental verification.
Heterogeneous information networks (HINs) have recently been widely adopted to describe complex graph structure in recommendation systems, proving effective in modeling complex graph data. Although existing HIN-based recommendation studies have achieved great success by performing message propagation between connected nodes on the defined metapaths, they have the following major limitations. Existing works mainly convert heterogeneous graphs into homogeneous graphs by defining metapaths, which are not expressive enough to capture the more complicated dependency relationships involved on the metapath. Besides, the heterogeneous information is more likely to be provided by item attributes, while social relations between users are not adequately considered. To tackle these limitations, we propose a novel social recommendation model, MPISR, which models MetaPath Interaction for Social Recommendation on heterogeneous information networks. Specifically, our model first learns the initial node representations through a pretraining module, and then identifies potential social friends and item relations based on their similarity to construct a unified HIN. We then develop a two-way encoder module with a similarity encoder and an instance encoder to capture the similarity collaborative signals and relational dependency on different metapaths. Extensive experiments on five real datasets demonstrate the effectiveness of our method.
Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little attention has been given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding), a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms.
Real-world complex networks are inherently heterogeneous; they have different types of nodes, attributes, and relationships. In recent years, various methods have been proposed to automatically learn how to encode the structural and semantic information contained in heterogeneous information networks (HINs) into low-dimensional embeddings; this task is called heterogeneous network embedding (HNE). Efficient HNE techniques can benefit various HIN-based machine learning tasks such as node classification, recommender systems, and information retrieval. Here, we provide a comprehensive survey of key advancements in the area of HNE. First, we define an encoder-decoder-based HNE model taxonomy. Then, we systematically overview, compare, and summarize various state-of-the-art HNE models and analyze the advantages and disadvantages of various model categories to identify more potentially competitive HNE frameworks. We also summarize the application fields, benchmark datasets, open source tools, and performance evaluation in the HNE area. Finally, we discuss open issues and suggest promising future directions. We anticipate that this survey will provide deep insights into research in the field of HNE.
Information networks that can be extracted from many domains have been widely studied recently. Different functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies are on homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks can better model real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concept of meta-paths is proposed to systematically capture numerous semantic relationships across multiple types of objects; a meta-path is defined as a path over the graph of the network schema. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with the user's guidance or feedback, we can select the best meta-path or a weighted combination of meta-paths for a specific mining task.
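As a rough illustration of how a meta-path can guide similarity search, the sketch below counts path instances along the symmetric meta-path Author-Paper-Author in a toy bibliographic network and turns the counts into a PathSim-style normalized score. The toy data, the function names, and the choice of PathSim as the similarity measure are assumptions made for illustration; the abstract above does not name a specific measure.

```python
# Toy author -> papers relation (hypothetical data, not from the source).
AUTHOR_PAPERS = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p4"},
}


def apa_path_count(writes, a, b):
    """Count Author-Paper-Author path instances between authors a and b.

    With unweighted edges, each shared paper contributes exactly one path.
    """
    return len(writes[a] & writes[b])


def pathsim(writes, a, b):
    """PathSim-style score: 2 * paths(a, b) / (paths(a, a) + paths(b, b))."""
    denom = apa_path_count(writes, a, a) + apa_path_count(writes, b, b)
    if denom == 0:
        return 0.0
    return 2 * apa_path_count(writes, a, b) / denom


def rank_similar(writes, query):
    """Rank every other author by similarity to the query author."""
    scores = {o: pathsim(writes, query, o) for o in writes if o != query}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(rank_similar(AUTHOR_PAPERS, "alice"))
```

Swapping in a different meta-path (for example Author-Paper-Venue-Paper-Author) would change which objects count as similar, which is the sense in which the meta-path carries the semantics of the search.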
Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities that have common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods employed knowledge graphs (KGs) to extend entities. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. Then we rank the entities according to the meta-path-based structural similarity. Furthermore, to utilize the text description of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines both structural information revealed by a KG and text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
Heterogeneous Information Networks (HINs) contain multiple types of nodes and edges; therefore, they can preserve both semantic and structural information. Cluster analysis performed directly on an HIN has obvious advantages over first transforming it into a homogeneous information network, and can improve the clustering results for different types of nodes. In our study, we applied Nonnegative Matrix Tri-Factorization (NMTF) to a cluster analysis of multiple metapaths in an HIN. Unlike the parameter estimation methods for probability distributions used in previous studies, NMTF can obtain several dependent latent variables simultaneously, and each latent variable in NMTF is associated with the cluster of the corresponding node in the HIN. The method is suited to co-clustering that leverages multiple metapaths in an HIN, because NMTF is employed for multiple nonnegative matrix factorizations simultaneously in our study. Experimental results on a real dataset show the validity and correctness of our method, and its clustering results are better than those of existing similar clustering algorithms.
Graph convolutional networks (GCNs) have been developed as a general and powerful tool to handle various tasks related to graph data. However, current methods mainly consider homogeneous networks and ignore the rich semantics and multiple types of objects that are common in heterogeneous information networks (HINs). In this paper, we present the Heterogeneous Hyperedge Convolutional Network (HHCN), a novel graph convolutional network architecture that operates on HINs. Specifically, we extract the rich semantics through different metastructures and adopt hyperedges to model the interactions among metastructure-based neighbors. Owing to the powerful information extraction capabilities of metastructures and hyperedges, HHCN has the flexibility to model the complex relationships in HINs by setting different combinations of metastructures and hyperedges. Moreover, a metastructure attention layer is designed to allow each node to select metastructures based on their importance and to provide potential interpretability for graph analysis. As a result, HHCN can encode node features, metastructure-based semantics, and hyperedge information simultaneously by aggregating features from metastructure-based neighbors in a hierarchical manner. We evaluate HHCN by applying it to the semi-supervised node classification task. Experimental results show that HHCN outperforms state-of-the-art graph embedding models and recently proposed graph convolutional network models.
The information on host-microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to understanding the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co-occurrence networks, but co-occurrence networks can only encompass a small fraction of all OTUs due to the high sparsity of the OTU table. There is a lack of studies that intensively explore and use the information on sample-OTU interactions. This study constructed a sample-OTU heterogeneous information network and represented the nodes in the network through a heterogeneous graph embedding method to form the OTU space and the sample space. Taking advantage of the embedded OTU and sample vectors combined with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes, such as phenotypes and individual diet habits. Both the OTU space and the sample space contain reasonable biological or medical semantic information, and IMETA, using the embedded OTU and sample vectors, achieves stable and good performance in the sample classification tasks. This suggests that the embedding representation based on the sample-OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study produced quantified representations of the biological characteristics of the OTUs and samples, which is a useful attempt to increase the utilization of information in the OTU abundance table and promotes a deeper understanding of the underlying knowledge of the human microbiome.
Predicting interactions between drugs and target proteins has become an essential task in the drug discovery process. Although validation via wet-lab experiments has become available, experimental methods for drug-target interaction (DTI) identification remain either time consuming or heavily dependent on domain expertise. Therefore, various computational models have been proposed to predict possible interactions between drugs and target proteins. However, most prediction methods do not consider the topological structure characteristics of the relationships. In this paper, we propose a relational topology-based heterogeneous network embedding method to predict drug-target interactions, abbreviated as RTHNE_DTI. We first construct a heterogeneous information network based on the interactions between different types of nodes, to enhance the ability of association discovery by fully considering the topology of the network. Drug and target protein nodes can then be represented by the other types of nodes. According to the different topological structures of the relationships between the nodes, we divide the relationships in the heterogeneous network into two categories and model them separately. In extensive experiments on real-world drug datasets, RTHNE_DTI achieves high efficiency and outperforms other state-of-the-art methods. RTHNE_DTI can be further used to predict interactions for drug-target pairs whose interaction is unknown.
Potential behavior prediction involves understanding the latent human behavior of specific groups, and can assist organizations in making strategic decisions. Progress in information technology has made it possible to acquire more and more data about human behavior. In this paper, we examine behavior data obtained in real-world scenarios as an information network composed of two types of objects (humans and actions) associated with various attributes and three types of relationships (human-human, human-action, and action-action), which we call the heterogeneous behavior network (HBN). To exploit the abundance and heterogeneity of the HBN, we propose a novel network embedding method, human-action-attribute-aware heterogeneous network embedding (a4 HNE), which jointly considers structural proximity, attribute resemblance, and heterogeneity fusion. Experiments on two real-world datasets show that this approach outperforms other similar methods on various heterogeneous information network mining tasks for potential behavior prediction.
Graph Neural Networks (GNNs), a family of powerful tools for learning embedding representations of graph-structured data, are built on homogeneous networks and have been widely used in various data mining tasks. Applying a GNN to embedding a Heterogeneous Information Network (HIN) is a huge challenge, mainly because HINs contain many different types of nodes and different types of relationships between nodes. An HIN contains rich semantic and structural information, which requires a specially designed graph neural network. However, existing HIN-based graph neural network models rarely consider the interactive information hidden between the meta-paths of an HIN, which results in poor node embeddings. In this paper, we propose an Attention-aware Heterogeneous graph Neural Network (AHNN) model to effectively extract useful information from an HIN and use it to learn the embedding representation of nodes. Specifically, we first use node-level attention to aggregate and update the embedding representation of nodes, and then concatenate the embedding representations of the nodes on different meta-paths. Finally, a semantic-level neural network is proposed to extract the feature interaction relationships on different meta-paths and learn the final embedding of nodes. Experimental results on three widely used datasets show that the AHNN model significantly outperforms the state-of-the-art models.
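The first two steps described above (node-level attention over neighbors, then concatenation across meta-paths) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AHNN implementation: the dot-product scoring function, the function names, and the plain-list vectors are all hypothetical, and the semantic-level network that follows concatenation is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(target, neighbors):
    """Aggregate neighbor embeddings, weighted by softmax of dot-product scores.

    Dot-product scoring is an assumption; the source does not give the formula.
    """
    weights = softmax([dot(target, n) for n in neighbors])
    dim = len(target)
    return [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(dim)]


def embed_node(target, neighbors_by_metapath):
    """Concatenate the attention aggregate computed on each meta-path."""
    out = []
    for neighbors in neighbors_by_metapath:
        out.extend(attend(target, neighbors))
    return out


if __name__ == "__main__":
    # Two meta-paths: the first yields two neighbors, the second yields one.
    print(embed_node([1.0, 0.0], [[[1.0, 0.0], [1.0, 2.0]], [[0.0, 3.0]]]))
```

The concatenated vector grows with the number of meta-paths, which is why a further layer is needed to model interactions between the per-meta-path parts.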
With the wide application of location-based social networks (LBSNs), personalized point of interest (POI) recommendation has become popular, especially in the commercial field. Unfortunately, it is challenging to accurately recommend POIs to users because the user-POI matrix is extremely sparse. In addition, a user's check-in activities are affected by many influential factors. However, most existing studies capture only a few influential factors, and it is hard to extend them to incorporate other heterogeneous information in a unified way. To address these problems, we propose a meta-path-based deep representation learning (MPDRL) model for personalized POI recommendation. In this model, we design eight types of meta-paths to fully utilize the rich heterogeneous information in LBSNs for the representations of users and POIs, and deeply mine the correlations between users and POIs. To further improve the recommendation performance, we design an attention-based long short-term memory (LSTM) network to learn the importance of different influential factors on a user's specific check-in activity. To verify the effectiveness of our proposed method, we conduct extensive experiments on a real-world dataset, Foursquare. Experimental results show that the MPDRL model improves by at least 16.97% and 23.55% over all comparison methods in terms of Precision@N (Pre@N) and Recall@N (Rec@N), respectively.
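For readers unfamiliar with the two metrics, the sketch below computes Precision@N and Recall@N for a single user using their common definitions (hits in the top N divided by N, and hits divided by the number of relevant items). The function name and the toy data are assumptions; the abstract does not describe how the paper averages the metrics across users.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Return (Precision@N, Recall@N) for one user.

    recommended: ranked list of item ids, best first.
    relevant: set of item ids the user actually visited in the test data.
    """
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / n if n > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    ranked = ["poi_a", "poi_b", "poi_c", "poi_d"]   # hypothetical ranking
    visited = {"poi_b", "poi_d", "poi_e"}           # hypothetical ground truth
    print(precision_recall_at_n(ranked, visited, 2))
```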
Scene-based recommendation has proven its usefulness in E-commerce by recommending commodities based on a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery for E-commerce. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN), whose nodes and links represent different types of objects and different types of relationships between objects, respectively. We then formulate the scene mining problem for E-commerce as an unsupervised learning problem that finds the overlapping clusters of commodity categories in the HIN. To solve the problem, we propose a non-negative matrix factorization based method, SMEC (Scene Mining for E-Commerce), and theoretically prove its convergence. Using six real-world E-commerce datasets, we finally conduct an extensive experimental study to evaluate SMEC against 13 other methods, and show that SMEC consistently outperforms its competitors with regard to various evaluation measures.
Funding: Science and Technology Research Project of Jiangxi Provincial Department of Education (Project Nos. GJJ211348, GJJ211347 and GJJ2201056).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61762078, 62276073, 61966009 and U22A2099), the Industrial Support Project of Gansu Colleges (No. 2022CYZC11), the Natural Science Foundation of Gansu Province (21JR7RA114), the Northwest Normal University Young Teachers Research Capacity Promotion Plan (NWNU-LKQN2019-2), and the Northwest Normal University Post-graduate Research Funding Project (2021KYZZ02107).
Funding: Supported by the Research Grants Council of Hong Kong (17308321) and the HKUTCL Joint Research Center for Artificial Intelligence sponsored by TCL Corporate Research (Hong Kong).
Funding: Supported by the National Key Research and Development Plan of China (2017YFB0503700, 2016YFB0501801), the National Natural Science Foundation of China (61170026, 62173157), the Thirteen Five-Year Research Planning Project of National Language Committee (No. YB135-149), and the Fundamental Research Funds for the Central Universities (Nos. CCNU20QN022, CCNU20QN021, CCNU20ZT012).
Funding: Supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61806020, 61772082, 61972047, 61702296), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61701190), the Youth Science Foundation of Jilin Province of China (No. 20180520021JH), the National Key Research and Development Plan of China (No. 2017YFA0604500), the Key Scientific and Technological Research and Development Plan of Jilin Province of China (No. 20180201103GX), the China Postdoctoral Science Foundation (No. 2018M631873), the Project of Jilin Province Development and Reform Commission (No. 2019FGWTZC001), and the Key Technology Innovation Cooperation Project of Government and University for the Whole Industry Demonstration (No. SXGJSF2017-4).
Funding: Funded by the Science and Technology Strengthening Police Basic Program of the Ministry of Public Security (2018GABJC03) and the Technology Research Project Program of the Ministry of Public Security (2018JSYJA02).
Funding: National Key R&D Program, Grant/Award Number: 2021YFF1200900; National Natural Science Foundation of China, Grant/Award Number: 72101029; Beijing Natural Science Foundation Proposed Program, Grant/Award Number: 4204104.
Abstract: The information on host-microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co-occurrence networks, but such networks can encompass only a small fraction of all OTUs because the OTU table is highly sparse. Few studies have intensively explored and used the information on sample-OTU interactions. This study constructed a sample-OTU heterogeneous information network and represented its nodes with a heterogeneous graph embedding method to form an OTU space and a sample space. Combining the embedded OTU and sample vectors with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes such as phenotypes and individual diet habits. Both the OTU space and the sample space contain reasonable biological or medical semantic information, and IMETA, using the embedded OTU and sample vectors, performs stably and well in sample classification tasks. This suggests that embedding representations based on the sample-OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study produced quantified representations of the biological characteristics of OTUs and samples, which is a useful attempt to make fuller use of the information in the OTU abundance table and promotes a deeper understanding of the human microbiome.
Funding: Funded by the National Natural Science Foundation of China, grant number 61402220; the key program of the Scientific Research Fund of Hunan Provincial Education Department, grant number 19A439; and the Natural Science Foundation of Hunan Province, China, grant numbers 2020J4525 and 2022J30495.
Abstract: Predicting interactions between drugs and target proteins has become an essential task in the drug discovery process. Although validation via wet-lab experiments is available, experimental methods for drug-target interaction (DTI) identification remain either time-consuming or heavily dependent on domain expertise. Therefore, various computational models have been proposed to predict possible interactions between drugs and target proteins. However, most prediction methods do not consider the topological characteristics of the relationships. In this paper, we propose a relational topology-based heterogeneous network embedding method for predicting drug-target interactions, abbreviated as RTHNE_DTI. We first construct a heterogeneous information network based on the interactions between different types of nodes, which enhances association discovery by fully considering the topology of the network. Drug and target protein nodes can then be represented by the other types of nodes. According to the topological structure of the relationships between nodes, we divide the relationships in the heterogeneous network into two categories and model them separately. In extensive experiments on real-world drug datasets, RTHNE_DTI is highly efficient and outperforms other state-of-the-art methods. RTHNE_DTI can further be used to predict interactions for drug-target pairs whose interaction status is unknown.
Funding: Project supported by the National Natural Science Foundation of China (Nos. U1509206, 61625107, and U1611461) and the Key Program of Zhejiang Province, China (No. 2015C01027).
Abstract: Potential behavior prediction involves understanding the latent human behavior of specific groups and can assist organizations in making strategic decisions. Progress in information technology has made it possible to acquire more and more data about human behavior. In this paper, we treat behavior data obtained in real-world scenarios as an information network composed of two types of objects (humans and actions) associated with various attributes and three types of relationships (human-human, human-action, and action-action), which we call the heterogeneous behavior network (HBN). To exploit the abundance and heterogeneity of the HBN, we propose a novel network embedding method, human-action-attribute-aware heterogeneous network embedding (a4HNE), which jointly considers structural proximity, attribute resemblance, and heterogeneity fusion. Experiments on two real-world datasets show that this approach outperforms other similar methods on various heterogeneous information network mining tasks for potential behavior prediction.
Funding: Supported by the Key Scientific Guiding Project for the Central Universities Research Funds (No. N2008005), the Major Science and Technology Project of Liaoning Province of China (No. 2020JH1/10100008), and the National Key Research and Development Program of China (No. 2018YFB1701104).
Abstract: Graph Neural Networks (GNNs), a family of powerful tools for learning embedding representations of graph-structured data that were built for homogeneous networks, have been widely used in various data mining tasks. Applying a GNN to embed a Heterogeneous Information Network (HIN) is a major challenge, mainly because HINs contain many different types of nodes and many different types of relationships between nodes. An HIN carries rich semantic and structural information, which requires a specially designed graph neural network. However, existing HIN-based graph neural network models rarely consider the interaction information hidden between the meta-paths of an HIN, which results in poor node embeddings. In this paper, we propose an Attention-aware Heterogeneous graph Neural Network (AHNN) model to effectively extract useful information from an HIN and use it to learn the embedding representations of nodes. Specifically, we first use node-level attention to aggregate and update the embedding representations of nodes, and then concatenate the embedding representations of the nodes on different meta-paths. Finally, a semantic-level neural network is proposed to extract the feature interaction relationships across different meta-paths and learn the final node embeddings. Experimental results on three widely used datasets show that the AHNN model significantly outperforms state-of-the-art models.
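The first two steps described in the abstract above (node-level attention over meta-path neighbors, then concatenation across meta-paths) can be sketched as follows. This is only an illustration under stated assumptions: the dot-product scoring stands in for whatever learned attention function AHNN actually uses, the semantic-level network is omitted, and the meta-path names and vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(target, neighbors):
    """Attention-weighted average of neighbor vectors.

    Assumption: scores are plain dot products with the target vector;
    a trained model would use a learned scoring function instead.
    """
    weights = softmax([dot(target, nb) for nb in neighbors])
    return [sum(w * nb[i] for w, nb in zip(weights, neighbors))
            for i in range(len(target))]


def embed_node(target, neighbors_by_metapath):
    """Concatenate the per-meta-path summaries in sorted meta-path order."""
    out = []
    for name in sorted(neighbors_by_metapath):
        out.extend(attend(target, neighbors_by_metapath[name]))
    return out


if __name__ == "__main__":
    # Hypothetical meta-path names and 2-dimensional neighbor embeddings.
    neighbors = {
        "APA": [[1.0, 0.0], [0.0, 1.0]],
        "APCPA": [[0.5, 0.5]],
    }
    print(embed_node([1.0, 0.0], neighbors))
```

The concatenated vector has one block per meta-path, so a downstream layer can model interactions between meta-paths, which is the role the abstract assigns to the semantic-level network.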
Funding: National Natural Science Foundation of China (No. 61972080); Shanghai Rising-Star Program, China (No. 19QA1400300).
Abstract: With the wide application of location-based social networks (LBSNs), personalized point-of-interest (POI) recommendation has become popular, especially in the commercial field. Unfortunately, it is challenging to recommend POIs to users accurately because the user-POI matrix is extremely sparse. In addition, a user's check-in activities are affected by many influential factors, yet most existing studies capture only a few of them and are hard to extend to incorporate other heterogeneous information in a unified way. To address these problems, we propose a meta-path-based deep representation learning (MPDRL) model for personalized POI recommendation. In this model, we design eight types of meta-paths to fully utilize the rich heterogeneous information in LBSNs for the representations of users and POIs, and to deeply mine the correlations between users and POIs. To further improve recommendation performance, we design an attention-based long short-term memory (LSTM) network to learn the importance of different influential factors for a user's specific check-in activity. To verify the effectiveness of the proposed method, we conduct extensive experiments on a real-world dataset, Foursquare. Experimental results show that the MPDRL model improves on all comparison methods by at least 16.97% and 23.55% in terms of Precision@N (Pre@N) and Recall@N (Rec@N), respectively.
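For readers unfamiliar with the two metrics reported above, the following sketch computes Precision@N and Recall@N for a single user under their usual definitions (hits in the top N divided by N, and hits divided by the number of relevant items). The abstract does not spell out the exact evaluation protocol, so the function name and the sample POI identifiers are assumptions for illustration only.

```python
def precision_recall_at_n(ranked, relevant, n):
    """Return (Precision@N, Recall@N) for one user's ranked recommendation list.

    ranked   -- recommended item ids, best first
    relevant -- set of item ids the user actually visited in the test data
    n        -- cut-off position, must be positive
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    hits = sum(1 for item in ranked[:n] if item in relevant)
    precision = hits / n
    # Recall is undefined without relevant items; report 0.0 in that case.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical POI ids for a single user.
    ranked_pois = ["cafe", "museum", "park", "gym"]
    visited = {"cafe", "gym", "library"}
    print(precision_recall_at_n(ranked_pois, visited, n=2))
```

Dataset-level scores are typically obtained by averaging these per-user values over all test users.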
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0102301 and the National Natural Science Foundation of China under Grant No. 61925203.
Abstract: Scene-based recommendation has proven its usefulness in E-commerce by recommending commodities based on a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery for E-commerce. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN) whose nodes and links represent different types of objects and different types of relationships between objects, respectively. We then formulate the scene mining problem for E-commerce as an unsupervised learning problem that finds the overlapping clusters of commodity categories in the HIN. To solve the problem, we propose a non-negative matrix factorization based method, SMEC (Scene Mining for E-Commerce), and theoretically prove its convergence. Finally, using six real-world E-commerce datasets, we conduct an extensive experimental study to evaluate SMEC against 13 other methods and show that SMEC consistently outperforms its competitors on various evaluation measures.