A heterogeneous information network, composed of various types of nodes and edges, has a complex structure and rich information content and is widely used in social networks, academic networks, e-commerce, and other fields. Link prediction, a key task for revealing unobserved relationships in a network, is of great significance in heterogeneous information networks. This paper reviews the application of representation-learning-based methods to link prediction in heterogeneous information networks. It introduces the basic concepts of heterogeneous information networks and the theoretical basis of representation learning, and discusses in detail how deep learning models are applied to node embedding learning and link prediction. The effectiveness and superiority of these methods are demonstrated by experiments on multiple real datasets.
Real-world complex networks are inherently heterogeneous; they have different types of nodes, attributes, and relationships. In recent years, various methods have been proposed to automatically learn how to encode the structural and semantic information contained in heterogeneous information networks (HINs) into low-dimensional embeddings; this task is called heterogeneous network embedding (HNE). Efficient HNE techniques can benefit various HIN-based machine learning tasks such as node classification, recommender systems, and information retrieval. Here, we provide a comprehensive survey of key advancements in the area of HNE. First, we define an encoder-decoder-based HNE model taxonomy. Then, we systematically overview, compare, and summarize various state-of-the-art HNE models and analyze the advantages and disadvantages of the model categories to identify potentially more competitive HNE frameworks. We also summarize the application fields, benchmark datasets, open-source tools, and performance evaluation in the HNE area. Finally, we discuss open issues and suggest promising future directions. We anticipate that this survey will provide deep insights into research in the field of HNE.
Graph convolutional networks (GCNs) have been developed as a general and powerful tool to handle various tasks related to graph data. However, current methods mainly consider homogeneous networks and ignore the rich semantics and multiple types of objects that are common in heterogeneous information networks (HINs). In this paper, we present the Heterogeneous Hyperedge Convolutional Network (HHCN), a novel graph convolutional network architecture that operates on HINs. Specifically, we extract rich semantics through different metastructures and adopt hyperedges to model the interactions among metastructure-based neighbors. Owing to the strong information extraction capabilities of metastructures and hyperedges, HHCN can flexibly model the complex relationships in HINs by setting different combinations of metastructures and hyperedges. Moreover, a metastructure attention layer is designed to allow each node to select metastructures based on their importance and to provide potential interpretability for graph analysis. As a result, HHCN can encode node features, metastructure-based semantics, and hyperedge information simultaneously by aggregating features from metastructure-based neighbors in a hierarchical manner. We evaluate HHCN on the semi-supervised node classification task. Experimental results show that HHCN outperforms state-of-the-art graph embedding models and recently proposed graph convolutional network models.
As a promising technology to improve spectrum efficiency and transmission coverage, the Heterogeneous Network (HetNet) has attracted the attention of many scholars in recent years. With the introduction of Non-Orthogonal Multiple Access (NOMA) technology, the NOMA-assisted HetNet can not only improve system capacity but also allow more users to share the same frequency band resource, which makes it a hot topic. However, traditional resource allocation schemes assume that base stations can exactly estimate direct link gains and cross-tier link gains, which is unrealistic in practical HetNets because of channel delays and random perturbation. To further improve energy utilization and system robustness, in this paper we investigate a robust resource allocation problem that maximizes the total Energy Efficiency (EE) of Small-Cell Users (SCUs) in NOMA-assisted HetNets under imperfect channel state information. By considering bounded channel uncertainties, the robust resource optimization problem is formulated as a mixed-integer nonlinear programming problem under constraints on the cross-tier interference power of macrocell users, the maximum transmit power of the small base station, the Resource Block (RB) assignment, and the quality-of-service requirement of each SCU. The original problem is converted into an equivalent convex optimization problem by using Dinkelbach's method and the successive convex approximation method. A robust Dinkelbach-based iterative algorithm is designed by jointly optimizing the transmit power and the RB allocation. Simulation results verify that the proposed algorithm achieves better EE and robustness than existing algorithms.
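The Dinkelbach step mentioned in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract gives no formulas, so the single-user rate and power models, the constants `H_GAIN` and `P_CIRCUIT`, and the grid search used for the inner problem are all assumptions made for illustration. The sketch only shows the general idea of turning a ratio f(x)/g(x) into a sequence of parametric problems max f(x) - q·g(x).

```python
import math

H_GAIN = 4.0      # hypothetical channel gain (assumption, not from the paper)
P_CIRCUIT = 0.5   # hypothetical fixed circuit power (assumption)


def rate(p):
    """Toy achievable rate for transmit power p."""
    return math.log2(1.0 + H_GAIN * p)


def power(p):
    """Toy total consumed power; strictly positive."""
    return p + P_CIRCUIT


def dinkelbach(f, g, candidates, tol=1e-9, max_iter=100):
    """Maximize f(x)/g(x) over a finite candidate set, assuming g(x) > 0."""
    q = 0.0
    x = candidates[0]
    for _ in range(max_iter):
        # Inner problem: maximize the parametric objective f(x) - q * g(x).
        x = max(candidates, key=lambda c: f(c) - q * g(c))
        gap = f(x) - q * g(x)
        if gap < tol:
            break  # q is (numerically) the optimal ratio
        q = f(x) / g(x)  # Dinkelbach update of the ratio estimate
    return x, q


# Candidate transmit powers on a grid; a real solver would replace this search.
grid = [i / 1000 for i in range(1, 2001)]
best_p, best_ee = dinkelbach(rate, power, grid)
```

In the paper the inner problem is handled with successive convex approximation over power and resource-block variables; the grid search here merely stands in for that step.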
Heterogeneous information networks (HINs) have recently been widely adopted to describe complex graph structure in recommendation systems and have proved effective in modeling complex graph data. Although existing HIN-based recommendation studies have achieved great success by performing message propagation between connected nodes on defined metapaths, they have two major limitations. First, existing works mainly convert heterogeneous graphs into homogeneous graphs by defining metapaths, which is not expressive enough to capture the more complicated dependency relationships along a metapath. Second, the heterogeneous information is mostly provided by item attributes, while social relations between users are not adequately considered. To tackle these limitations, we propose a novel social recommendation model, MPISR, which models MetaPath Interaction for Social Recommendation on a heterogeneous information network. Specifically, our model first learns initial node representations through a pretraining module and then identifies potential social friends and item relations based on their similarity to construct a unified HIN. We then develop a two-way encoder module, with a similarity encoder and an instance encoder, to capture the similarity-based collaborative signals and the relational dependencies on different metapaths. Extensive experiments on five real datasets demonstrate the effectiveness of our method.
Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM (short for Deep Entity Matching). In contrast to traditional entity matching methods, DEM uses the multi-layer perceptron with a highway network to explore hidden relations and thereby improve matching performance. Importantly, we combine DEM with the network embedding methodology, enabling highly efficient computation in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage in social network analysis, which predicts the same or matched users across different social platforms, and record linkage, which predicts the same or matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
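For readers unfamiliar with the highway component named in this abstract, a generic highway layer can be sketched as follows. This is the standard formulation y = T(x)·H(x) + (1 - T(x))·x, not DEM's actual architecture, which the abstract does not specify; the `tanh` transform and the plain-list weight layout are assumptions.

```python
import math


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer on a single input vector x (input and output sizes match)."""

    def affine(W, b):
        # Row-wise dot products: W @ x + b.
        return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

    h = [math.tanh(z) for z in affine(W_h, b_h)]                # candidate transform H(x)
    t = [1.0 / (1.0 + math.exp(-z)) for z in affine(W_t, b_t)]  # transform gate T(x)
    # The carry gate is 1 - T(x): a closed gate passes x through unchanged.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a strongly negative gate bias the layer behaves like the identity, which is what lets deep stacks of such layers train without losing the input signal.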
Information networks that can be extracted from many domains have been widely studied recently. Various functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies address homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks can better model real-world systems, which are typically semi-structured and typed, following a network schema. To mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concept of meta-paths, defined as paths over the graph of the network schema, is proposed to systematically capture numerous semantic relationships across multiple types of objects. Meta-paths can guide the search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with the user's guidance or feedback, we can select the best meta-path or a weighted combination of meta-paths for a specific mining task.
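A small sketch may help make the meta-path idea concrete. It counts meta-path instances by multiplying the adjacency matrices of the relations along the path. The toy bibliographic network, the names `author_paper` and `paper_venue`, and the helper functions are hypothetical illustrations and do not come from the paper.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical toy network. Rows: authors Ann, Bob, Cy; columns: papers p0, p1, p2.
author_paper = [
    [1, 1, 0],  # Ann wrote p0 and p1
    [0, 1, 0],  # Bob wrote p1
    [0, 0, 1],  # Cy wrote p2
]
# Rows: papers p0, p1, p2; columns: venues v0, v1.
paper_venue = [
    [1, 0],  # p0 appeared in v0
    [0, 1],  # p1 appeared in v1
    [1, 0],  # p2 appeared in v0
]


def metapath_counts(*relations):
    """Number of path instances between endpoints of a meta-path,
    given the adjacency matrix of each relation along it."""
    M = relations[0]
    for R in relations[1:]:
        M = matmul(M, R)
    return M


# Author-Paper-Author: co-authorship.
apa = metapath_counts(author_paper, transpose(author_paper))
# Author-Paper-Venue-Paper-Author: publishing in the same venue.
apvpa = metapath_counts(
    author_paper, paper_venue, transpose(paper_venue), transpose(author_paper)
)
```

Here Ann and Cy have no Author-Paper-Author instance but one Author-Paper-Venue-Paper-Author instance, so the two meta-paths express different semantics (co-authorship versus a shared venue).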
The traditional cheap-talk game model with homogeneous information sources concludes that a dishonest information source will not be identified if it changes strategy stochastically. In this paper, the authors incorporate different information diffusion networks and heterogeneous information sources into an agent-based artificial stock market. The results differ from the traditional ones: the identification ability of uninformed agents is greatly improved by diffusion networks and heterogeneous information sources. Additionally, the authors find that uninformed agents can improve their identification ability only if there is a sufficient number of heterogeneous information sources in the stock market.
Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in a heterogeneous information network (HIN) and on calculating the similarity between different types of objects. We propose a meta-path-based outlier detection method (MPOutliers) for HINs that addresses these problems together under a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It discovers the semantic information among nodes in heterogeneous networks instead of considering only the network structure. It also computes the closeness degree between nodes of the same type, which extends the whole heterogeneous network. Moreover, each node is assigned a reliability weight to measure its authority degree. Substantial experiments on two real datasets (AMiner and a Movies dataset) show that the proposed method is effective and efficient for outlier detection.
Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities that share common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods have employed knowledge graphs (KGs) to extend entities. However, they fail to use the rich semantics contained in a KG effectively and efficiently, and they ignore the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. We then rank the entities according to meta-path-based structural similarity. Furthermore, to use the text descriptions of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with the text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining the structural and textual information of entities.
Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone and ignore the rich information available in the content data. To address this issue, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content, and edge content, which can supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks and extracts their latent deep representations to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed: through leader node selection, initial community generation, and community expansion, communities can be found more efficiently. Third, incremental maintenance strategies for handling network changes are proposed. We conduct experimental studies on three real-world social networks. The experiments demonstrate the effectiveness and efficiency of the proposed method. Compared with traditional methods, our method improves normalized mutual information (NMI) and modularity by an average of 12% and 37%, respectively.
Heterogeneous Information Networks (HINs) contain multiple types of nodes and edges; therefore, they can preserve both semantic and structural information. Cluster analysis performed directly on an HIN has obvious advantages over first transforming it into a homogeneous information network, because it can improve the clustering results for the different types of nodes. In our study, we applied Nonnegative Matrix Tri-Factorization (NMTF) to the cluster analysis of multiple metapaths in an HIN. Unlike the probability-distribution parameter estimation methods of previous studies, NMTF can obtain several dependent latent variables simultaneously, and each latent variable in NMTF is associated with the cluster of the corresponding node in the HIN. The method is suited to co-clustering that leverages multiple metapaths in an HIN, because NMTF is employed for multiple nonnegative matrix factorizations simultaneously in our study. Experimental results on a real dataset demonstrate the validity and correctness of our method, and its clustering results are better than those of existing similar clustering algorithms.
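The factorization named in this abstract can be sketched in its generic form, X ≈ F S Gᵀ with nonnegative factors, using the standard multiplicative updates for the squared Frobenius error. The abstract does not describe the paper's objective, constraints, or its treatment of multiple metapaths, so this single-matrix, unconstrained variant and the toy matrix `X_toy` are assumptions for illustration only.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mult_update(M, numer, denom, eps=1e-12):
    """Element-wise multiplicative update M * numer / denom; keeps M nonnegative."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, numer, denom)]


def loss(X, F, S, G):
    """Squared Frobenius reconstruction error ||X - F S G^T||^2."""
    R = matmul(matmul(F, S), transpose(G))
    return sum((x - r) ** 2 for rx, rr in zip(X, R) for x, r in zip(rx, rr))


def nmtf(X, k_rows, k_cols, iters=200, seed=0):
    """Factor a nonnegative n x m matrix X into F (n x k_rows),
    S (k_rows x k_cols) and G (m x k_cols)."""
    rng = random.Random(seed)

    def rand(r, c):
        return [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]

    n, m = len(X), len(X[0])
    F, S, G = rand(n, k_rows), rand(k_rows, k_cols), rand(m, k_cols)
    Xt = transpose(X)
    for _ in range(iters):
        GSt = matmul(G, transpose(S))  # G S^T, shape m x k_rows
        F = mult_update(F, matmul(X, GSt),
                        matmul(F, matmul(transpose(GSt), GSt)))
        FS = matmul(F, S)              # F S, shape n x k_cols
        G = mult_update(G, matmul(Xt, FS),
                        matmul(G, matmul(transpose(FS), FS)))
        Ft = transpose(F)
        FtF, GtG = matmul(Ft, F), matmul(transpose(G), G)
        S = mult_update(S, matmul(matmul(Ft, X), G),
                        matmul(matmul(FtF, S), GtG))
    return F, S, G


# Hypothetical relation matrix with two row blocks and two column blocks.
X_toy = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
F, S, G = nmtf(X_toy, 2, 2)
```

In co-clustering use, the largest entry in each row of `F` is read as the row object's cluster and the largest entry in each row of `G` as the column object's cluster, while `S` relates the two sets of clusters.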
Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little attention has been given to the distribution discrepancy of subgraphs within a single HIN. We find that ignoring such distribution discrepancy among subgraphs from multiple sources hinders the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding), a scalable unsupervised framework that aligns the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over state-of-the-art heterogeneous information network embedding algorithms.
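Heterogeneous information networks (HINs) are widely used in recommendation tasks because of their rich semantic information. Traditional HIN-oriented recommendation methods ignore the heterogeneity of the association relations in the network and the mutual influence among different association types. This paper proposes a recommendation model based on multi-view embedding fusion, which mines the deep latent features of an HIN from a homogeneous-association view and a heterogeneous-association view and fuses them, effectively ensuring the accuracy of the recommendation results. For the homogeneous-association view, an embedding fusion method based on graph convolutional networks is proposed, which achieves local fusion of node embeddings through lightweight convolution over node neighborhood information under homogeneous associations. For the heterogeneous-association view, an attention-based embedding fusion method is proposed, which uses the attention mechanism to distinguish the influence of different association types on node embeddings and achieves global fusion of node embeddings. Experiments verify the feasibility and effectiveness of the proposed key techniques.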
The information on host-microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to understanding the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co-occurrence networks, but co-occurrence networks can encompass only a small fraction of all OTUs because of the high sparsity of the OTU table. Few studies have intensively explored and used the information on sample-OTU interactions. This study constructed a sample-OTU heterogeneous information network and represented the nodes in the network through a heterogeneous graph embedding method to form an OTU space and a sample space. Using the resulting OTU and sample vectors together with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes, such as phenotypes and individual diet habits. Both the OTU space and the sample space contain reasonable biological or medical semantic information, and IMETA with embedded OTU and sample vectors achieves stable and good performance in sample classification tasks. This suggests that the embedding representation based on the sample-OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study produced quantified representations of the biological characteristics of OTUs and samples, which is a useful attempt to increase the utilization of the information in the OTU abundance table, and it promotes a deeper understanding of the underlying knowledge of the human microbiome.
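Existing models extract information from heterogeneous information networks (HINs) mostly through meta-paths, lack information that supplements the meta-paths, and seldom learn the complex structural information in heterogeneous graphs. To address these problems, a recommendation algorithm based on neighbor nodes and meta-paths in heterogeneous networks (NMRec) is proposed. Neighbor nodes of users and items are extracted to supplement the information missing from meta-paths; the rich interactions between nodes are captured by convolution; embeddings of nodes and meta-paths are obtained through an attention mechanism; and users, items, neighbor nodes, and meta-paths are concatenated for Top-N recommendation. Experimental results on two public datasets show that NMRec achieves good recommendation performance and good interpretability of its results; compared with seven baseline recommendation algorithms, NMRec improves Pre@10, Recall@10, and NDCG@10 by at least 0.21%, 29%, and 1.46%, respectively.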
Funding (link prediction review): Science and Technology Research Project of Jiangxi Provincial Department of Education (Project Nos. GJJ211348, GJJ211347, and GJJ2201056).
Funding (HNE survey): Supported by the National Key Research and Development Plan of China (2017YFB0503700, 2016YFB0501801), the National Natural Science Foundation of China (61170026, 62173157), the Thirteen Five-Year Research Planning Project of National Language Committee (No. YB135-149), and the Fundamental Research Funds for the Central Universities (Nos. CCNU20QN022, CCNU20QN021, CCNU20ZT012).
Funding (HHCN): Funded by the Science and Technology Strengthening Police Basic Program of the Ministry of Public Security (2018GABJC03) and the Technology Research Project Program of the Ministry of Public Security (2018JSYJA02).
Funding (NOMA-assisted HetNet resource allocation): This work was supported by the National Natural Science Foundation of China (Nos. 61601071, 62071078), the National Key Research and Development Program of China (No. 2019YFC1511300), the Natural Science Foundation of Chongqing (No. cstc2019jcyj-xfkxX0002), the Chongqing Entrepreneurship and Innovation Program for the Returned Overseas Chinese Scholars (No. cx2020095), and the Graduate Scientific Research Innovation Project of Chongqing (Nos. CYS20251, CYS20253).
Funding (MPISR): Supported by the National Natural Science Foundation of China (Grant Nos. 61762078, 62276073, 61966009, and U22A2099), the Industrial Support Project of Gansu Colleges (No. 2022CYZC11), the Natural Science Foundation of Gansu Province (21JR7RA114), the Northwest Normal University Young Teachers Research Capacity Promotion Plan (NWNU-LKQN2019-2), and the Northwest Normal University Post-graduate Research Funding Project (2021KYZZ02107).
Funding (DEM): Supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001.
Funding (meta-path mining of HINs): Supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71131007, 71271144, and 71271145; the New Century Excellent Talents Supporting Program by Ministry of Education under Grant No. NECT-10-0626; and the Innovative Research Team in University Supporting Program by Ministry of Education under Grant No. IRT 1208.
Abstract: The traditional cheap-talk game model with homogeneous information sources concluded that a dishonest information source will not be identified if it changes strategy stochastically. In this paper, the authors incorporate different information diffusion networks and heterogeneous information sources into an agent-based artificial stock market. The obtained results differ from the traditional ones in that the identification ability of uninformed agents is greatly improved with diffusion networks and heterogeneous information sources. Additionally, the authors find that uninformed agents can improve their identification ability only if there exists a sufficient number of heterogeneous information sources in the stock market.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61872163 and 61806084), the China Postdoctoral Science Foundation project (2018M631872), and the Jilin Provincial Education Department project (JJKH20190160KJ).
Abstract: Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in a heterogeneous information network (HIN) and on calculating the similarity between different types of objects. We propose a meta-path-based outlier detection method (MPOutliers) for heterogeneous information networks that addresses these problems together under a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It discovers the semantic information among nodes in heterogeneous networks, instead of only considering the network structure. It also computes the closeness degree between nodes of the same type, which extends the whole heterogeneous network. Moreover, each node is assigned a reliable weighting to measure its authority degree. Substantial experiments on two real datasets (AMiner and Movies) show that our proposed method is very effective and efficient for outlier detection.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61806020, 61772082, 61972047, 61702296), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.
Abstract: Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities that have common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods employed knowledge graphs (KGs) to extend entities. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. Then we rank the entities according to the meta-path-based structural similarity. Furthermore, to utilize the text description of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
Funding: The work was supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003404 and the National Natural Science Foundation of China under Grant Nos. 61672142, U1435216 and 61602103.
Abstract: Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone. These methods ignore the rich information available in the content data. In order to solve this issue, in this paper, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content and edge content, which can supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks, and extracts their potential deep representation to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed. Via leader nodes selection, initial community generation, and community expansion, communities can be found more efficiently. Third, some incremental maintenance strategies for the changes of networks are proposed. We conduct experimental studies based on three real-world social networks. Experiments demonstrate the effectiveness and the efficiency of our proposed method. Compared with the traditional methods, our method improves normalized mutual information (NMI) and the modularity by an average of 12% and 37%, respectively.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61701190), the Youth Science Foundation of Jilin Province of China (No. 20180520021JH), the National Key Research and Development Plan of China (No. 2017YFA0604500), the Key Scientific and Technological Research and Development Plan of Jilin Province of China (No. 20180201103GX), the China Postdoctoral Science Foundation (No. 2018M631873), the Project of Jilin Province Development and Reform Commission (No. 2019FGWTZC001), and the Key Technology Innovation Cooperation Project of Government and University for the Whole Industry Demonstration (No. SXGJSF2017-4).
Abstract: Heterogeneous Information Networks (HINs) contain multiple types of nodes and edges; therefore, they can preserve both semantic and structural information. Cluster analysis on an HIN has obvious advantages over first transforming it into a homogeneous information network, and can improve the clustering results for different types of nodes. In our study, we applied Nonnegative Matrix Tri-Factorization (NMTF) to a cluster analysis of multiple metapaths in an HIN. Unlike the parameter estimation methods for probability distributions used in previous studies, NMTF can obtain several dependent latent variables simultaneously, and each latent variable in NMTF is associated with the cluster of the corresponding node in the HIN. The method is suited to co-clustering that leverages multiple metapaths in an HIN, because NMTF is employed for multiple nonnegative matrix factorizations simultaneously in our study. Experimental results on a real dataset demonstrate the validity and correctness of our method, and its clustering results are better than those of existing similar clustering algorithms.
Funding: Supported by the Research Grants Council of Hong Kong (17308321) and the HKU-TCL Joint Research Center for Artificial Intelligence sponsored by TCL Corporate Research (Hong Kong).
Abstract: Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little attention has been given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding), a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms.
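Abstract: Heterogeneous information networks (HINs) are widely used in recommendation tasks because of their rich semantic information. Traditional HIN-oriented recommendation methods ignore the heterogeneity of the associations in the network and the mutual influence between different association types. This paper proposes a recommendation model based on multi-view embedding fusion, which mines the deep latent features of an HIN from a homogeneous-association view and a heterogeneous-association view and fuses them, effectively ensuring the accuracy of the recommendation results. For the homogeneous-association view, an embedding fusion method based on graph convolutional networks is proposed; it achieves local fusion of node embeddings through lightweight convolution over node neighborhood information under homogeneous associations. For the heterogeneous-association view, an attention-based embedding fusion method is proposed; it uses the attention mechanism to distinguish the influence of different association types on node embeddings and achieves global fusion of node embeddings. Experiments verify the feasibility and effectiveness of the proposed key techniques.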
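The abstract above describes two fusion steps without giving formulas, so the sketch below only illustrates the general pattern under stated assumptions: local fusion is approximated by a parameter-free mean over a node's neighbors (one common form of lightweight graph convolution), and global fusion by softmax attention over one embedding per association type, scored by a dot product with a query vector. The function names, the toy graph, and the query vector are all hypothetical and are not taken from the paper.

```python
import math


def mean_aggregate(node, adjacency, embeddings):
    """Local fusion (assumed form): average the embeddings of a node's
    neighbors; a node without neighbors keeps its own embedding."""
    neigh = adjacency.get(node, [])
    if not neigh:
        return list(embeddings[node])
    dim = len(embeddings[node])
    return [sum(embeddings[n][i] for n in neigh) / len(neigh)
            for i in range(dim)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(per_type_embeddings, query):
    """Global fusion (assumed form): weight one embedding per association
    type by softmax(dot(query, embedding)) and sum them."""
    scores = [sum(q * e for q, e in zip(query, emb))
              for emb in per_type_embeddings]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * emb[i] for w, emb in zip(weights, per_type_embeddings))
             for i in range(dim)]
    return fused, weights


# Toy example: user u1 has two homogeneous neighbors, and two per-type views.
embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
adjacency = {"u1": ["u2", "u3"]}
local = mean_aggregate("u1", adjacency, embeddings)
fused, weights = attention_fuse([local, embeddings["u1"]], query=[0.5, 0.5])
print(local, fused, weights)
```

In a trained model the query vector and any projection matrices would be learned; here they are fixed so that the weighting step is easy to follow.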
Funding: National Key R&D Program, Grant/Award Number: 2021YFF1200900; National Natural Science Foundation of China, Grant/Award Number: 72101029; Beijing Natural Science Foundation Proposed Program, Grant/Award Number: 4204104.
Abstract: The information on host-microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to understanding the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co-occurrence networks, but co-occurrence networks can only encompass a small fraction of all OTUs due to the high sparsity of the OTU table. There is a lack of studies that intensively explore and use the information on sample-OTU interactions. This study constructed a sample-OTU heterogeneous information network and represented the nodes in the network through the heterogeneous graph embedding method to form the OTU space and sample space. Taking advantage of the represented OTU and sample vectors combined with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes, such as phenotypes and individual diet habits. Both the OTU space and sample space contain reasonable biological or medical semantic information, and the IMETA using embedded OTU and sample vectors can have stable and good performance in the sample classification tasks. This suggests that the embedding representation based on the sample-OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study conducted quantified representations of the biological characteristics within the OTUs and samples, which is a good attempt to increase the utilization rate of information in the OTU abundance table, and it promotes a deeper understanding of the underlying knowledge of the human microbiome.
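Abstract: Existing models extract information from heterogeneous information networks (HINs) mostly through meta-paths, lack information to supplement the meta-paths, and rarely learn the complex structural information in heterogeneous graphs. To address these problems, a recommendation algorithm for heterogeneous networks based on neighbor nodes and meta-paths (NMRec) is proposed. It extracts the neighbor nodes of users and items to supplement the information missing from meta-paths, captures the rich interactions between nodes by convolution, obtains embedding representations of nodes and meta-paths through an attention mechanism, and concatenates the user, item, neighbor-node, and meta-path representations for top-N recommendation. Experimental results on two public datasets show that NMRec performs well and offers good interpretability of its recommendations; compared with 7 baseline recommendation algorithms, NMRec improves Pre@10, Recall@10, and NDCG@10 by at least 0.21%, 29%, and 1.46%, respectively.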
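For readers unfamiliar with the top-N ranking metrics named in the abstract above (precision, recall, and NDCG at a cutoff of 10), the following sketch shows how they are commonly computed for one user with binary relevance. The function names and toy data are assumptions for illustration; the paper's exact evaluation protocol (for example, how it averages over users) is not given in the abstract.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the DCG of an
    ideal ranking that places all relevant items first."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # hypothetical recommendation list
relevant = {"a", "c", "e"}      # hypothetical held-out items for one user
print(precision_at_k(ranked, relevant, 4))
print(recall_at_k(ranked, relevant, 4))
print(ndcg_at_k(ranked, relevant, 4))
```

Reported scores are typically these per-user values averaged over all test users.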