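Abstract: In the big-data era, networks such as citation networks among papers, the World Wide Web, and networks of Weibo users are ubiquitous, and they are usually modeled as link networks. Using the links and node attributes of these networks for semantic community discovery helps reveal a network's semantic information and mesoscopic structure. Existing semantic community discovery methods use links and attributes but do not incorporate representations in a low-dimensional embedding space; existing community discovery methods that consider low-dimensional embeddings can mine community structure more accurately but do not use the content attributes of documents. Neither class of method fully exploits the fine-grained structure and semantic information of link networks. To address this, the paper proposes Rcolc (Representation learning and Community discovery on links and contents), a semantic community discovery model for attributed networks that incorporates network node representation learning. The model fuses the link and attribute information of documents to discover semantic communities, and it uses link-based low-dimensional embeddings of documents to improve the accuracy of community discovery. Experiments on real-world attributed networks show that the algorithm outperforms mainstream algorithms.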
Funding: New-Generation Artificial Intelligence-Major Program in the Sci-Tech Innovation 2030 Agenda from the Ministry of Science and Technology of China (2018AAA0102100); Hunan Provincial Department of Education key project (21A0250); The First Class Discipline Open Fund of Hunan University of Traditional Chinese Medicine (2022ZYX08).
Abstract: Objective: To construct a symptom-formula-herb heterogeneous graph dataset from the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) and to explore an optimal method for learning node attribute representations based on a graph convolutional network (GCN). Methods: Clauses containing symptoms, formulas, and herbs were extracted from the Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs, on which a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN), was proposed. The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN to perform high-order message passing and neighbor aggregation, yielding new node representation attributes and thus the sum-aggregations of the symptom, formula, and herb nodes, which lay a foundation for the downstream prediction models. Results: Comparisons among node representations with multi-hot encoding, non-fusion encoding, and fusion encoding showed that the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding in the model's prediction studies. Conclusion: Node representations by fusion encoding achieved comparatively ideal results, indicating that the TCM-GCN is effective at learning node-level representations of the heterogeneous graph structured Treatise on Febrile Diseases dataset and can improve the performance of the downstream tasks of the diagnosis model.
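The neighbor aggregation mentioned in the Methods can be illustrated with a minimal sketch of sum-aggregation message passing. This is not the TCM-GCN itself: the function name `sum_aggregate`, the node names, and the toy one-hot features are all hypothetical, and the learned weight matrices, normalization, and nonlinearities that a GCN layer would normally apply are omitted.

```python
def sum_aggregate(features, edges, hops=1):
    """Sum-aggregation message passing over an undirected graph.

    features: dict mapping node -> list of floats (initial node attributes)
    edges:    iterable of (u, v) node pairs
    hops:     number of propagation rounds (higher-order neighborhoods)
    """
    # Build an undirected adjacency structure.
    neighbors = {node: set() for node in features}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(hops):
        updated = {}
        for node, vec in current.items():
            total = list(vec)  # self-loop: keep the node's own features
            for nb in neighbors[node]:
                total = [a + b for a, b in zip(total, current[nb])]
            updated[node] = total
        current = updated
    return current


# Hypothetical toy symptom-herb graph (not data from the paper).
features = {
    "symptom_a": [1.0, 0.0, 0.0],
    "symptom_b": [0.0, 1.0, 0.0],
    "herb_x": [0.0, 0.0, 1.0],
}
edges = [("symptom_a", "herb_x"), ("symptom_b", "herb_x")]

one_hop = sum_aggregate(features, edges, hops=1)
two_hop = sum_aggregate(features, edges, hops=2)
```

After one round each node's vector also counts its direct neighbors; after two rounds `symptom_a` picks up a contribution from `symptom_b` through the shared herb, which is the sense in which repeated propagation reaches higher-order neighbors.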