Abstract
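This paper proposes DWLTI, a new network representation learning algorithm that learns low-dimensional vector representations reflecting both the structural information of a network and the text attributes of its nodes. DWLTI is a DeepWalk-based model designed to work with limited text information. Using a suitable data-fusion form, it simultaneously maximizes the co-occurrence probability of the node sequences obtained by random walks and of the word sequences in the text content. By applying two Huffman sub-trees, it makes full use of this sparse information even when only a small fraction of nodes have text of their own. Finally, node classification experiments on real-world network datasets evaluate the quality of the learned node representations. The results show that DWLTI, using limited text information, outperforms several classical baseline models.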
A new network representation method, DWLTI, is proposed. Its optimization objective simultaneously considers the network's link structure and the text information available on some nodes. It merges these two parts in a suitable format so as to maximize the co-occurrence probability of the node sequences obtained by random walks and the word sequences in the text. The model uses two Huffman sub-trees so that all text information remains useful even when only a small number of nodes carry text. It is optimized with hierarchical softmax over a binary tree, and the model parameters are learned using deep learning. A linear SVM was chosen to test the quality of the vector representations in the new low-dimensional embedding space. The experimental results show that DWLTI is effective on networks where only some nodes carry limited text information, and that it outperforms several classical models in this field.
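The abstract builds on DeepWalk-style random walks, which turn a graph into node sequences that can be treated like sentences. The following is a minimal sketch of that walk-generation step only, not the authors' implementation: the function name `random_walks`, its parameters, and the toy adjacency list are all illustrative assumptions, and the text-fusion and Huffman-tree parts of DWLTI are not shown.

```python
import random


def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated random walks over an adjacency-list graph.

    adj maps each node to a list of its neighbors. All names and default
    values here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        # Visit the start nodes in a fresh random order on every pass.
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (hypothetical data).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adj)
```

Each resulting walk plays the role of a sentence whose "words" are node identifiers. In a model of the kind the abstract describes, such sequences would be trained alongside the word sequences from node text.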
Source
《郑州大学学报(理学版)》
CAS
PKU Core (Peking University core journal list)
2017, Issue 1, pp. 29-33 (5 pages)
Journal of Zhengzhou University:Natural Science Edition
Funding
973 Program (2014CB340400)
National Natural Science Foundation of China (U1536201, 61272340)