Abstract
Because short texts contain few words, classifying them suffers from data sparsity and semantic ambiguity. This paper proposes BTM_GCN, a graph convolutional network for short text classification. The Biterm Topic Model (BTM) is first trained on the short text dataset to obtain a fixed number of document-level latent topics. Each topic is added as a node to the heterogeneous text graph and connected to the document nodes. A graph convolutional network then captures high-order neighborhood information among document, word, and topic nodes, which enriches the semantic information of the document nodes and alleviates the semantic ambiguity of short texts. Experiments on three English short text datasets show that the proposed method achieves better classification performance than the baseline models.
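The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the document-topic probabilities have already been produced by a trained BTM, and it makes several hypothetical choices the abstract does not specify: the edge weights, the `top_k` rule for linking documents to topics, the two-layer depth, one-hot node features, and random untrained weights. Word-word edges are omitted for brevity. All function names and toy values are illustrative only.

```python
import numpy as np

def build_adjacency(doc_word, doc_topic, top_k=1):
    """Build a symmetric adjacency over document, word, and topic nodes.

    doc_word:  (n_docs, n_words) non-negative document-word weights.
    doc_topic: (n_docs, n_topics) document-topic probabilities, assumed
               to come from an already-trained BTM.
    top_k:     hypothetical rule: link each document to its top_k topics.
    """
    n_docs, n_words = doc_word.shape
    n_topics = doc_topic.shape[1]
    n = n_docs + n_words + n_topics
    adj = np.zeros((n, n))

    # Document-word edges occupy the block right of the document rows.
    adj[:n_docs, n_docs:n_docs + n_words] = doc_word

    # Document-topic edges: keep only the top_k most probable topics.
    top = np.argsort(-doc_topic, axis=1)[:, :top_k]
    rows = np.repeat(np.arange(n_docs), top_k)
    cols = top.ravel()
    adj[rows, n_docs + n_words + cols] = doc_topic[rows, cols]

    adj = np.maximum(adj, adj.T)   # make every edge undirected
    np.fill_diagonal(adj, 1.0)     # add self-loops
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_hat, x, w1, w2):
    """Two-layer GCN forward pass: softmax(A_hat ReLU(A_hat X W1) W2)."""
    h = np.maximum(a_hat @ x @ w1, 0.0)
    logits = a_hat @ h @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy data (made up): 3 documents, 4 words, 3 topics, 2 classes.
doc_word = np.array([[0.5, 0.0, 0.3, 0.0],
                     [0.0, 0.4, 0.0, 0.6],
                     [0.2, 0.0, 0.0, 0.7]])
doc_topic = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.3, 0.6, 0.1]])

adj = build_adjacency(doc_word, doc_topic, top_k=1)
a_hat = normalize(adj)

n = adj.shape[0]
rng = np.random.default_rng(0)
x = np.eye(n)                               # one-hot node features
w1 = rng.normal(scale=0.1, size=(n, 8))     # untrained layer-1 weights
w2 = rng.normal(scale=0.1, size=(8, 2))     # untrained layer-2 weights

probs = gcn_forward(a_hat, x, w1, w2)
doc_probs = probs[:doc_word.shape[0]]       # class probabilities for documents
```

With two propagation steps, a document node receives information from nodes two hops away, for example other documents that share one of its words or topics. This is the sense in which topic nodes can enrich sparse document representations in this sketch.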
Authors
郑诚
董春阳
黄夏炎
ZHENG Cheng; DONG Chunyang; HUANG Xiayan (School of Computer Science and Technology, Anhui University, Hefei 230601, China; Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei 230601, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2021, Issue 4, pp. 155-160 (6 pages)
Computer Engineering and Applications
Keywords
short text classification
graph convolutional network
BTM topic model