航行通告是民用航空情报领域的重要情报资料,针对中文航行通告专业名词较多、格式不统一及语义复杂等问题,提出了一种基于BERT-Bi-LSTM-CRF的实体识别模型,对航行通告E项内容中事件要素实体进行抽取。首先通过BERT(bidirectional encode...航行通告是民用航空情报领域的重要情报资料,针对中文航行通告专业名词较多、格式不统一及语义复杂等问题,提出了一种基于BERT-Bi-LSTM-CRF的实体识别模型,对航行通告E项内容中事件要素实体进行抽取。首先通过BERT(bidirectional encoder representations from transforms)模型对处理后的向量进行预训练,捕捉丰富的语义特征,然后传送至双向长短期记忆网络(bidirectional long short-term memory,Bi-LSTM)模型对上下文特征进行提取,最后利用条件随机场(conditional random field,CRF)模型对最佳实体标签预测并输出。收集并整理机场类航行通告相关的原始语料,经过文本标注与数据预处理,形成了可用于实体识别实验的训练集、验证集和评价集数据。基于此数据与不同的实体识别模型进行对比实验,BERT-Bi-LSTM-CRF模型的准确率为89.68%、召回率为81.77%、F_(1)为85.54%,其中F 1相比现有模型得到有效提升,结果验证了该模型在机场类航行通告中要素实体识别的有效性。展开更多
Mining outliers in heterogeneous networks is crucial to many applications,but challenges abound.In this paper,we focus on identifying meta-path-based outliers in heterogeneous information network(HIN),and calculate th...Mining outliers in heterogeneous networks is crucial to many applications,but challenges abound.In this paper,we focus on identifying meta-path-based outliers in heterogeneous information network(HIN),and calculate the similarity between different types of objects.We propose a meta-path-based outlier detection method(MPOutliers)in heterogeneous information network to deal with problems in one go under a unified framework.MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships.It discovers the semantic information among nodes in heterogeneous networks,instead of only considering the network structure.It also computes the closeness degree between nodes with the same type,which extends the whole heterogeneous network.Moreover,each node is assigned with a reliable weighting to measure its authority degree.Substantial experiments on two real datasets(AMiner and Movies dataset)show that our proposed method is very effective and efficient for outlier detection.展开更多
针对现有的基于异构图神经网络的短文本分类方法未充分利用节点之间的有效信息,以及存在的过拟合问题,文中提出基于门控双层异构图注意力网络的半监督短文本分类方法(Semi-Supervised Short Text Classification with Gated Double-Laye...针对现有的基于异构图神经网络的短文本分类方法未充分利用节点之间的有效信息,以及存在的过拟合问题,文中提出基于门控双层异构图注意力网络的半监督短文本分类方法(Semi-Supervised Short Text Classification with Gated Double-Layer Heterogeneous Graph Attention Network,GDHG).GDHG包含节点注意力机制和门控异构图注意力网络两层.首先,使用节点注意力机制,训练不同类型的节点注意力系数,再将系数输入门控异构图注意力网络,训练得到门控双层注意力.然后,将门控双层注意力与节点的不同状态相乘,得到聚合的节点特征.最后,使用softmax函数对文本进行分类.GDHG利用节点注意力机制和门控异构图注意力网络的信息遗忘机制对节点信息进行聚集,得到有效的相邻节点信息,进而挖掘不同邻居节点的隐藏信息,提高聚合远程节点信息的能力.在Twitter、MR、Snippets、AGNews四个短文本数据集上的实验验证GDHG性能较优.展开更多
针对中文短文本特征提取存在语义特征稀疏的问题,为了弥补图卷积网络不能捕捉长距离上下文关联性的不足,引入双向长短时记忆网络(Bi-directional Long Short-Term Memory,BiLSTM),提出BERT;GCN短文本分类模型。首先利用BERT对文本信息...针对中文短文本特征提取存在语义特征稀疏的问题,为了弥补图卷积网络不能捕捉长距离上下文关联性的不足,引入双向长短时记忆网络(Bi-directional Long Short-Term Memory,BiLSTM),提出BERT;GCN短文本分类模型。首先利用BERT对文本信息进行字符级编码作为图节点的特征值,其次通过全局共享的点互信息(Pointwise Mutual Information,PMI)关系作为节点间的边为每个文档构建一个单独的文本图,再次,聚合图卷积网络和BiLSTM的输出形成融合上下文信息的特征矩阵并输入到下一层的图卷积网络,最后输出到全连接层得到最终分类结果。本模型在3个中文短文本数据集与其他多个基线模型进行比较,实验结果表明,本模型在准确率方面优于其他基线模型。展开更多
文摘航行通告是民用航空情报领域的重要情报资料,针对中文航行通告专业名词较多、格式不统一及语义复杂等问题,提出了一种基于BERT-Bi-LSTM-CRF的实体识别模型,对航行通告E项内容中事件要素实体进行抽取。首先通过BERT(bidirectional encoder representations from transforms)模型对处理后的向量进行预训练,捕捉丰富的语义特征,然后传送至双向长短期记忆网络(bidirectional long short-term memory,Bi-LSTM)模型对上下文特征进行提取,最后利用条件随机场(conditional random field,CRF)模型对最佳实体标签预测并输出。收集并整理机场类航行通告相关的原始语料,经过文本标注与数据预处理,形成了可用于实体识别实验的训练集、验证集和评价集数据。基于此数据与不同的实体识别模型进行对比实验,BERT-Bi-LSTM-CRF模型的准确率为89.68%、召回率为81.77%、F_(1)为85.54%,其中F 1相比现有模型得到有效提升,结果验证了该模型在机场类航行通告中要素实体识别的有效性。
基金the National Natural Science Foundation of China(Grant Nos.61872163 and 61806084)China Postdoctoral Science Foundation project(2018M631872)Jilin Provincial Education Department project(JJKH20190160KJ).
文摘Mining outliers in heterogeneous networks is crucial to many applications,but challenges abound.In this paper,we focus on identifying meta-path-based outliers in heterogeneous information network(HIN),and calculate the similarity between different types of objects.We propose a meta-path-based outlier detection method(MPOutliers)in heterogeneous information network to deal with problems in one go under a unified framework.MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships.It discovers the semantic information among nodes in heterogeneous networks,instead of only considering the network structure.It also computes the closeness degree between nodes with the same type,which extends the whole heterogeneous network.Moreover,each node is assigned with a reliable weighting to measure its authority degree.Substantial experiments on two real datasets(AMiner and Movies dataset)show that our proposed method is very effective and efficient for outlier detection.
文摘针对现有的基于异构图神经网络的短文本分类方法未充分利用节点之间的有效信息,以及存在的过拟合问题,文中提出基于门控双层异构图注意力网络的半监督短文本分类方法(Semi-Supervised Short Text Classification with Gated Double-Layer Heterogeneous Graph Attention Network,GDHG).GDHG包含节点注意力机制和门控异构图注意力网络两层.首先,使用节点注意力机制,训练不同类型的节点注意力系数,再将系数输入门控异构图注意力网络,训练得到门控双层注意力.然后,将门控双层注意力与节点的不同状态相乘,得到聚合的节点特征.最后,使用softmax函数对文本进行分类.GDHG利用节点注意力机制和门控异构图注意力网络的信息遗忘机制对节点信息进行聚集,得到有效的相邻节点信息,进而挖掘不同邻居节点的隐藏信息,提高聚合远程节点信息的能力.在Twitter、MR、Snippets、AGNews四个短文本数据集上的实验验证GDHG性能较优.