Abstract
To effectively mine cross-text associations between words in fault short texts, construct global feature representations of fault entity nodes, and thereby obtain clustering labels for those nodes, an improved graph node embedding and clustering method for fault short texts is proposed. First, a new edge weight calculation is introduced during graph construction to distinguish associations between words at different distances within the same window. Second, the method for acquiring node structural features is improved to reflect the influence of node degree differences on the embedding. The structural and relational features of nodes are then fused to enhance the similarity between same-class nodes that have similar neighbor nodes. In the clustering stage, a candidate node count parameter is designed to alleviate sensitivity to the cut-off distance. Parameter analysis and performance evaluation were carried out on public data and real business data, and the results show that the method can obtain accurate and effective clustering results for fault entity nodes.
Authors
邱竞雄
孙林夫
韩敏
QIU Jingxiong; SUN Linfu; HAN Min (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 610031, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 610031, China)
Source
《计算机集成制造系统》
EI
CSCD
Peking University Core Journal
2023, No. 12, pp. 4256-4266 (11 pages)
Computer Integrated Manufacturing Systems
Funding
Supported by the National Key Research and Development Program of China (2018YFB1701500, 2018YFB1701502).
Keywords
fault short text
graph node embedding
local density
graph node clustering