Abstract
As network environments grow more complex, the number and variety of users have increased significantly, and online information is updated frequently. Because text data is inherently sparse and irregular, this paper proposes a new user session clustering method based on an improved local sequence alignment algorithm. First, the similarity between sessions is measured by computing an integrated distance between user sessions. Then, an improved sequence alignment algorithm based on this session distance is used to cluster topics, addressing shortcomings of traditional user clustering algorithms. Experiments show that the algorithm clearly improves recall and accuracy compared with traditional clustering algorithms.
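To make the idea of local sequence alignment between sessions concrete, the following is a minimal sketch of the standard Smith-Waterman local alignment score applied to two sessions, each represented as a sequence of page identifiers. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the scoring parameters (`match`, `mismatch`, `gap`), and the normalization into a distance are all assumptions made for illustration.

```python
from typing import Sequence


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best local alignment score between two sessions (standard Smith-Waterman).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,       # gap in b
                h[i][j - 1] + gap,       # gap in a
            )
            best = max(best, h[i][j])
    return best


def session_distance(a: Sequence[str], b: Sequence[str], match: int = 2) -> float:
    """Hypothetical normalization of the alignment score into a [0, 1] distance.

    The score can never exceed match * min(len(a), len(b)), so dividing by
    that bound gives 0.0 for a full match and 1.0 for no shared pages.
    """
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0
    return 1.0 - local_alignment_score(a, b, match=match) / (match * shortest)
```

A pairwise matrix of such distances could then be passed to any clustering routine that accepts precomputed distances. How the paper combines this with its integrated session distance is not stated in the abstract.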
Authors
YAO Yao (姚瑶); ZHOU Tong (周铜)
School of Information Engineering, Zhengzhou University of Technology, Zhengzhou, Henan 450044, China
Source
Journal of Zhongzhou University (《中州大学学报》)
2021, No. 1, pp. 114-119 (6 pages)
Funding
Science and Technology Research Project of the Henan Provincial Department of Science and Technology (182102310982, 202102210156).
Keywords
local sequence alignment
user (session) similarity
clustering