Abstract
Geographic location, as a concrete expression of a user's life trajectory, plays a pivotal role in group classification. Location data are high-dimensional and sparse, and existing group classification methods must perform feature selection on the location data and fix the number of features in advance, which is inconvenient in practice. To address this problem, a non-parametric clustering method for location-based group classification is proposed. The method first uses a Hierarchical Dirichlet Process (HDP) to learn the best number of features without supervision. It then uses Latent Dirichlet Allocation (LDA) to select features from the location data, obtaining a functional-feature probability matrix at the same time. Finally, it uses this matrix as the clustering weight vectors to compute the similarity between users and applies Affinity Propagation (AP) clustering to classify the groups. Experimental results show that, compared with traditional methods, the proposed method takes less time and uses less memory while also achieving a relatively high F-measure.
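The final step of the pipeline (similarity between users followed by Affinity Propagation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The matrix `theta` is hypothetical toy data standing in for the per-user feature probability matrix that the HDP and LDA steps would produce. The abstract does not specify the similarity measure, so negative squared Euclidean distance is an assumption here, as is the use of the median similarity as the exemplar preference and the damping value.

```python
import statistics

def build_similarity(theta):
    """Pairwise similarity between users' feature-probability vectors.

    Assumption: similarity is the negative squared Euclidean distance,
    and the diagonal (exemplar preference) is the median off-diagonal value.
    """
    n = len(theta)
    S = [[-sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
          for j in range(n)] for i in range(n)]
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    pref = statistics.median(off_diag)
    for i in range(n):
        S[i][i] = pref
    return S

def affinity_propagation(S, damping=0.7, iterations=200):
    """Plain Affinity Propagation message passing; returns an exemplar index per point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated evidence from other points that k is an exemplar.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]  # positive support from all j != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Hypothetical per-user probabilities over three location-function features.
theta = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.10, 0.15],
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.15, 0.75, 0.10],
]
labels = affinity_propagation(build_similarity(theta))
print(labels)
```

Users who share an exemplar index in `labels` belong to the same group. The number of groups is not passed in; it emerges from the preference value, which matches the abstract's aim of avoiding parameters fixed in advance.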
Source
Software Guide (《软件导刊》)
2017, No. 2, pp. 7-10 (4 pages)
Keywords
Geographical Location
Group Classification
Hierarchical Dirichlet Process
Latent Dirichlet Allocation
Affinity Propagation