Abstract
To mine users' interest points accurately, a probabilistic latent semantic analysis (PLSA) model is first used to project the "webpage-word" matrix vectors into a probabilistic latent semantic vector space. An "auto-selected similarity threshold" method is then proposed to obtain the similarity threshold between web pages. Finally, a hierarchical agglomerative k-medoids (HAK-medoids) algorithm, which combines flat partitioning with agglomerative hierarchical clustering, is proposed to cluster users' interest points. Experimental results show that the HAK-medoids algorithm achieves better clustering results than traditional partition-based algorithms. In addition, the proposed user interest point clustering technique can improve the efficiency of personalized recommendation and search in the field of personalized services.
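The abstract does not give the details of the threshold selection or of HAK-medoids, so the following is only a loose sketch of the general idea of threshold-guided agglomerative merging with medoids as cluster representatives. It is not the paper's algorithm. It assumes the PLSA step has already produced one topic vector per page, uses cosine similarity, and substitutes the mean pairwise similarity for the "auto-selected similarity threshold"; all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity_threshold(vectors):
    """Assumed stand-in for threshold selection: the mean pairwise similarity."""
    pairs = list(combinations(range(len(vectors)), 2))
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def medoid(cluster, vectors):
    """Index of the member with the highest total similarity to its cluster."""
    return max(
        cluster,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in cluster),
    )


def agglomerative_medoid_clusters(vectors, threshold):
    """Merge the two clusters with the most similar medoids until that
    similarity no longer exceeds the threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        medoids = [medoid(c, vectors) for c in clusters]
        # combinations yields index pairs with a < b, so deleting b is safe.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(vectors[medoids[p[0]]], vectors[medoids[p[1]]]),
        )
        if cosine(vectors[medoids[a]], vectors[medoids[b]]) <= threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical page-topic vectors, standing in for PLSA output.
pages = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]
threshold = mean_similarity_threshold(pages)
print(agglomerative_medoid_clusters(pages, threshold))
```

Because merging stops once the best medoid-to-medoid similarity falls to the threshold, the number of clusters is decided by the threshold rather than fixed in advance, which is presumably the motivation for selecting the threshold automatically.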
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in: CSCD; Peking University Core Journals
2014, No. 4, pp. 765-771 (7 pages)
Funding
National Natural Science Foundation of China (61103129)
Jiangsu Province Science and Technology Support Program (BE2009009)
Keywords
probabilistic latent semantic analysis
auto-selected similarity threshold
user's interest points
hierarchical agglomerative k-medoids
personalized service