Abstract
Product reviews mention a large number of product features, and the same feature is often described in several different ways. To address this, we propose a commodity feature clustering algorithm based on semantic similarity. The algorithm consists of two processes: "allocation" and "transfer". The "allocation" process clusters the feature words to obtain an initial cluster sequence. The "transfer" process then traverses this initial cluster sequence in order and moves any feature word that is semantically more similar to another cluster than to its own into that cluster. Experimental results show that the algorithm produces high-quality clusters, has low time complexity, and is insensitive to the order in which the data are input.
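The two-process structure described above can be illustrated with a minimal sketch. The abstract does not specify the semantic similarity measure, how a word is compared with a cluster, or the criterion for opening a new cluster. This sketch therefore assumes average-link word-to-cluster similarity, a fixed threshold (used in both passes), and a hand-made toy similarity table with made-up values. The names `allocate`, `transfer`, `cluster_sim`, `toy_sim`, and `threshold` are hypothetical, and this is not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical interface: a semantic similarity score in [0, 1] for two feature words.
Similarity = Callable[[str, str], float]


def cluster_sim(word: str, cluster: List[str], sim: Similarity) -> float:
    """Average similarity of `word` to the other members of `cluster` (assumed average-link)."""
    others = [w for w in cluster if w != word]
    if not others:
        return 0.0
    return sum(sim(word, w) for w in others) / len(others)


def allocate(words: List[str], sim: Similarity, threshold: float) -> List[List[str]]:
    """'Allocation': one pass over the words, producing the initial cluster sequence.

    Each word joins the most similar existing cluster if that similarity reaches
    `threshold`; otherwise it opens a new cluster (the threshold rule is an assumption).
    """
    clusters: List[List[str]] = []
    for word in words:
        best_idx, best_score = -1, -1.0
        for i, cluster in enumerate(clusters):
            score = cluster_sim(word, cluster, sim)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx >= 0 and best_score >= threshold:
            clusters[best_idx].append(word)
        else:
            clusters.append([word])
    return clusters


def transfer(clusters: List[List[str]], sim: Similarity, threshold: float) -> List[List[str]]:
    """'Transfer': visit the clusters in order and move each word to a more similar cluster.

    A word moves only if another cluster is strictly more similar than its own
    cluster and also reaches `threshold` (an assumed guard). Mutates `clusters`.
    """
    for i in range(len(clusters)):
        for word in list(clusters[i]):  # iterate over a copy; clusters[i] may shrink
            best_j, best_score = i, cluster_sim(word, clusters[i], sim)
            for j, other in enumerate(clusters):
                if j == i or not other:
                    continue
                score = cluster_sim(word, other, sim)
                if score > best_score and score >= threshold:
                    best_j, best_score = j, score
            if best_j != i:
                clusters[i].remove(word)
                clusters[best_j].append(word)
    return [c for c in clusters if c]  # drop clusters emptied by transfers


# Toy similarity table with made-up values; unlisted pairs score 0.0.
TOY_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"performance", "power"}): 0.6,
    frozenset({"power", "battery"}): 0.9,
    frozenset({"power", "battery life"}): 0.8,
    frozenset({"battery", "battery life"}): 0.9,
}


def toy_sim(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_SIM.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    words = ["performance", "power", "battery", "battery life"]
    initial = allocate(words, toy_sim, threshold=0.5)
    print(initial)  # [['performance', 'power'], ['battery', 'battery life']]
    final = transfer([list(c) for c in initial], toy_sim, threshold=0.5)
    print(final)  # [['performance'], ['battery', 'battery life', 'power']]
```

On this toy table, "power" arrives early and is allocated to the "performance" cluster. The transfer pass then moves it to the battery cluster, where its average similarity is higher. Feeding the same words in the order battery, power, battery life, performance yields the same partition. This illustrates, but does not prove, the order-insensitivity the abstract reports.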
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD (Chinese Science Citation Database)
2016, Issue 7, pp. 64-67 (4 pages)
Keywords
Commodity feature clustering
Feature clustering
Semantic similarity
Comment mining