Text similarity calculation method for mobile marketing
(移动营销领域的文本相似度计算方法)

Cited by: 6
Abstract: Mobile marketing texts tend to be short, vary widely in wording, and contain incomplete sentences. To address these problems, the authors propose a weighted semantic mapping method based on word2vec for the text representation step. Word vectors are first trained with word2vec on the whole corpus and then clustered, so that semantically similar words are grouped into a feature space of semantic clusters. During text vectorization, each feature weight is computed by combining the similarity between a word and a cluster center with the weight of the word itself. The similarity between vectorized texts is then computed with Euclidean distance. Applied to a test set of mobile marketing short texts, K Nearest Neighbor (KNN) classification experiments show that the method improves the per-class F1 value by 6% on average over methods based on word statistical features, and that it measures the similarity of mobile marketing short texts more effectively.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2017, Issue A01, pp. 292-294, 299 (4 pages)
Keywords: mobile marketing; short text vectorization; similarity calculation; word2vec; K Nearest Neighbor (KNN)
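
The pipeline summarized in the abstract (word vectors, semantic clusters, weighted mapping, Euclidean distance, KNN) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, the clustering algorithm, or the word-to-center similarity measure, so everything below is an assumption made for illustration: the toy 2-D vectors stand in for trained word2vec embeddings, the cluster centers are hard-coded instead of being learned, cosine similarity is used between a word and a center, each word contributes only to its nearest cluster, and the word's own weight is taken as its term frequency times an optional per-word weight. All names (`WORD_VECTORS`, `CLUSTER_CENTERS`, `text_to_vector`, `knn_predict`) are hypothetical.

```python
import math
from collections import Counter

# Toy 2-D vectors standing in for trained word2vec embeddings (hypothetical values).
WORD_VECTORS = {
    "discount": (0.9, 0.1),
    "sale": (0.8, 0.2),
    "coupon": (0.85, 0.15),
    "data": (0.1, 0.9),
    "traffic": (0.2, 0.8),
    "package": (0.15, 0.95),
}

# Stand-in for the centers obtained by clustering all word vectors (hypothetical values).
CLUSTER_CENTERS = [(0.85, 0.15), (0.15, 0.9)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_to_vector(tokens, word_weights=None):
    """Map a tokenized text into the semantic-cluster feature space.

    Assumption: each known word adds (word weight * similarity to its
    nearest cluster center) to that cluster's dimension. Unknown words
    are ignored.
    """
    word_weights = word_weights or {}
    counts = Counter(t for t in tokens if t in WORD_VECTORS)
    features = [0.0] * len(CLUSTER_CENTERS)
    for word, tf in counts.items():
        weight = word_weights.get(word, 1.0) * tf
        sims = [cosine(WORD_VECTORS[word], c) for c in CLUSTER_CENTERS]
        nearest = max(range(len(sims)), key=sims.__getitem__)
        features[nearest] += weight * sims[nearest]
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query_tokens, labeled_texts, k=3):
    """Label a text by majority vote among its k nearest labeled texts."""
    query = text_to_vector(query_tokens)
    neighbors = sorted(
        labeled_texts,
        key=lambda item: euclidean(query, text_to_vector(item[0])),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With a real corpus, the toy dictionaries would be replaced by embeddings trained on the full corpus and by centers from an actual clustering run. The number of clusters then sets the dimensionality of the text vectors.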