Abstract
This paper improves the generalized vector space model and, using HowNet sememes, proposes a text similarity computation method based on a sememe space. The method uses TF-IDF weights to convert the feature terms of a text into vectors in the sememe space, and computes text similarity as the cosine of the angle between the sememe vectors. Text clustering comparison experiments show that the method handles the semantic drift problem in public opinion analysis well and substantially improves the effectiveness of internet public opinion analysis.
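The abstract does not spell out how the projection into the sememe space is done, so the following is only a minimal sketch of one plausible reading: each term's TF-IDF weight is added to every sememe that term maps to, and two documents are compared by the cosine of their sememe vectors. The mapping `TOY_SEMEMES`, the smoothed IDF formula, and all function names are assumptions made for illustration; they are not taken from the paper and do not reproduce real HowNet entries.

```python
import math
from collections import Counter

# Hypothetical term -> sememe mapping for illustration only;
# these are NOT real HowNet entries.
TOY_SEMEMES = {
    "car": ["vehicle", "land"],
    "automobile": ["vehicle", "land"],
    "ship": ["vehicle", "water"],
    "apple": ["fruit", "food"],
}


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf = count / doc length, idf = ln(N / df) + 1
    (the +1 keeps terms that occur in every document from vanishing).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return weights


def to_sememe_vector(term_weights, sememes=TOY_SEMEMES):
    """Project term weights onto sememes by adding each term's weight
    to every sememe it maps to. Terms without a mapping are skipped."""
    vec = Counter()
    for term, w in term_weights.items():
        for s in sememes.get(term, []):
            vec[s] += w
    return dict(vec)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["car"], ["automobile"], ["apple"]]
vectors = [to_sememe_vector(w) for w in tfidf(docs)]
sim_synonyms = cosine(vectors[0], vectors[1])   # share both toy sememes
sim_unrelated = cosine(vectors[0], vectors[2])  # share no toy sememes
```

In this toy setup the documents "car" and "automobile" share no surface term, so a plain term-space cosine would treat them as unrelated, while in the sememe space they score close to 1. That is the kind of gap a sememe-based representation is meant to close.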
Source
Shandong Science (《山东科学》)
CAS
2014, Issue 6, pp. 73-77 (5 pages)
Funding
Yantai Social Science Planning Research Project (2012-SH-11)
Keywords
internet public opinion (网络舆情)
HowNet (知网)
similarity (相似度)