Abstract
In fixed asset management, non-standard naming of assets and equipment causes problems such as mismatched asset names, assets that cannot be found, and assets that are counted more than once. To address these problems, this article proposes a text similarity query scheme based on the cosine distance algorithm. It first analyzes the causes of non-standard asset naming, then studies and compares several distance algorithms commonly used in cluster analysis, and finally implements a text similarity query application based on the cosine distance algorithm. Tests of two distance algorithms, Euclidean distance and cosine distance, show the advantage of the cosine distance algorithm for text similarity queries in a fixed asset management system.
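The contrast between the two distance measures can be illustrated with a minimal Python sketch. It is not the article's program: the character-count vectorization, the function names (`char_counts`, `cosine_similarity`, `euclidean_distance`, `most_similar`), and the sample asset names are all assumptions made for illustration. Each name is turned into a bag-of-characters vector, which works for Chinese and English names alike.

```python
import math
from collections import Counter


def char_counts(name: str) -> Counter:
    """Bag-of-characters vector for an asset name (assumed vectorization)."""
    return Counter(name.replace(" ", "").lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two character-count vectors (1.0 = same direction)."""
    va, vb = char_counts(a), char_counts(b)
    dot = sum(va[ch] * vb[ch] for ch in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty name is treated as similar to nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a: str, b: str) -> float:
    """Straight-line distance between two character-count vectors (0.0 = identical)."""
    va, vb = char_counts(a), char_counts(b)
    return math.sqrt(sum((va[ch] - vb[ch]) ** 2 for ch in va.keys() | vb.keys()))


def most_similar(query: str, names: list[str], top_n: int = 3) -> list[tuple[float, str]]:
    """Rank stored asset names by cosine similarity to the query, best first."""
    scored = [(cosine_similarity(query, n), n) for n in names]
    return sorted(scored, reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical asset names, invented for this example.
    assets = ["Dell Laptop Computer", "office chair", "laptop Dell"]
    for score, name in most_similar("Dell laptop", assets):
        print(f"{score:.3f}  {name}")
```

Under this vectorization, cosine similarity depends only on the direction of the vector, so a name that repeats or reorders the same characters still scores close to 1, whereas Euclidean distance grows with the difference in length. That is one plausible reason a cosine-based query would tolerate inconsistent naming better, though the article's own test setup may differ.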
Source
Journal of Wuxi Vocational Institute of Commerce (《无锡商业职业技术学院学报》)
2013, No. 6, pp. 96-99 (4 pages)
Keywords
text similarity query
sample distance algorithm
Euclidean distance algorithm
cosine distance algorithm