Abstract
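Addressing the food additive problem, which has drawn wide public attention in recent years, this paper takes the ingredient lists printed on food packaging as its data source, analyzes the clustering algorithms used for topic detection, and selects the Single-Pass clustering algorithm as the base algorithm for topic detection. Building on it, and to address the limited precision of the Single-Pass algorithm, each food item is represented jointly by two vectors constructed from its main components and its ingredients, and the concept of a "generation" is introduced to further improve clustering precision. Finally, experiments show that the algorithm meets the requirements of topic detection in both accuracy and time efficiency.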
Food-related problems have caused widespread public concern. Using comments on food additives as the research corpus, this paper analyzes and compares five clustering algorithms for topic detection. The single-pass algorithm, which is simple, scalable, and suitable for network topic detection, is then chosen as the basic method. To address the limited precision of the single-pass algorithm, text headings and body text are used together to build a dual vector that represents each text. In addition, the concept of a "generation" is introduced to further improve clustering accuracy. Finally, experiments show that the algorithm satisfies the requirements of topic detection in both accuracy and time efficiency.
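The single-pass clustering idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine similarity measure, the `threshold` and `weight` values, and the count-summing centroid update are all assumptions made for illustration, and the "generation" concept is not modeled because the abstract does not define it. Each item is represented by two term vectors (main components and ingredients), and their similarities are combined with a hypothetical weighting.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_pass(items, threshold=0.5, weight=0.5):
    """Cluster (main_terms, ingredient_terms) pairs in one pass over the data.

    threshold and weight are illustrative assumptions, not values from the paper.
    """
    clusters = []  # each cluster keeps two summed vectors and its member indices
    for idx, (main_terms, ingr_terms) in enumerate(items):
        main, ingr = Counter(main_terms), Counter(ingr_terms)
        best, best_sim = None, 0.0
        for c in clusters:
            # Combine the two per-vector similarities into one score.
            sim = weight * cosine(main, c["main"]) + (1 - weight) * cosine(ingr, c["ingr"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: merge the item into the closest existing cluster.
            best["main"].update(main)
            best["ingr"].update(ingr)
            best["members"].append(idx)
        else:
            # Otherwise the item seeds a new cluster.
            clusters.append({"main": main, "ingr": ingr, "members": [idx]})
    return clusters


# Hypothetical sample data: (main components, ingredients) per product.
items = [
    (["milk"], ["sugar", "carrageenan"]),
    (["milk"], ["sugar", "carrageenan", "vanillin"]),
    (["wheat flour"], ["salt", "sodium benzoate"]),
]
groups = [c["members"] for c in single_pass(items)]
```

Because each item is compared only with the clusters that exist when it arrives, the result depends on input order, which is one common source of the precision problems that refinements to single-pass clustering try to reduce.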
Source
《计算机与应用化学》
CAS
2015, Issue 6, pp. 739-743 (5 pages)
Computers and Applied Chemistry
Funding
Natural Science Foundation of Ningxia Hui Autonomous Region, 2013 (Grant No. NZ13053)
Keywords
topic detection
additive
algorithm
clustering
single-pass algorithm