Abstract
This paper studies the problem of weight calculation in text feature selection and proposes a method that weights feature terms using an entropy function. The method considers not only the document frequency of each feature term but also the number of times the term occurs within documents, so that the selected feature subset is more representative. Experiments show that the improved algorithm yields some improvement in clustering results.
Source
《甘肃联合大学学报(自然科学版)》
2012, Issue 1, pp. 51-54 (4 pages)
Journal of Gansu Lianhe University: Natural Sciences
Keywords
feature selection
document frequency
word frequency