Abstract
How to manage digital text information effectively has become one of the major issues in information technology. Automatic text classification, as an effective way to improve the speed and accuracy of text retrieval, plays a vital role in digital text information management. The KNN algorithm, a very simple yet effective text classification method, is widely used in practice. This paper addresses a shortcoming of the traditional KNN algorithm: feature terms are weighted without supervision. It instead adopts two supervised term weighting methods, the χ² (chi-square) statistic and information gain. These methods make effective use of the label information in the training set and thereby improve the precision of the KNN algorithm.
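The abstract does not give the exact weighting formulas, so the following is only a minimal Python sketch of supervised term weighting for KNN, not the paper's implementation. It assumes: the standard 2×2 contingency-table statistic χ²(t,c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), with the maximum over classes used as a term's global weight; the standard information gain IG(t) = H(C) − H(C | t present/absent); a term weight of raw term frequency × global weight; and cosine-similarity KNN with similarity-weighted voting. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _doc_freq(docs, labels):
    """df[t][c] = number of training documents of class c that contain term t."""
    df = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for t in set(doc):
            df[t][y] += 1
    return df


def chi_square_weights(docs, labels):
    """Assumed global weight: max over classes of the 2x2 chi-square statistic."""
    n, class_count = len(docs), Counter(labels)
    weights = {}
    for t, per_class in _doc_freq(docs, labels).items():
        n_t = sum(per_class.values())
        best = 0.0
        for c, n_c in class_count.items():
            a = per_class[c]      # A: in class c, contains t
            b = n_t - a           # B: outside c, contains t
            cc = n_c - a          # C: in class c, lacks t
            d = n - a - b - cc    # D: outside c, lacks t
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        weights[t] = best
    return weights


def _entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def information_gain_weights(docs, labels):
    """Global weight: IG(t) = H(C) - H(C | presence/absence of t)."""
    n, class_count = len(docs), Counter(labels)
    h_c = _entropy(list(class_count.values()))
    weights = {}
    for t, per_class in _doc_freq(docs, labels).items():
        n_t = sum(per_class.values())
        cond = n_t / n * _entropy(list(per_class.values()))
        if n_t < n:  # some documents lack t
            absent = [n_c - per_class[c] for c, n_c in class_count.items()]
            cond += (n - n_t) / n * _entropy(absent)
        weights[t] = h_c - cond
    return weights


def vectorize(doc, weights):
    """tf x supervised global weight; terms with zero or unknown weight are dropped."""
    tf = Counter(doc)
    return {t: f * weights[t] for t, f in tf.items() if weights.get(t, 0.0) > 0}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, train_docs, labels, weights, k=3):
    """Similarity-weighted vote among the k most similar training documents."""
    q = vectorize(query, weights)
    sims = sorted(
        ((cosine(q, vectorize(d, weights)), y) for d, y in zip(train_docs, labels)),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for s, y in sims:
        votes[y] += s
    return max(votes, key=votes.get)


if __name__ == "__main__":
    docs = [s.split() for s in [
        "stock market shares rise", "market trading shares fall",
        "team wins football match", "football team scores goal"]]
    labels = ["finance", "finance", "sport", "sport"]
    for name, w in [("chi2", chi_square_weights(docs, labels)),
                    ("IG", information_gain_weights(docs, labels))]:
        print(name, knn_predict("shares market today".split(), docs, labels, w))
```

Both weightings use the training labels, unlike unsupervised schemes such as plain TF or TF-IDF: a term that occurs only in one class (for example "market" in the toy corpus) receives a high weight, while a term spread evenly across classes receives a weight near zero. Similarity-weighted voting is one common choice; a simple majority vote over the k neighbours would work the same way.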
Source
《信息安全与通信保密》 (Information Security and Communications Privacy)
2011, Issue 4, pp. 38-39, 43 (3 pages)
Keywords
KNN
text classification
term weighting