Abstract
This paper designs an intelligent filtering model for harmful information on the Web. After surveying common feature-extraction methods, it combines a document frequency (DF) threshold, the χ² statistic, and manual extraction to select features, and represents texts with the Vector Space Model (VSM). It also proposes building two dictionaries, a main dictionary and a dictionary of synonyms and near-synonyms, to serve as the feature dictionary. This both reduces the dimensionality of the vector space and improves the accuracy of feature extraction. A KSOM network is then used to train the text classifier.
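The automatic part of the feature-extraction step named in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the toy corpus, the synonym table, the function names, and the `min_df` and `top_k` parameters are all assumptions, and the manual-extraction and KSOM training stages are left out. The χ² score uses the standard two-by-two contingency form, N(AD - CB)² / ((A+C)(B+D)(A+B)(C+D)).

```python
from collections import Counter

# Hypothetical synonym dictionary: maps a synonym to its main-dictionary entry.
synonyms = {"wager": "bet", "gamble": "bet"}

# Hypothetical tokenized training texts and their class labels.
raw_docs = [
    ["bet", "casino", "win"],
    ["wager", "casino", "bonus"],
    ["gamble", "win", "news"],
    ["news", "weather", "sports"],
    ["sports", "news", "score"],
    ["weather", "report", "news"],
]
labels = ["harmful", "harmful", "harmful", "normal", "normal", "normal"]


def fold_synonyms(tokens, table):
    """Replace each listed synonym with its main-dictionary entry."""
    return [table.get(t, t) for t in tokens]


def document_frequency(docs):
    """Count how many documents contain each term at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def chi_square(term, docs, doc_labels, target):
    """Chi-square statistic between `term` and class `target`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, doc_labels):
        present = term in tokens
        in_class = label == target
        if present and in_class:
            a += 1          # term present, document in class
        elif present:
            b += 1          # term present, document not in class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, doc_labels, target, min_df=2, top_k=3):
    """Drop terms below the DF threshold, then rank the rest by chi-square."""
    df = document_frequency(docs)
    candidates = [t for t, count in df.items() if count >= min_df]
    ranked = sorted(
        candidates,
        key=lambda t: (-chi_square(t, docs, doc_labels, target), t),
    )
    return ranked[:top_k]


def to_vector(tokens, features):
    """Represent a text as a term-count vector over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


docs = [fold_synonyms(tokens, synonyms) for tokens in raw_docs]
features = select_features(docs, labels, "harmful", min_df=2, top_k=1)
vectors = [to_vector(tokens, features) for tokens in docs]
```

Folding synonyms before counting is what lets "wager" and "gamble" contribute to the same dimension as "bet", which is one plausible reading of how the synonym dictionary reduces the dimensionality of the vector space.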
Source
Modern Computer (《现代计算机》)
2009, No. 9, pp. 18-21 (4 pages)
Keywords
Information Filtering
Artificial Neural Networks
Text Categorization