Abstract
Traditional text classification algorithms treat the relationship between words and categories purely statistically at the feature selection stage: they assume that words are mutually independent and ignore the semantic relationships between feature keywords. This paper proposes a new feature selection method that uses a context-statistics-based measure to compute the lexical relatedness between feature words and then applies a relatedness threshold to select features. The method reduces the high-dimensional sparsity of the feature space, effectively reduces noise, and improves both classification accuracy and algorithm efficiency.
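The abstract does not give the relatedness formula or the exact selection rule, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes that relatedness is the cosine similarity of sliding-window co-occurrence vectors and that a candidate word is kept when its relatedness to at least one other candidate reaches the threshold. The names `context_vectors`, `relatedness`, `select_features`, the window size, and the threshold value are all hypothetical and may differ from the paper's actual method.

```python
import math
from collections import Counter
from itertools import combinations


def context_vectors(docs, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def relatedness(u, v):
    """Assumed measure: cosine similarity of two sparse context vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_features(docs, candidates, threshold=0.3, window=2):
    """Assumed rule: keep a candidate if it is related (>= threshold)
    to at least one other candidate; isolated words count as noise."""
    vectors = context_vectors(docs, window)
    empty = Counter()
    kept = set()
    for a, b in combinations(candidates, 2):
        score = relatedness(vectors.get(a, empty), vectors.get(b, empty))
        if score >= threshold:
            kept.update((a, b))
    # Preserve the original candidate order in the result.
    return [w for w in candidates if w in kept]


# Toy corpus: already tokenized documents (hypothetical data).
docs = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["cat", "sleeps"],
]
candidates = ["stock", "market", "cat"]
print(select_features(docs, candidates, threshold=0.3))
```

On this toy corpus, "stock" and "market" share the contexts "rises" and "falls" and pass the threshold, while "cat" shares no context with either and is dropped. Raising the threshold shrinks the feature set further, which is the trade-off between dimensionality and coverage that the abstract alludes to.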
Source
Network Security Technology & Application (《网络安全技术与应用》)
2012, No. 5, pp. 33-34, 40 (3 pages)
Keywords
Text Categorization
Feature Selection
Lexical Relatedness