Abstract
Traditional text representation builds on the vector space model and uses feature selection to reduce the dimensionality of the feature space. This approach treats the words in a text as mutually independent and ignores the semantic information between them. This paper proposes a new text categorization method based on semantic feature selection: starting from an existing feature selection result, it uses the semantic relatedness between words to add to the feature space those words that are closely related to the already selected ones. Experiments show that, compared with existing feature selection methods, the proposed method improves the precision of text categorization.
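The expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which baseline selector or which relatedness measure the paper uses, so everything below is an assumption made for illustration: the baseline is a simple document-frequency ranking, the relatedness score is a cosine-style measure computed from document co-occurrence, and the function names, the toy corpus, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def select_by_document_frequency(docs, k):
    """Baseline selector (assumed): keep the k terms found in the most documents."""
    df = Counter(term for doc in docs for term in set(doc))
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:k])


def cooccurrence_relatedness(docs):
    """Stand-in relatedness measure (assumed, not the paper's):
    co-occurrence count divided by sqrt(df(a) * df(b))."""
    df = Counter(term for doc in docs for term in set(doc))
    pair_counts = Counter()
    for doc in docs:
        # sorted() guarantees each pair is stored as (smaller, larger).
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1

    def relatedness(a, b):
        if a == b:
            return 1.0
        key = (a, b) if a < b else (b, a)
        return pair_counts[key] / sqrt(df[a] * df[b])

    return relatedness, set(df)


def expand_features(selected, vocabulary, relatedness, threshold):
    """Add every unselected word that is strongly related to a selected word."""
    extra = {
        word
        for word in vocabulary - selected
        if any(relatedness(word, s) >= threshold for s in selected)
    }
    return selected | extra


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["stock", "market", "shares"],
    ["stock", "market", "equity"],
    ["football", "match", "goal"],
    ["football", "match", "referee"],
]

selected = select_by_document_frequency(docs, k=2)
relatedness, vocabulary = cooccurrence_relatedness(docs)
expanded = expand_features(selected, vocabulary, relatedness, threshold=0.9)
print(sorted(selected))
print(sorted(expanded))
```

In this sketch the baseline keeps only two terms, and the expansion step brings back words that always co-occur with them. A real system would likely replace both the selector and the relatedness function with the measures defined in the paper itself.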
Source
Journal of Hefei University of Technology (Natural Science) (《合肥工业大学学报(自然科学版)》)
CAS
CSCD
PKU Core (北大核心)
2011, No. 10, pp. 1501-1504 (4 pages)
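Funding
Natural Science Research Fund for Universities of Anhui Province (KJ2010B168)
Outstanding Young Talent Fund for Universities of Anhui Province (2010SQRL148, 2010SQRL149ZD)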
Keywords
text categorization
vector space model
feature selection
semantic relatedness