Abstract
This paper proposes an improved two-level text classification method based on feature extraction. Feature terms are extracted from each text and weighted, so that the text is represented as a vector of feature terms and their weights. The similarity between texts under the two-level classification model is computed as the cosine of the angle between their vectors, which allows relevant items to be located more accurately and quickly in large volumes of information. Experimental results show that the proposed method achieves higher classification accuracy than the traditional class-center classification method and improves the adaptability and classification ability of the system.
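The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the weighting formula, the feature selection criterion, or how the two levels are combined, so everything concrete here is an assumption: TF-IDF is used as the term weight, KNN voting (taken from the keyword list) is applied at both levels, and the function names `build_idf`, `vectorize`, `cosine`, `knn_label`, and `classify_two_level` are hypothetical.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed inverse document frequency per term (TF-IDF is an assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Represent a tokenized text as a sparse {feature term: weight} vector."""
    if not tokens:
        return {}
    tf = Counter(tokens)
    # Terms never seen in the training texts have no IDF and are dropped.
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_label(query, vectors, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify_two_level(query, vectors, labels, k=3):
    """labels holds (top_level, sub_level) pairs; level 2 searches only inside the level-1 winner."""
    top = knn_label(query, vectors, [lab[0] for lab in labels], k)
    idx = [i for i, lab in enumerate(labels) if lab[0] == top]
    sub = knn_label(query,
                    [vectors[i] for i in idx],
                    [labels[i][1] for i in idx],
                    k)
    return top, sub


# Toy training set (illustrative data, not from the paper).
train_docs = [
    ["goal", "match", "team"],
    ["goal", "striker", "league"],
    ["dunk", "hoop", "team"],
    ["stock", "market", "bank"],
    ["loan", "bank", "rate"],
]
train_labels = [
    ("sports", "football"),
    ("sports", "football"),
    ("sports", "basketball"),
    ("finance", "stocks"),
    ("finance", "banking"),
]

idf = build_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]
query_vector = vectorize(["goal", "league", "team"], idf)
result = classify_two_level(query_vector, train_vectors, train_labels)
print(result)
```

Restricting the second-level search to texts that belong to the first-level winner is one plausible reading of "two-level"; it reduces the number of cosine comparisons at the finer level, which fits the abstract's emphasis on speed, but the paper itself may organize the levels differently.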
Source
Journal of Guangdong University of Technology (《广东工业大学学报》)
CAS
2012, No. 4, pp. 65-68 (4 pages)
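Funding
Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011A090200068)
Natural Science Foundation of Guangdong Province (9151009001000043)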
Keywords
text classification
feature extraction
vector space model
KNN algorithm