Abstract
A classifier model based on word terms is proposed for classifying spam short messages (SSM). The concept of word-category weight is introduced to represent how strongly a word indicates the category a short message belongs to, and a method for calculating this weight is presented. Based on the word-category weights, dimensionality reduction is carried out to obtain the set of feature words used for classification. The Short message-Category Membership Value (SCMV) is proposed to measure the degree to which a short message belongs to a category, and classification is performed by computing the SCMV and the SCMV density. To improve classification accuracy, the word-category weights of the feature words are refined by iterative learning, which keeps their values reasonable. The experimental results show that the proposed model is superior to other classification methods in both classification performance and time complexity.
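The abstract names the pipeline stages but does not give their formulas. The following is a minimal sketch under assumed definitions, not the authors' method. The word-category weight is taken as the fraction of training messages containing the word that belong to each category (an estimate of P(category | word)). Dimensionality reduction keeps words whose largest weight reaches a hypothetical `threshold`. SCMV is the sum of the matched feature words' weights, and SCMV density divides it by the number of matched feature words. The iterative learning step is a perceptron-style additive update with a hypothetical learning rate `lr`. All function names and parameters are illustrative.

```python
from collections import Counter, defaultdict


def word_category_weights(docs, labels):
    """Assumed weight: share of messages containing the word that fall in each category."""
    counts = defaultdict(Counter)  # word -> Counter(category -> message count)
    for tokens, label in zip(docs, labels):
        for w in set(tokens):  # count each word once per message
            counts[w][label] += 1
    return {
        w: {cat: n / sum(c.values()) for cat, n in c.items()}
        for w, c in counts.items()
    }


def reduce_features(weights, threshold=0.8):
    """Assumed reduction rule: keep words whose strongest category weight >= threshold."""
    return {w: cw for w, cw in weights.items() if max(cw.values()) >= threshold}


def membership(tokens, features, categories):
    """Assumed SCMV (sum of matched weights) and density (SCMV per matched feature word)."""
    scmv = {c: 0.0 for c in categories}
    matched = 0
    for w in tokens:
        if w in features:
            matched += 1
            for c in categories:
                scmv[c] += features[w].get(c, 0.0)
    density = {c: (v / matched if matched else 0.0) for c, v in scmv.items()}
    return scmv, density


def classify(tokens, features, categories):
    """Pick the category with the highest density; ties go to the earlier category."""
    _, density = membership(tokens, features, categories)
    return max(categories, key=lambda c: density[c])


def learn_weights(docs, labels, features, categories, lr=0.1, epochs=5):
    """Assumed iterative learning: on each error, move the message's feature-word
    weights toward the true category and away from the wrong prediction (in place)."""
    for _ in range(epochs):
        errors = 0
        for tokens, label in zip(docs, labels):
            pred = classify(tokens, features, categories)
            if pred == label:
                continue
            errors += 1
            for w in set(tokens):
                if w in features:
                    cw = features[w]
                    cw[label] = min(1.0, cw.get(label, 0.0) + lr)
                    cw[pred] = max(0.0, cw.get(pred, 0.0) - lr)
        if errors == 0:  # stop early once the training set is classified correctly
            break
    return features


if __name__ == "__main__":
    docs = [["free", "prize", "call"], ["win", "free", "cash"],
            ["meeting", "at", "noon"], ["call", "me", "at", "noon"]]
    labels = ["spam", "spam", "ham", "ham"]
    cats = ["spam", "ham"]
    feats = reduce_features(word_category_weights(docs, labels))
    feats = learn_weights(docs, labels, feats, cats)
    print(classify(["free", "cash", "now"], feats, cats))
```

In this toy run, "call" occurs equally in both categories (weight 0.5 each), so the reduction step drops it. The per-message normalisation in the density is one plausible reading of "SCMV density" that keeps long and short messages comparable; the paper may define it differently.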
Source
Journal of Computer Applications (《计算机应用》), 2013, No. 5, pp. 1334-1337 (4 pages)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
Funding
National Spark Program project (2011GA690190)
Keywords
spam short message
word term
text classification
dimensionality reduction
weight learning