Abstract
This paper summarizes several statistical properties of text classification. Taking margin maximization as the basic objective, it then applies the classification principle of the support vector machine directly to explore the machine-learning nature of the automatic text classification task. Starting from the average document mapping, it derives the ADM-FSM principle model and gives, in the form of theorems, estimates of the generalization ability of automatic text classifiers based on margin maximization. Finally, experiments on a standard test data set verify these theoretical results.
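For orientation, the margin maximization objective mentioned in the abstract can be illustrated with the textbook hard-margin support vector machine. This is the standard formulation, not necessarily the exact form used in the paper. The symbols are assumptions introduced here: x_i is a document feature vector, y_i is its class label in {-1, +1}, and (w, b) defines the separating hyperplane.

```latex
% Standard hard-margin SVM (textbook form; notation assumed, not taken from the paper)
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n .
\end{aligned}
% The resulting geometric margin between the two classes is
\gamma = \frac{2}{\lVert w \rVert}
```

Minimizing the norm of w under these constraints is equivalent to maximizing the margin, which is the quantity that margin-based generalization estimates typically depend on.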
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals (北大核心)
2006, No. 12, pp. 2169-2171, 2206 (4 pages)
Funding
Scientific Research Fund Project of the Shanghai Municipal Education Commission (04EB12)
Keywords
machine learning
text classification
margin maximization
support vector machine
average document mapping