Abstract
This paper proposes a text categorization algorithm based on title category semantic recognition. The algorithm builds the classification feature space with a feature selection strategy that uses category information, predicts a text's candidate categories by recognizing the category semantics of the feature words in its title, and finally runs a classifier within the candidate category space. Experiments show that the algorithm significantly improves categorization precision while effectively reducing the number of candidate categories. Verification with a category space representation efficiency metric further indicates that the algorithm effectively improves the performance of the text representation space.
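The three stages named in the abstract (category-aware features, candidate categories predicted from the title, classification restricted to those candidates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the `min_share` concentration threshold used to decide that a title word carries category semantics, the simplified naive Bayes style scorer standing in for the unspecified classifier, and the toy data.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (title_tokens, body_tokens, label) triples."""
    word_cat = defaultdict(Counter)  # word -> per-category document frequency
    cat_docs = Counter()             # category -> number of training documents
    for title, body, label in docs:
        cat_docs[label] += 1
        for word in set(title) | set(body):
            word_cat[word][label] += 1
    return word_cat, cat_docs


def candidate_categories(title, word_cat, min_share=0.6):
    """Collect categories signalled by title words (hypothetical rule).

    A title word votes for a category when at least `min_share` of the
    training documents containing that word belong to the category.
    """
    candidates = set()
    for word in title:
        counts = word_cat.get(word)
        if not counts:
            continue  # unseen word carries no category information
        total = sum(counts.values())
        for cat, count in counts.items():
            if count / total >= min_share:
                candidates.add(cat)
    return candidates


def classify(title, body, word_cat, cat_docs, min_share=0.6):
    """Score only the candidate categories; fall back to all categories."""
    candidates = candidate_categories(title, word_cat, min_share) or set(cat_docs)
    n_docs = sum(cat_docs.values())

    def score(cat):
        # Log prior plus smoothed log likelihood of each body word (assumed scorer).
        total = math.log(cat_docs[cat] / n_docs)
        for word in body:
            df = word_cat.get(word, {}).get(cat, 0)
            total += math.log((df + 1) / (cat_docs[cat] + 2))
        return total

    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=score), candidates


# Toy training data, invented for this example.
docs = [
    (["football", "match"], ["team", "goal", "score"], "sports"),
    (["basketball", "final"], ["team", "score", "player"], "sports"),
    (["stock", "market"], ["price", "share", "bank"], "finance"),
    (["bank", "rates"], ["interest", "price", "loan"], "finance"),
]
word_cat, cat_docs = train(docs)
label, candidates = classify(["football", "final"], ["team", "goal"], word_cat, cat_docs)
fallback_label, fallback_candidates = classify(["unknown"], ["price", "loan"], word_cat, cat_docs)
```

In the first call the title words occur only in sports documents, so the scorer compares a single candidate instead of every category. In the second call the title gives no signal and the sketch falls back to the full category set, which is one possible design choice among several.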
Source
Journal of Electronics & Information Technology (《电子与信息学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2007, Issue 12, pp. 2885-2890 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60435020, 60504021)
Keywords
Title Category Semantic Recognition (TCSR)
Candidate category
Category space representation efficiency