Abstract
This paper introduces a Tree-Augmented Naïve Bayes (TAN) text categorization model, experimentally analyzes its threshold selection problem, and proposes an Automatic TAN (ATAN) text categorization framework that requires no threshold selection. Two ATAN-based algorithms are compared with BL-TAN, whose threshold is manually selected for best performance, on Chinese and English test sets with imbalanced class distributions. The results show that the ATAN-based algorithms achieve higher performance than BL-TAN.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2010, Issue 16, pp. 36-38, 41 (4 pages)
Funding
Specialized Research Fund for the Doctoral Program of Higher Education (2007004038)
Keywords
text categorization
Tree-Augmented Naïve Bayes (TAN) model
Bayesian network