Abstract
This paper studies Korean text classification based on a naive Bayes classifier. First, features are selected from Korean text using a category-based feature selection method, and feature weights are computed with a TF-IDF-like estimation method. Second, a naive Bayes classifier is constructed. Finally, the classifier is applied to classify Korean text. The experiments show that the method performs well on Korean text classification and provides a basis for classifying mixed Korean-Chinese text.
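The three steps named in the abstract (per-category feature selection, TF-IDF-like weighting, naive Bayes classification) can be illustrated with a minimal sketch. The abstract does not give the paper's exact selection criterion or weighting formula, so everything below is an assumption made for illustration: features are chosen as the `k` tokens with the highest document frequency in each class, weights use a smoothed TF-IDF, and the classifier is a multinomial naive Bayes with Laplace smoothing. The function names (`select_features`, `train`, `classify`) and the toy, pre-tokenized Korean documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Assumed stand-in for category-based selection:
    keep the k tokens with the highest document frequency in each class."""
    df_per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        # sorted() keeps tie-breaking deterministic across runs
        df_per_class[label].update(sorted(set(tokens)))
    vocab = set()
    for counter in df_per_class.values():
        vocab.update(tok for tok, _ in counter.most_common(k))
    return vocab


def train(docs, labels, vocab):
    """Fit multinomial naive Bayes on TF-IDF-style weighted counts."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens) & vocab)
    # Smoothed IDF (an assumption; the paper's formula is not in the abstract)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        for t, tf in Counter(tokens).items():
            if t in vocab:
                weight[label][t] += tf * idf[t]

    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Laplace smoothing: add 1 to every feature weight
        total = sum(weight[c].values()) + len(vocab)
        log_like[c] = {t: math.log((weight[c][t] + 1.0) / total) for t in vocab}
    return log_prior, log_like


def classify(tokens, model):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy, invented data: documents are assumed to be already tokenized.
docs = [
    ["축구", "경기", "선수"],
    ["선수", "경기", "승리"],
    ["주식", "시장", "투자"],
    ["투자", "시장", "금리"],
]
labels = ["sports", "sports", "economy", "economy"]

vocab = select_features(docs, labels, k=2)
model = train(docs, labels, vocab)
print(classify(["경기", "선수", "시장"], model))
```

In practice Korean text would first need morphological analysis or another tokenization step, which this sketch leaves out.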
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2011, No. 4, pp. 16-19 (4 pages)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (69362001)