Abstract
With the rapid growth of the Internet, extracting valuable information from massive volumes of text has become a new challenge. Chinese text classification, one of the key technologies of natural language processing, makes it possible to categorize and locate textual information. Using the feature selection and feature weighting methods provided by the Sklearn library, the authors design and implement a Chinese text classifier based on the naive Bayes algorithm. Experimental results show that, with suitably tuned parameters, the classifier achieves good classification performance.
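The pipeline the abstract describes (feature weighting, feature selection, then a naive Bayes classifier) might look roughly like the sketch below. The abstract does not say which Sklearn methods were used, so TF-IDF weighting and chi-square selection are assumptions chosen for illustration. The toy corpus, the labels, the helper name `build_classifier`, and the values of `k` and `alpha` are likewise hypothetical. The documents are assumed to be word-segmented already, with tokens separated by spaces.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: pre-segmented Chinese text, tokens split by spaces.
train_docs = [
    "球队 赢得 比赛 冠军",
    "球员 进球 比赛 胜利",
    "教练 训练 球队 球员",
    "股票 市场 上涨 投资",
    "银行 利率 投资 基金",
    "公司 股票 基金 市场",
]
train_labels = ["sports", "sports", "sports", "finance", "finance", "finance"]


def build_classifier(k=7, alpha=1.0):
    """Assemble a TF-IDF -> chi-square selection -> multinomial NB pipeline.

    k     : number of features kept by chi-square selection (assumed value)
    alpha : additive smoothing parameter of the naive Bayes model (assumed value)
    """
    return Pipeline([
        # The token pattern keeps single-character words, which the default drops.
        ("tfidf", TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")),
        ("select", SelectKBest(chi2, k=k)),
        ("nb", MultinomialNB(alpha=alpha)),
    ])


clf = build_classifier()
clf.fit(train_docs, train_labels)

# Classify two unseen, pre-segmented documents.
predictions = clf.predict(["球队 比赛 胜利", "股票 基金 上涨"])
print(list(predictions))
```

In this sketch, `k` and `alpha` are the kind of parameters one would tune, for example with cross-validation on a real labeled corpus, to look for the improvement in classification performance that the abstract reports.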
Authors
Lu Zhengqiu (陆正球)
Wang Linge (王麟阁)
Zhou Chunliang (周春良)
College of Information Engineering, Ningbo Dahongying University, Ningbo, Zhejiang 315175, China
Source
Information & Computer (《信息与电脑》)
2018, No. 5, pp. 59-61 (3 pages)
Funding
Scientific Research Project of the Zhejiang Provincial Department of Education (Project No. Y201738610)
Keywords
naive Bayes
text classification
feature selection