Abstract
With the rapid growth of Internet technologies, newspapers receive a large and heterogeneous volume of manuscripts. Manual classification is costly and error-prone, and it cannot sort these manuscripts efficiently and accurately. To address this, the paper adopts the naive Bayes classification algorithm, a classic machine learning method, and designs a news text classification system based on it. The system comprises four stages: news text preprocessing, feature probability calculation, analysis and implementation of the naive Bayes classifier, and accuracy testing on sample data. Experiments show that the stronger the association between the finance category and the feature words that receive high probability values under the naive Bayes calculation, the more accurate the classification result, which indicates that the adopted feature extraction and probability calculation methods are effective. The system uses Tencent News articles as sample data for processing and testing, and it can be applied to newspaper news classification to improve classification accuracy.
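The pipeline summarized in the abstract (preprocessing, feature probability calculation, classification, testing) can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the Laplace smoothing, the helper names (`tokenize`, `train`, `log_posteriors`, `classify`), and the four English toy samples are all assumptions made for illustration. The paper works on Tencent News text, which would need Chinese word segmentation instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing
    # (Chinese word segmentation, stop-word removal), which the abstract does not detail.
    return text.lower().split()


def train(samples):
    """samples: list of (text, label) pairs. Returns the counts used for prediction."""
    class_docs = Counter()               # number of training documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def log_posteriors(text, model):
    """Unnormalized log P(c) + sum of log P(w | c) for every class c."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)   # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Assumption: Laplace (add-one) smoothing for P(w | c)
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        scores[label] = score
    return scores


def classify(text, model):
    scores = log_posteriors(text, model)
    return max(scores, key=scores.get)


# Hypothetical toy corpus, not data from the paper.
TOY_SAMPLES = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rate", "finance"),
    ("team wins football match", "sports"),
    ("player scores goal in match", "sports"),
]

model = train(TOY_SAMPLES)
print(classify("bank shares rise", model))
print(classify("football player scores", model))
```

On this toy corpus the two calls print `finance` and `sports`. Working in log space avoids numeric underflow when many small word probabilities are multiplied, and words with a high smoothed P(w | c) for a class are the ones that pull a document toward that class, which matches the abstract's observation about high-probability feature words and the finance category.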
Author
SUN Liang (School of Digital Media, Lanzhou University of Arts and Science, Lanzhou 73000, China)
Source
Journal of Xinyang Agriculture and Forestry University (《信阳农林学院学报》)
2023, Issue 3, pp. 108-111 (4 pages)
Keywords
text classification technology
news classification
naive Bayes