Abstract
A Bayesian algorithm, grounded in probability and statistics, is applied to filter garbage messages in a message content analysis system. The algorithm first analyzes a large number of relevant and irrelevant messages to estimate how often various word features occur in each, and builds a word feature table from these probabilities. A scoring system then uses this table to judge the relevance of a target file. The algorithm is adaptive to some degree: it updates the feature table automatically as the forms of garbage messages change, and so continually improves its own judgment efficiency in real time.
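The pipeline summarized in the abstract (estimate word probabilities from relevant and irrelevant messages, store them in a feature table, score a target, update the table as new garbage appears) can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `BayesFilter`, the labels `"spam"` and `"ham"`, the tokenizer, the Laplace smoothing, and the log-odds score with a threshold of 0 are all assumptions chosen for illustration, since the abstract does not specify the scoring system.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy Bayesian filter: a word feature table plus a log-odds score.

    Hypothetical sketch; the table layout and scoring rule are assumptions.
    """

    def __init__(self):
        # Feature table: for each class, how many messages contained each word.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of training messages seen per class.
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, text, label):
        """Update the feature table with one labelled message."""
        self.counts[label].update(set(self.tokenize(text)))
        self.docs[label] += 1

    def word_prob(self, word, label):
        # Laplace-smoothed estimate of P(word appears | label).
        return (self.counts[label][word] + 1) / (self.docs[label] + 2)

    def score(self, text):
        """Log-odds that the message is garbage; positive means garbage-like."""
        s = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in set(self.tokenize(text)):
            s += math.log(self.word_prob(w, "spam") / self.word_prob(w, "ham"))
        return s

    def is_spam(self, text, threshold=0.0):
        return self.score(text) > threshold


if __name__ == "__main__":
    f = BayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("meeting schedule for monday", "ham")
    f.train("project report attached", "ham")
    print(f.is_spam("buy cheap pills"))
    print(f.is_spam("project meeting"))
    # Adaptation: feeding newly observed garbage back into the table
    # shifts later scores without rebuilding the table from scratch.
    f.train("monday offer", "spam")
    print(f.score("monday offer"))
```

Calling `train` on newly labelled messages is the sketch's stand-in for the self-adaptive table update; how the paper triggers and weights such updates is not stated in the abstract.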
Source
《上海理工大学学报》
EI
CAS
PKU Core Journal
2008, No. 1, pp. 75-78, 90 (5 pages in total)
Journal of University of Shanghai for Science and Technology
Keywords
packet message analysis
garbage message
filter
Bayesian method
characteristic table
probability