Abstract
Classification is one of the most important problems in machine learning and data mining research. Automatic text categorization in particular is a research hotspot and a key technique in information retrieval and data mining, and it has attracted wide attention and developed rapidly in recent years. As the largest application on the Internet, email has long been popular with users, but the spam problem has grown increasingly serious in recent years. This paper designs a spam email filtering system based on an improved Bayesian probabilistic inference method. The system uses weights obtained from probability tests to describe the correlation between data, thereby addressing the inconsistency among the data and even the problem of their mutual independence. The method is applied to spam filtering on the Internet, and experiments demonstrate its effectiveness. Finally, some future research directions are given.
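The abstract does not spell out the improved weighting scheme, so the following is only a minimal sketch of the standard baseline such a filter builds on: a multinomial naive Bayes spam classifier with add-one (Laplace) smoothing. It is not the authors' method. The class and method names (`NaiveBayesSpamFilter`, `token_weight`, `spam_score`) are hypothetical, the regex tokenizer is deliberately simplistic, and the per-token "weight" is a plain log-likelihood ratio rather than the probability-test weight described in the paper. Unlike the paper's system, this baseline keeps the conditional-independence assumption between tokens, and it assumes both classes have at least one training message.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (deliberately simple).
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical baseline: multinomial naive Bayes with add-one smoothing.

    This is NOT the paper's improved weighting; it assumes token independence.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # label must be "spam" or "ham".
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _vocab_size(self) -> int:
        return len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))

    def _log_likelihood(self, word: str, label: str) -> float:
        # log P(word | label), add-one smoothed over the shared vocabulary.
        # Counter returns 0 for unseen words without inserting them.
        total = sum(self.word_counts[label].values())
        count = self.word_counts[label][word]
        return math.log((count + 1) / (total + self._vocab_size()))

    def token_weight(self, word: str) -> float:
        # Log-likelihood ratio: positive means the token is evidence for spam.
        return self._log_likelihood(word, "spam") - self._log_likelihood(word, "ham")

    def spam_score(self, text: str) -> float:
        # Log posterior odds: log P(spam | text) - log P(ham | text).
        # Assumes both classes have been trained at least once.
        prior = math.log(self.doc_counts["spam"]) - math.log(self.doc_counts["ham"])
        return prior + sum(self.token_weight(w) for w in tokenize(text))

    def is_spam(self, text: str) -> bool:
        return self.spam_score(text) > 0.0


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("win cheap money now", "spam")
    f.train("cheap pills win prize", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("project report attached for review", "ham")
    print(f.is_spam("win money prize"))         # True
    print(f.is_spam("monday project meeting"))  # False
```

Working in log space avoids floating-point underflow when many small token probabilities are multiplied. Summing per-token log ratios also makes each token's contribution to the decision directly visible. That contribution is the quantity a weighting scheme like the paper's would adjust.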
Source
Computer Technology and Development (《计算机技术与发展》), 2008, No. 8, pp. 76-79 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60273043)
Anhui University Graduate Innovation Fund (Project No. 20073053)