Abstract
Information filtering is an important research topic in text mining. This paper proposes a new information filtering algorithm for interactive network media such as BBS forums. The algorithm improves the Bayesian method in two respects: feature extraction and classifier construction. For feature extraction, the weight of each Chinese harmful-information feature term is computed with a feature evaluation function that reflects the characteristics of online forums, namely the position, frequency, and length of keywords, and this function replaces the TF term of the TF-IDF formula. For classification, because benign and harmful information are unevenly distributed in online forums, an asymmetric learning strategy is used to design the Bayesian classifier, called the Asymmetric Naive Bayes classifier. Experimental results and comparative analysis show that the algorithm achieves high filtering accuracy.
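The abstract does not give the feature evaluation function or the asymmetric learning rule, so the following is only a minimal sketch under stated assumptions. The names `feature_score`, `title_boost`, and `cost_ratio`, the title/body split used as "position", the logarithmic length factor, and the cost-sensitive decision threshold are all hypothetical illustrations, not the authors' formulas.

```python
import math
from collections import Counter


def feature_score(term, title_tokens, body_tokens, title_boost=2.0):
    """Hypothetical replacement for the TF term.

    Combines occurrence count, position (title hits count more than body
    hits), and term length. The exact form is an assumption.
    """
    total = len(title_tokens) + len(body_tokens)
    if total == 0:
        return 0.0
    weighted_hits = title_boost * title_tokens.count(term) + body_tokens.count(term)
    length_factor = math.log(1 + len(term))  # longer terms weigh slightly more
    return weighted_hits / total * length_factor


def idf(term, documents):
    """Smoothed inverse document frequency over a list of token sets."""
    df = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + df)) + 1.0


def train_nb(docs, labels):
    """Multinomial naive Bayes counts; smoothing is applied at scoring time."""
    vocab = {t for d in docs for t in d}
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d)
    return vocab, counts, priors


def log_posterior(tokens, c, model):
    """Unnormalized log posterior of class c with Laplace smoothing."""
    vocab, counts, priors = model
    total = sum(counts[c].values())
    lp = math.log(priors[c] / sum(priors.values()))
    for t in tokens:
        if t in vocab:  # ignore terms never seen in training
            lp += math.log((counts[c][t] + 1) / (total + len(vocab)))
    return lp


def is_harmful(tokens, model, cost_ratio=1.0):
    """Asymmetric decision (assumed form): flag a post only when the
    harmful/benign posterior odds exceed cost_ratio."""
    diff = log_posterior(tokens, "harmful", model) - log_posterior(tokens, "benign", model)
    return diff > math.log(cost_ratio)


docs = [["buy", "pills"], ["cheap", "pills"], ["meeting", "today"],
        ["lunch", "today"], ["project", "meeting"], ["see", "you", "today"]]
labels = ["harmful", "harmful", "benign", "benign", "benign", "benign"]
model = train_nb(docs, labels)
```

In this sketch a term's weight would be `feature_score(...) * idf(...)`. A `cost_ratio` above 1 makes the filter more reluctant to flag a post, and a value below 1 makes it more aggressive; which direction suits a forum with few harmful posts is a tuning choice the abstract does not specify.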
Source
《广西师范大学学报(自然科学版)》
CAS
Peking University Core Journal
2009, No. 3, pp. 134-137 (4 pages)
Journal of Guangxi Normal University:Natural Science Edition
Funding
National Natural Science Foundation of China (60773084, 60603023)
Ministry of Education Doctoral Program Fund (20070151009)
Keywords
interactive network media
harmful information
information filtering