Abstract
To identify text topics more accurately and filter out irrelevant information, the multi-class classifier SNoW was selected as the basic topic classification algorithm after an analysis of existing classification algorithms. An information-feedback and threshold-filtering strategy was proposed to filter out irrelevant information accurately, and a feature extraction algorithm that fuses information gain (IG) and the chi-square statistic (CHI) was proposed to further improve the accuracy of the system. Experimental results show that the classification algorithm using information feedback, threshold filtering, and fused feature extraction lets the system perform its dual function more efficiently: classifying relevant information accurately into the topic to which it belongs, and filtering out irrelevant information.
Source
《通信学报》
EI
CSCD
PKU Core Journal
2009, Issue S1, pp. 139-144 (6 pages)
Journal on Communications
Keywords
topic classification
information feedback
threshold filtering
fused feature extraction