Abstract
Many feature selection methods exist for Web text, but each has its own shortcomings, and most handle skewed data sets poorly. To address this, the paper selects Web text features with the BNS and Odds metrics, combined on the basis of data fusion, and classifies the result with a support vector machine. The Odds metric is used to improve the precision of the BNS metric while preserving the ability of BNS to cope with class skew. Experimental results show that combining the two methods through data fusion effectively compensates for the weaknesses of each and improves classification accuracy.
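The abstract gives no formulas, so the following is only a minimal sketch of how the two metrics might be computed and combined. It assumes BNS means Bi-Normal Separation, |F⁻¹(tpr) − F⁻¹(fpr)| with F⁻¹ the inverse standard normal CDF, and that Odds means the log odds ratio; both are the forms commonly given in the feature selection literature, not necessarily the ones the paper uses. The function `fused_scores`, its `weight` parameter, and the min-max weighted sum are hypothetical stand-ins for the paper's data fusion step, which the abstract does not describe.

```python
import math
from statistics import NormalDist

EPS = 0.0005  # assumed clipping bound: keeps rates away from 0 and 1


def _clip(rate, eps=EPS):
    """Clamp a rate into [eps, 1 - eps] so inverse CDF and log stay finite."""
    return min(max(rate, eps), 1.0 - eps)


def bns(tp, fp, pos, neg):
    """Bi-Normal Separation for one term.

    tp / fp: positive / negative documents containing the term.
    pos / neg: total positive / negative documents.
    """
    inv = NormalDist().inv_cdf
    return abs(inv(_clip(tp / pos)) - inv(_clip(fp / neg)))


def log_odds(tp, fp, pos, neg):
    """Log odds ratio for one term, using the same clipped rates."""
    tpr, fpr = _clip(tp / pos), _clip(fp / neg)
    return math.log((tpr * (1.0 - fpr)) / ((1.0 - tpr) * fpr))


def _min_max(values):
    """Rescale a list of numbers to [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fused_scores(counts, pos, neg, weight=0.5):
    """Hypothetical fusion: weighted sum of min-max normalised scores.

    counts maps each term to its (tp, fp) pair. This is an illustrative
    rule, not the fusion scheme of the paper.
    """
    terms = list(counts)
    b = _min_max([bns(*counts[t], pos, neg) for t in terms])
    o = _min_max([log_odds(*counts[t], pos, neg) for t in terms])
    return {t: weight * bi + (1.0 - weight) * oi
            for t, bi, oi in zip(terms, b, o)}
```

Under these assumptions, the top-scoring terms from `fused_scores` would form the feature set passed to a support vector machine classifier. Clipping the rates is one common way to keep both metrics defined for terms that occur in only one class.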
Source
《计算机工程与设计》
CSCD
PKU Core Journal (北大核心)
2009, No. 10, pp. 2529-2532 (4 pages)
Computer Engineering and Design