Abstract
Web-page classification plays an important role in data mining and is one of the key topics in information retrieval. This paper applies supervised machine learning to implement a web-page classifier. An improved mutual information (MI) measure is used for feature extraction, and the traditional TFIDF weighting formula is also improved. Experimental results show that the classifier is feasible and effective: it achieves high classification quality and meets the requirements of automatic Chinese web-page classification.
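For reference, the two baseline techniques that the abstract says were modified can be sketched as follows. This is a minimal sketch of the textbook (unimproved) forms only, assuming the common document-count definitions MI(t, c) = log(P(t, c) / (P(t) P(c))) and w(t, d) = tf(t, d) × log(N / n_t), where N is the number of documents and n_t the number containing term t. The paper's improved MI and improved TFIDF formulas are not given in this abstract, and the names `mi_score` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter


def mi_score(docs, labels, term, category):
    """Classic (unimproved) mutual information between a term and a category.

    docs is a list of token lists; labels holds one category per document.
    With A = docs in the category containing the term,
         B = docs outside the category containing the term,
         C = docs in the category without the term,
    MI(t, c) is estimated as log(A * N / ((A + B) * (A + C))).
    """
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != category)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)
    if a == 0:
        # The term never co-occurs with the category: MI tends to -infinity.
        return float("-inf")
    return math.log(a * n / ((a + b) * (a + c)))


def tfidf_weights(docs):
    """Traditional TF-IDF without normalization: w(t, d) = tf(t, d) * log(N / n_t)."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # count each term at most once per document
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


if __name__ == "__main__":
    # Hypothetical toy corpus of already-segmented pages.
    docs = [["stock", "fund", "market"], ["football", "match"], ["stock", "market"]]
    labels = ["finance", "sports", "finance"]
    print(mi_score(docs, labels, "stock", "finance"))
    print(tfidf_weights(docs)[0])
```

For this toy corpus, `mi_score(docs, labels, "stock", "finance")` equals log(1.5), because "stock" appears in both finance pages and in no other page. In a classifier of this kind, terms would typically be ranked by their MI score to select features, and the selected terms would then be weighted by TF-IDF to build document vectors.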
Funding
Key Project of Provincial Natural Science Research for Higher Education Institutions of Anhui Province (KJ2009A57)
Keywords
Web-page classification
text
algorithm
feature