Abstract
This paper proposes a new classification algorithm for Chinese Web pages based on the dynamic generation of representative samples. The algorithm generates representative samples one at a time by training on the original training set, and it uses the information carried by each pruned training sample to adjust the already generated representative samples repeatedly, so that they become more representative. Experiments with a Chinese Web page classifier built on this algorithm show that it effectively compresses the original training set and improves classification efficiency while maintaining classification accuracy, giving good overall classification performance.
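The abstract does not give the generation or adjustment rules, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes Euclidean distance on feature vectors, a hypothetical `radius` threshold that decides whether a training sample is pruned, and a running-mean update as the adjustment step; the names `Representative`, `build_representatives`, and `classify` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Representative:
    label: str
    center: list      # running mean of the samples absorbed so far
    count: int = 1    # number of training samples this representative stands for


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_representatives(samples, radius):
    """Scan (vector, label) pairs once and build representatives incrementally.

    Assumption: a sample within `radius` of the nearest same-class
    representative is pruned, but its information is kept by moving that
    representative toward it (running mean). Otherwise the sample starts
    a new representative.
    """
    reps = []
    for vec, label in samples:
        same_class = [r for r in reps if r.label == label]
        nearest = min(same_class, key=lambda r: distance(r.center, vec), default=None)
        if nearest is not None and distance(nearest.center, vec) <= radius:
            nearest.count += 1
            nearest.center = [
                c + (x - c) / nearest.count for c, x in zip(nearest.center, vec)
            ]
        else:
            reps.append(Representative(label, list(vec)))
    return reps


def classify(reps, vec, k=1):
    """k-nearest-neighbor majority vote over representatives only."""
    nearest = sorted(reps, key=lambda r: distance(r.center, vec))[:k]
    votes = {}
    for r in nearest:
        votes[r.label] = votes.get(r.label, 0) + 1
    return max(votes, key=votes.get)
```

Under these assumptions, classification compares a new page against the representatives only, rather than against every training sample, which is where the efficiency gain described in the abstract would come from.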
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2006, No. 10, pp. 2502-2504 (3 pages)
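Funding
National Natural Science Foundation of China (60204009)
National 973 Program of China (2004CB318109)
Open Fund of the Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences (20040104)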
Keywords
k-nearest neighbor
representative samples
adjustment