Although researchers in the ATC field have done a wide range of work based on SVM, almost all existing approaches rely on empirical model selection. Attempts at automatic model selection in practical, large-scale text classification systems have been limited. In this paper, we propose a new model selection algorithm that utilizes the DDAG learning architecture. This architecture yields a new large-scale text classifier with very good performance. Experimental results show that the proposed algorithm is efficient and has the necessary generalization capability when handling large-scale multi-class text classification tasks.
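As a rough illustration of how a DDAG (decision directed acyclic graph) turns pairwise binary classifiers into a multi-class decision, the sketch below keeps a list of candidate classes and lets one pairwise decider per step eliminate either the first or the last candidate, so K classes need K-1 evaluations. It is not the paper's model selection algorithm. The function names, the toy class labels, and the nearest-centroid deciders standing in for trained binary SVMs are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a pairwise decider receives a feature vector and returns
# the label (one of the two classes it was built for) that it prefers.
PairwiseDecider = Callable[[Sequence[float]], str]


def ddag_predict(
    x: Sequence[float],
    classes: List[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
) -> str:
    """Evaluate a DDAG: each step removes one class from the candidate list."""
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        # Candidates keep their original order, so (first, last) is always a
        # key of the form (earlier class, later class).
        winner = deciders[(first, last)](x)
        if winner == first:
            candidates.pop()       # the last candidate is eliminated
        else:
            candidates.pop(0)      # the first candidate is eliminated
    return candidates[0]


def make_centroid_decider(a: str, b: str, centroids) -> PairwiseDecider:
    """Toy stand-in for a trained binary SVM: pick the nearer class centroid."""
    def decide(x: Sequence[float]) -> str:
        dist_a = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[a]))
        dist_b = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[b]))
        return a if dist_a <= dist_b else b
    return decide


# Illustrative data only; these labels and centroids are not from the paper.
CLASSES = ["sports", "politics", "tech"]
CENTROIDS = {
    "sports": (1.0, 0.0, 0.0),
    "politics": (0.0, 1.0, 0.0),
    "tech": (0.0, 0.0, 1.0),
}
DECIDERS = {
    (a, b): make_centroid_decider(a, b, CENTROIDS)
    for i, a in enumerate(CLASSES)
    for b in CLASSES[i + 1:]
}

prediction = ddag_predict((0.1, 0.2, 0.9), CLASSES, DECIDERS)
```

In a real system each entry of `DECIDERS` would wrap a binary SVM trained on the two classes named in its key; the traversal logic would stay the same.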
Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which make it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, better overall performance is obtained by spectral clustering with document clustering. Moreover, because images appear in the same webpage as the document content, the proposed method is also applied to extract meaningful terms for images, and experimental results show that it is effective in improving webpage classification there as well.
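For readers unfamiliar with the Vector Space Model mentioned above, the following minimal sketch shows the basic representation that such methods start from: each document becomes a sparse vector of TF-IDF term weights, and documents are compared by cosine similarity. It does not reproduce the paper's clustering-based string extraction. The whitespace tokenizer, the function names, and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative documents only; they are not data from the paper.
docs = [
    "football match score",
    "football league match",
    "election vote result",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # the two documents share terms
sim_unrelated = cosine(vecs[0], vecs[2])  # the two documents share no terms
```

Replacing the raw tokens with a smaller set of extracted meaningful strings would shrink the vocabulary that defines the vector dimensions, which is the part of the pipeline the abstract describes optimizing.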
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61100205 and No. 60873001, the HiTech Research and Development Program of China under Grant No. 2011AA010705, and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212.