Non-Independent Term Selection for Chinese Text Categorization (cited by: 2)
Authors: 李景阳 (Li Jingyang), 孙茂松 (Sun Maosong). Tsinghua Science and Technology (indexed in SCIE, EI, CAS), 2009, Issue 1, pp. 113-120 (8 pages)
Chinese text categorization differs from English text categorization in its much larger term set (words or character n-grams), which makes training and running modern high-performance classifiers very slow. This study assumes that this high-dimensionality problem is related to redundancy in the term set, which traditional term selection methods cannot resolve. A greedy algorithm framework named "non-independent term selection" is presented, which reduces the redundancy according to string-level correlations. Several preliminary implementations of this idea are demonstrated. Experimental results show that a good tradeoff can be reached between performance and the size of the term set.
Keywords: Chinese text categorization; term selection; dimensionality
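
The abstract names a greedy framework that reduces redundancy through string-level correlations but gives no algorithmic detail. The following is only a rough sketch of one way such a selection could look, not the authors' method. It assumes each candidate term already has a relevance score and a document frequency, and it treats a term as redundant when it contains (or is contained in) an already selected term with nearly the same document frequency. The function name `greedy_select`, the `overlap` threshold, and the toy numbers are all hypothetical.

```python
from typing import Dict, List


def greedy_select(scores: Dict[str, float],
                  doc_freq: Dict[str, int],
                  k: int,
                  overlap: float = 0.9) -> List[str]:
    """Pick up to k terms by descending score, skipping string-redundant ones.

    Hypothetical illustration: the redundancy test below is an assumption,
    not the criterion used in the paper.
    """
    selected: List[str] = []
    # Visit candidates from highest to lowest score; ties broken alphabetically.
    for term in sorted(scores, key=lambda t: (-scores[t], t)):
        if len(selected) >= k:
            break
        redundant = False
        for kept in selected:
            # String-level relation: one term contains the other.
            if term in kept or kept in term:
                lo, hi = sorted((doc_freq[term], doc_freq[kept]))
                # Near-equal document frequency suggests the two terms
                # almost always occur together, so the new one adds little.
                if hi > 0 and lo / hi >= overlap:
                    redundant = True
                    break
        if not redundant:
            selected.append(term)
    return selected


# Toy scores and document frequencies, invented for illustration only.
scores = {"清华大学": 5.0, "清华": 4.8, "华大": 4.7, "大学": 3.0, "足球": 4.0}
doc_freq = {"清华大学": 100, "清华": 105, "华大": 101, "大学": 400, "足球": 80}
selected = greedy_select(scores, doc_freq, k=3)
print(selected)
```

In this toy run the bigrams "清华" and "华大" are skipped because they are substrings of the already selected "清华大学" and occur in almost the same number of documents, while "大学" is kept because it appears far more widely on its own. An independent top-k selection by score would have kept all three overlapping strings.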