Abstract
Mining useful knowledge from massive amounts of information is a current research focus, and classifying Web text as a preprocessing step can partly address this problem. Because Web documents often cover several topics, a multiple-classifier model is adopted. Exploiting the structural information that Web documents carry, a systematic classification framework is proposed: short documents are classified by combining Boosting with a Bayesian classifier based on Web document structure, while long documents are classified by combining Boosting with an integrated Bayesian classifier. Experimental results show that the framework achieves good classification performance.
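The abstract gives only the outline of the framework. The Python sketch below is one hypothetical reading of it, not the authors' implementation. It assumes the following. Each topic gets its own one-vs-rest classifier, which is one way to read the multiple-classifier model for multi-topic documents. Each classifier is discrete AdaBoost over a sample-weighted multinomial naive Bayes base learner. A token-count threshold routes a document to a short-document or long-document branch. The two featurizers are stand-ins. `structure_features` up-weights title terms as a simple proxy for "Web document structure". `combined_features` keeps title and body terms as separate fields as a proxy for the "integrated" model. `SHORT_THRESHOLD`, `TITLE_WEIGHT`, the `title`/`body` document fields, and all function and class names are illustrative assumptions.

```python
import math
from collections import Counter

SHORT_THRESHOLD = 50  # hypothetical: bodies with fewer tokens take the "short" branch
TITLE_WEIGHT = 3.0    # hypothetical: extra weight given to title terms


def tokens(text):
    return text.lower().split()


def structure_features(doc):
    """Short-document stand-in: body term counts plus up-weighted title terms."""
    feats = Counter(tokens(doc["body"]))
    for t in tokens(doc["title"]):
        feats[t] += TITLE_WEIGHT
    return feats


def combined_features(doc):
    """Long-document stand-in: title and body terms kept as separate feature fields."""
    feats = Counter("title:" + t for t in tokens(doc["title"]))
    feats.update("body:" + t for t in tokens(doc["body"]))
    return feats


class WeightedNB:
    """Binary (+1/-1) multinomial naive Bayes trained with per-sample weights."""
    def fit(self, X, y, w, alpha=1.0):
        scale = len(w) / sum(w)  # rescale weights to average 1 so Laplace smoothing stays meaningful
        vocab = {t for x in X for t in x}
        self.prior, self.logp = {}, {}
        for c in (1, -1):
            counts, mass = Counter(), 0.0
            for x, yi, wi in zip(X, y, w):
                if yi == c:
                    mass += wi * scale
                    for t, v in x.items():
                        counts[t] += wi * scale * v
            total = sum(counts.values())
            self.prior[c] = math.log(mass + 1e-9)  # epsilon avoids log(0) for an empty class
            self.logp[c] = {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                            for t in vocab}
        return self

    def predict(self, x):
        # Weighted log-likelihood per class; terms unseen in training are ignored.
        score = {c: self.prior[c] + sum(v * self.logp[c][t] for t, v in x.items() if t in self.logp[c])
                 for c in (1, -1)}
        return 1 if score[1] >= score[-1] else -1


class BoostedNB:
    """Discrete (binary) AdaBoost with WeightedNB as the base learner."""
    def fit(self, X, y, rounds=10):
        w = [1.0 / len(X)] * len(X)
        self.models = []
        for _ in range(rounds):
            m = WeightedNB().fit(X, y, w)
            pred = [m.predict(x) for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if err >= 0.5:
                break  # no better than chance: stop boosting
            a = 0.5 * math.log((1 - err) / max(err, 1e-10))
            self.models.append((a, m))
            if err == 0:
                break  # training set fitted perfectly
            # Up-weight misclassified samples, down-weight correct ones, renormalise.
            w = [wi * math.exp(-a * yi * p) for wi, yi, p in zip(w, y, pred)]
            s = sum(w)
            w = [wi / s for wi in w]
        return self

    def score(self, x):
        return sum(a * m.predict(x) for a, m in self.models)


class WebTextClassifier:
    """One boosted classifier per topic (one-vs-rest), with separate short/long branches."""
    FEATURIZERS = {"short": structure_features, "long": combined_features}

    def __init__(self, topics, rounds=10):
        self.topics, self.rounds = topics, rounds

    def fit(self, docs, labels):
        # labels[i] is the set of topics of docs[i]; a document may carry several.
        self.models = {}
        for branch, feat in self.FEATURIZERS.items():
            X = [feat(d) for d in docs]
            self.models[branch] = {t: BoostedNB().fit(X, [1 if t in lab else -1 for lab in labels], self.rounds)
                                   for t in self.topics}
        return self

    def predict(self, doc):
        branch = "short" if len(tokens(doc["body"])) < SHORT_THRESHOLD else "long"
        x = self.FEATURIZERS[branch](doc)
        return {t for t, m in self.models[branch].items() if m.score(x) > 0}
```

`WebTextClassifier(["sports", "finance"]).fit(docs, labels)` trains both branches. `.predict(doc)` then returns the set of topics whose boosted score is positive, so a document can receive several topics or none. This sketch trains both branches on every document. Training each branch only on documents of its own length class is an equally plausible alternative.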
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals list
2008, No. 23, pp. 6026-6028 (3 pages)
Funding
Shanghai Universities Special Research Fund for Outstanding Young Teachers (project B-8101-06-3802).