
Research on Web Text Categorization (cited by: 5)
Abstract: Mining useful knowledge from massive amounts of information is a current research focus, and classifying Web text as a preprocessing step can partly address this problem. Because Web documents are multi-topic, a multi-classifier model is adopted. Exploiting the structural information that Web documents carry, a systematic classification framework is proposed: short documents are classified by combining Boosting with a Bayesian model based on Web document structure, while long documents are classified by combining Boosting with a synthesis Bayesian model. Experimental results show that this framework achieves good classification performance.
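The length-based routing described in the abstract can be illustrated with a minimal Python sketch. Every detail below is an assumption, not the authors' published method: the length cutoff `SHORT_DOC_MAX_TOKENS`, the title up-weighting `TITLE_WEIGHT` (a crude stand-in for the structure-based Bayesian model), a body-only multinomial Naive Bayes in place of the synthesis Bayesian model, and SAMME (a multi-class AdaBoost variant) as the Boosting scheme. All function and class names are hypothetical, and the toy corpus is illustrative only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical settings for this sketch; the paper does not publish them.
SHORT_DOC_MAX_TOKENS = 50  # bodies with at most this many tokens are treated as short
TITLE_WEIGHT = 3.0         # extra weight for title tokens (crude structure stand-in)

def features(doc, use_structure):
    """Bag-of-words counts over the body, plus up-weighted title tokens if requested."""
    counts = Counter(doc["body"].lower().split())
    if use_structure:
        for tok in doc["title"].lower().split():
            counts[tok] += TITLE_WEIGHT
    return counts

class WeightedNB:
    """Multinomial Naive Bayes with per-example weights and add-one smoothing."""

    def fit(self, X, y, w):
        self.classes = sorted(set(y))
        self.prior, self.tok, self.vocab = defaultdict(float), defaultdict(Counter), set()
        for x, c, wi in zip(X, y, w):
            self.prior[c] += wi
            for t, n in x.items():
                self.tok[c][t] += wi * n
                self.vocab.add(t)
        self.total = {c: sum(self.tok[c].values()) for c in self.classes}
        return self

    def log_score(self, x, c):
        # Log posterior up to an additive constant: log prior + smoothed log likelihoods.
        v = len(self.vocab)
        s = math.log(self.prior[c])
        for t, n in x.items():
            s += n * math.log((self.tok[c][t] + 1.0) / (self.total[c] + v))
        return s

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_score(x, c))

class BoostedNB:
    """SAMME-style multi-class AdaBoost using WeightedNB as the base learner."""

    def fit(self, X, y, rounds=5):
        n, k = len(X), len(set(y))  # assumes at least two classes
        w = [1.0 / n] * n
        self.models = []
        for _ in range(rounds):
            # Rescale weights to average 1 so add-one smoothing does not swamp the counts.
            nb = WeightedNB().fit(X, y, [wi * n for wi in w])
            miss = [nb.predict(x) != c for x, c in zip(X, y)]
            err = sum(wi for wi, m in zip(w, miss) if m)
            if err >= 1 - 1 / k:  # no better than chance: stop, keeping at least one model
                self.models = self.models or [(1.0, nb)]
                break
            alpha = math.log((1 - err) / max(err, 1e-10)) + math.log(k - 1)
            self.models.append((alpha, nb))
            if err == 0:  # weighted sample fitted perfectly
                break
            w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
            total = sum(w)
            w = [wi / total for wi in w]
        return self

    def predict(self, x):
        votes = defaultdict(float)
        for alpha, nb in self.models:
            votes[nb.predict(x)] += alpha
        return max(votes, key=votes.get)

class LengthRoutedClassifier:
    """Short pages go to a structure-aware booster, long pages to a body-only one."""

    @staticmethod
    def is_short(doc):
        return len(doc["body"].split()) <= SHORT_DOC_MAX_TOKENS

    def fit(self, docs, labels, rounds=5):
        self.short = BoostedNB().fit([features(d, True) for d in docs], labels, rounds)
        self.long = BoostedNB().fit([features(d, False) for d in docs], labels, rounds)
        return self

    def predict(self, doc):
        structured = self.is_short(doc)
        model = self.short if structured else self.long
        return model.predict(features(doc, structured))

# Usage on a toy corpus (illustrative data, not from the paper).
train = [
    {"title": "stock market", "body": "shares fell as investors sold stocks"},
    {"title": "bank rates", "body": "the central bank raised interest rates"},
    {"title": "football final", "body": "the team won the match with a late goal"},
    {"title": "tennis open", "body": "she won the tennis match in three sets"},
]
labels = ["finance", "finance", "sports", "sports"]
clf = LengthRoutedClassifier().fit(train, labels, rounds=3)
print(clf.predict({"title": "market news", "body": "investors bought shares"}))  # finance
```

In this sketch both boosters are trained on every training page and only prediction is routed by length, which avoids an empty training set when one length class is rare; training each booster only on its own length class would be an equally plausible reading of the abstract.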
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2008, No. 23, pp. 6026-6028 (3 pages)
Funding: Special Research Fund for Outstanding Young Teachers of Shanghai Universities (B-8101-06-3802)
Keywords: Web text categorization; multi-topic; multiple classifiers; Boosting algorithm; synthesis Bayesian classifier
