Research on Web Page Automatic Classification Based on Internet News Corpus

Abstract: Web pages contain richer content than plain text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from plain-text categorization. For Chinese Internet news pages, a practical algorithm is proposed for extracting subject concepts from a Web page without a thesaurus. After these category-subject concepts are incorporated into a knowledge base, Web pages are classified by a hybrid algorithm, using an experimental corpus extracted from Xinhua net. Experimental results show that categorization performance improves when Web page features are used.
Source: Journal of Shanghai Jiaotong University (Science) (上海交通大学学报(英文版)), EI-indexed, 2007, No. 6, pp. 731-735 (5 pages)
Funding: The National Natural Science Foundation of China (No. 60082003)
Keywords: automatic classification; Web pages; subject extraction