Research on Web Page Automatic Classification Based on Internet News Corpus

Abstract: Web pages contain richer content than plain text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from plain-text categorization. For Chinese news pages on the Internet, a practical algorithm is proposed for extracting subject concepts from Web pages without a thesaurus. After these category-subject concepts are incorporated into a knowledge base, Web pages are classified with a hybrid algorithm, using an experimental corpus extracted from Xinhua Net. Experimental results show that using Web page features improves categorization performance.

Authors: 蔡巍 (Cai Wei), 王永成 (Wang Yongcheng), 尹中航 (Yin Zhonghang)

Affiliation: Dept. of Computer Science & Eng.

Source: Journal of Shanghai Jiaotong University (Science) (上海交通大学学报（英文版）), 2007, No. 6, pp. 731-735 (5 pages). Indexed in EI.

Funding: The National Natural Science Foundation of China (No. 60082003)

Keywords: automatic classification; Web pages; subject extraction

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
