A Method of Eliminating Noises in Web Pages by Style Tree Model and Its Applications (cited by: 2)
Authors: ZHAO Cheng-li, YI Dong-yun. Wuhan University Journal of Natural Sciences (indexed in EI, CAS), 2004, No. 5, pp. 611-616 (6 pages)
Abstract: A Web page typically contains many information blocks. Apart from the main content blocks, it usually has blocks such as navigation panels, copyright and privacy notices, and advertisements. We call these blocks noisy blocks. The noise in Web pages can seriously harm Web data mining. To eliminate this noise, we introduce a new tree structure, called the Style Tree, and study an algorithm for constructing a site style tree. The Style Tree Model is employed to detect and eliminate noise in any Web page of the site. An information-based measure for determining which element nodes are noisy is also constructed. In addition, the applications of this method are discussed in detail. Experimental results show that our noise elimination technique is able to improve the mining results significantly.
CLC number: TP 339
Foundation item: Supported by the National Natural Science Foundation of China (60003013)
Biography: ZHAO Cheng-li (1979-), male, Master candidate, research direction: Intelligent Information System.
Keywords: noise elimination; DOM tree; style tree; Web mining
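The abstract does not define the Style Tree or the information-based measure, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that a block whose text repeats unchanged at the same DOM position across the pages of a site is likely noise, and it scores each position by the normalized entropy of the texts seen there. The names `BlockCollector`, `path_importance`, `strip_noise`, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, label) pairs
        self.blocks = []  # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Assumption: the class attribute stands in for "presentation style".
        cls = dict(attrs).get("class") or ""
        self.stack.append((tag, f"{tag}.{cls}" if cls else tag))

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, if there is one.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(label for _, label in self.stack)
            self.blocks.append((path, text))


def page_blocks(html):
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def path_importance(pages):
    """Normalized entropy (0..1) of the texts observed at each tag path."""
    seen = defaultdict(Counter)
    for html in pages:
        for path, text in page_blocks(html):
            seen[path][text] += 1
    scores = {}
    for path, counts in seen.items():
        total = sum(counts.values())
        if total < 2:
            scores[path] = 1.0  # seen once: no evidence of repetition
            continue
        entropy = sum((c / total) * math.log(total / c) for c in counts.values())
        scores[path] = entropy / math.log(total)
    return scores


def strip_noise(html, scores, threshold=0.5):
    """Keep only text whose path scores at or above the (assumed) threshold."""
    return [t for p, t in page_blocks(html) if scores.get(p, 1.0) >= threshold]


template = ('<html><body><div class="nav">Home | About</div>'
            '<div class="main">{}</div>'
            '<div class="footer">Copyright 2004</div></body></html>')
pages = [template.format(f"Article {n} text") for n in ("one", "two", "three")]
scores = path_importance(pages)
print(strip_noise(pages[0], scores))  # ['Article one text']
```

In this toy site the navigation and footer text are identical on every page, so their paths score 0 and are dropped, while the main block varies from page to page and is kept. A real style tree would also have to handle differing layouts and nested presentation styles, which this sketch ignores.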