Abstract
Retrieving the information people need from the Internet by computer is a key problem, and this paper proposes an algorithm to address it. A tag tree is first built from the markup of a Web page. The page is then divided into several parts (main part, site logo, navigation bar, communication part, and copyright notice), and the tag tree captures the relationships among these parts. By parsing the tag tree and retaining only the subtree that carries useful information, that is, the main part, the main content of the page is extracted and the page is distilled.
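The abstract does not give the rules the authors use to recognise each part of the page, so the following is only a minimal sketch of the general idea: build a tag tree, then keep the subtree that appears to hold the main content. The scoring rule (text length minus link text length), the `ratio` threshold, and the names `Node`, `TagTreeBuilder`, `content_score`, `main_subtree`, and `distill` are all assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the "current" node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}  # their text is code, not page content


class Node:
    """One node of the tag tree; children are Nodes or text strings in document order."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def text(self):
        parts = (c if isinstance(c, str) else c.text() for c in self.children)
        return " ".join(p for p in parts if p)

    def link_chars(self):
        # Characters that sit inside <a> elements (typical of navigation bars).
        if self.tag == "a":
            return len(self.text())
        return sum(c.link_chars() for c in self.children if isinstance(c, Node))


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from the start/end tag events of the standard parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data and self.current.tag not in SKIP_TAGS:
            self.current.children.append(data)


def content_score(node):
    # Assumed heuristic: text that is not link text counts as content.
    return len(node.text()) - node.link_chars()


def main_subtree(root, ratio=0.7):
    """Descend while a single child still holds at least `ratio` of the content."""
    node = root
    while True:
        kids = [c for c in node.children if isinstance(c, Node)]
        if not kids:
            return node
        best = max(kids, key=content_score)
        if content_score(best) <= 0 or content_score(best) < ratio * content_score(node):
            return node
        node = best


def distill(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return main_subtree(builder.root)
```

Calling `distill(page_html).text()` returns the text of the retained subtree. A real page would likely need part-specific rules (for example, recognising copyright notices or comment areas by position or keywords) rather than this single length-based heuristic.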
Source
《西南师范大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2006, No. 5, pp. 128-131 (4 pages)
Journal of Southwest China Normal University (Natural Science Edition)
Keywords
tag tree
tag tree model
web page distillation