Abstract
Web page segmentation plays an important role in document classification, information extraction, topical information gathering, and search engine optimization. This paper proposes a page segmentation algorithm based on Web standards: the page is first parsed and its layout analyzed, and Web standards are then used to divide it into blocks. Experiments show that, on pages that conform to Web standards, the algorithm improves both segmentation accuracy and adaptability to complex pages.
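For illustration only, the following Python sketch shows one simple way to split a standards-compliant (div+CSS) page into blocks using only its markup. It is not the algorithm from this paper: the layout and CSS analysis described above is omitted, and the tag set `BLOCK_TAGS`, the class `BlockSegmenter` and the helper `segment` are hypothetical choices made for this example. A block is emitted for each structural container that holds text of its own, labelled by its `id` or, failing that, its first `class`.

```python
from html.parser import HTMLParser

# Hypothetical assumption: structural elements treated as block boundaries
# in a standards-compliant page. This set is not taken from the paper.
BLOCK_TAGS = {"div", "section", "article", "header", "footer", "nav", "aside", "main"}


class BlockSegmenter(HTMLParser):
    """Collect (label, text) for every block container with direct text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open containers: [label, list of direct text chunks]
        self.blocks = []  # finished blocks in document (closing) order
        self.skip = 0     # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
            return
        if tag not in BLOCK_TAGS:
            return
        attrs = dict(attrs)
        label = tag
        classes = (attrs.get("class") or "").split()
        if attrs.get("id"):
            label += "#" + attrs["id"]      # prefer the id selector
        elif classes:
            label += "." + classes[0]       # else the first class selector
        self.stack.append([label, []])

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
            return
        if tag not in BLOCK_TAGS or not self.stack:
            return
        label, parts = self.stack.pop()
        text = " ".join(" ".join(parts).split())  # normalize whitespace
        if text:  # containers holding only nested blocks are not emitted
            self.blocks.append((label, text))

    def handle_data(self, data):
        # Text is attributed to the innermost open container only.
        if self.stack and not self.skip:
            self.stack[-1][1].append(data)


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return parser.blocks


page = """
<div id="header">Site title</div>
<div id="content">
  <div class="post">First article text</div>
  <div class="post">Second article text</div>
</div>
<div id="footer">Copyright</div>
"""

for label, text in segment(page):
    print(label, "->", text)
```

Because the wrapper `div#content` contains only nested posts and whitespace, it yields no block of its own; the two posts appear as separate blocks. A real implementation along the lines of the abstract would also consult the computed CSS layout (position, size, float) rather than markup alone.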
Source
Microprocessors (《微处理机》)
2009, No. 6, pp. 58-61 (4 pages)
Funding
Supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 60403009)
Keywords
Web Page Segmentation
Cascading Style Sheets
Semantic Block