基于节点权重的网页去噪方法的研究 (Cited by: 2)

Research on Web Page Denoising Method Based on Node Weight
Abstract: As the volume of information on the web keeps growing, web pages have become not only an important source of information for users but also an important data source for research in data mining, information retrieval, and related fields. To provide a high-quality source of text, page denoising has become an indispensable step in web page processing. As web authoring technology improves, pages contain more and more visual elements, and the information carried by page nodes grows richer, so visual information has become an important part of page denoising that cannot be ignored. From the user's point of view, visual information immediately conveys how important each module of a page is during browsing. Traditional page denoising techniques largely ignore the visual characteristics of pages, and their denoising performance degrades sharply on today's complex page structures. Combining visual information and node information, this paper proposes a denoising method based on node weight that takes full account of both the visual and the content characteristics of nodes. Experimental results show that the method improves the accuracy rate and recall rate of web page denoising.
Authors: 王健 (Wang Jian), 张金 (Zhang Jin)
Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 10, pp. 83-86 (4 pages)
Funding: Ministry of Education Special Research Project (2013116)
Keywords: visual characteristics; node weight; accuracy rate; recall rate