期刊文献+

面向专用信息获取的用户定制主题网络爬虫技术研究 被引量:18

Research on User Customized Topic Web Crawler for Specialized Information Acquiration Technology
下载PDF
导出
摘要 进入大数据时代,互联网已成为各行各业进行信息采集的重要阵地。面对爆炸式增长的网络信息资源,如何快速高效地筛选出所需的信息成为亟需解决的现实难题。在互联网海量数据和专用信息采集人员之间构建一个满足特定需求的信息筛选机制,可以大幅度提高专用信息获取工作效率。主题网络爬虫是所有互联网信息获取手段必须具备的首要环节,为了提高专用信息采集的准确性,文章进行了面向公开网络的用户定制主题网络爬虫技术研究。针对大数据时代信息筛选困难的问题,文章通过将用户的兴趣偏好融入到主题网络爬虫的抓取过程中,有效提高了信息筛选力度,并通过实验验证了文中方法能够提高查准率。 Stepping into the era of big data, the Internet has become an important battle field for every walk of life to collect intelligence. Facing the explosive growth of network information resources,how to screen out the required information quickly and efficiently is a practical problem to solve. It is very important to construct an information screening mechanism between the mass data and intelligence personnel to meet the needs of specific tasks, which can greatly improve the efficiency. In order to improve the accuracy of the information collected, this paper conducts the research on the user customized topic Web crawler technology for information acquisition. In order to solve the difficult problem of information screening in the large data age, the user's interest preference is integrated into the crawling process of the topic Web crawler, and the information screening is effectively improved. Experimental results show that the method can improve the precision.
出处 《信息网络安全》 CSCD 2017年第2期12-21,共10页 Netinfo Security
基金 国家自然科学基金[11202239]
关键词 大数据 主题网络爬虫 PAGERANK算法 行为分析 用户定制 big data topic Web crawler Pagerank algorithm behavior analysis user customized
  • 相关文献

参考文献21

二级参考文献236

共引文献162

同被引文献141

引证文献18

二级引证文献170

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部