Abstract
In the era of big data, the Internet has become a key source of information for every industry. As online information resources grow explosively, quickly and efficiently filtering out the information that is actually needed has become a pressing practical problem. An information screening mechanism placed between the mass of Internet data and dedicated information-collection personnel, tailored to specific needs, can greatly improve the efficiency of specialized information acquisition. Because the topic Web crawler is the first stage that every Internet information acquisition method requires, this paper studies user-customized topic Web crawler technology for the open Web in order to improve the accuracy of specialized information collection. To address the difficulty of screening information in the big data era, the paper integrates the user's interest preferences into the crawling process of the topic Web crawler, which effectively strengthens information screening. Experiments show that the proposed method improves precision.
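As a rough illustration of what integrating a user's interest preferences into a topic crawler's crawl loop can look like, the Python sketch below implements a generic best-first focused crawler. It is not the method of this paper. The following are all assumptions made for illustration: the interest profile as a weighted keyword dictionary, cosine similarity as the relevance score, the 0.2 threshold, the rule that child links inherit the parent page's score, and the in-memory `web` dictionary that stands in for HTTP fetching. Precision is computed in the usual way, as the fraction of collected pages that are actually relevant.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts; a deliberately simple tokenizer (no stemming or stopwords)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(web, seeds, interest, threshold=0.2, max_pages=10):
    """Best-first crawl of a hypothetical in-memory web graph.

    web:      dict url -> (page_text, list_of_out_links), a stand-in for real fetching
    interest: dict term -> weight, an assumed form of the user's interest profile
    Returns the URLs judged relevant, in visit order.
    """
    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated score; seeds first
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        score = cosine(term_vector(text), interest)
        if score >= threshold:
            relevant.append(url)
            # Only relevant pages are expanded; their links inherit the page's score.
            for link in links:
                if link not in visited:
                    heapq.heappush(frontier, (-score, link))
    return relevant


def precision(retrieved, truly_relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(truly_relevant)) / len(retrieved)


# Toy data (hypothetical): five pages and a security-oriented interest profile.
web = {
    "a": ("network security crawler intrusion detection", ["b", "c"]),
    "b": ("security vulnerability report and network attack analysis", ["d"]),
    "c": ("football match results and sports news", ["e"]),
    "d": ("malware network traffic security", []),
    "e": ("network security news", []),
}
interest = {"security": 2.0, "network": 1.0, "malware": 1.0, "attack": 1.0}
truly_relevant = {"a", "b", "d", "e"}

if __name__ == "__main__":
    found = focused_crawl(web, ["a"], interest)
    print(found, precision(found, truly_relevant))
```

On this toy graph the crawl keeps `a`, `b` and `d` (precision 1.0, versus 0.8 if all five pages were kept unfiltered), but it skips the off-topic page `c` and so never reaches the relevant page `e`, which is linked only from `c`. This is a known trade-off of strictly best-first topic crawlers.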
Source
Netinfo Security (《信息网络安全》)
CSCD
2017, No. 2, pp. 12-21 (10 pages)
Funding
National Natural Science Foundation of China [11202239]