Abstract
The system described here uses web crawler technology to perform intelligent search of external literature resources and to capture their key information. The collected information is classified and identified with an ontology-based method, and the resources are stored automatically on a local server. The download subsystem uses load balancing to distribute download tasks across multiple download servers, and internal communication uses Protobuf over sockets, an efficient mechanism that gives the internal download service high availability and accuracy. A unified portal serves as the single entrance for the whole institute and records retrieval and download activity, which avoids repeated downloads of the same resource and makes retrieval and download activity traceable. Access control is also designed to prevent malicious and excessive downloading. The recorded data provide support for library and information management and research. The system can effectively reduce both the funds a research institution spends on acquiring academic resources and its network bandwidth usage.
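The combination of load-balanced task distribution, duplicate-download avoidance, and access recording can be pictured with a minimal sketch. The abstract does not say which balancing policy the system uses, so round-robin is an assumption here, and the class name DownloadDispatcher, its methods, and the server and resource identifiers are all hypothetical.

```python
from itertools import cycle


class DownloadDispatcher:
    """Hypothetical dispatcher: round-robin balancing plus duplicate suppression."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one download server is required")
        self._next_server = cycle(servers)  # assumed policy: round-robin
        self._assigned = {}                 # resource id -> server handling it
        self.access_log = []                # (user, resource id) pairs for traceability

    def request(self, user, resource_id):
        """Record the request and return the server responsible for the resource."""
        self.access_log.append((user, resource_id))
        if resource_id not in self._assigned:
            # First request for this resource: give it to the next server in turn.
            self._assigned[resource_id] = next(self._next_server)
        # Later requests reuse the earlier assignment, so nothing is fetched twice.
        return self._assigned[resource_id]
```

Because every request passes through one entry point, the same object can both suppress repeated downloads and keep the log that makes retrieval traceable; a real deployment would persist both structures rather than hold them in memory.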
Source
Computer Technology and Development (《计算机技术与发展》)
2014, No. 11, pp. 35-38 (4 pages)
Funding
Key Project of the Chinese Academy of Sciences (院1221)