
Research and Realization of Academic Search System Based on Network Crawler (cited by: 7)
Abstract: The system described in this paper uses web crawler technology to search for literature resources intelligently and to capture key information from external academic sources. The collected information is classified and identified with an ontology-based method, and the documents are stored automatically on a local server. The download subsystem uses load balancing to distribute download tasks across multiple download servers, and relies on Protobuf-based socket communication to provide an efficient and accurate internal download service. A single portal for the whole institute records retrieval and download activity, which avoids repeated downloads of the same resource and makes retrieval and download behavior traceable, providing data support for library and information management and research. Access control is also designed to eliminate malicious and excessive downloading. The system can reduce both the funding a research institution needs to obtain academic resources and its network bandwidth usage.
Source: Computer Technology and Development (《计算机技术与发展》), 2014, No. 11, pp. 35-38 (4 pages)
Funding: Key Project of the Chinese Academy of Sciences (院1221)
Keywords: network crawler; ontology; thesis retrieval; Web MVC; load balancing
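
The abstract describes two behaviors of the download subsystem: tasks are spread across several download servers, and the same resource is never downloaded twice. The record does not say which balancing policy the authors used, so the following is only a minimal sketch that assumes a least-loaded policy and an in-memory registry of assigned documents. The class `DownloadDispatcher`, the server names, and the document identifiers are all hypothetical, and the Protobuf socket messaging layer is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DownloadDispatcher:
    """Hypothetical dispatcher: least-loaded assignment with duplicate suppression."""

    servers: list[str]
    load: dict[str, int] = field(init=False)                # server -> tasks assigned
    assigned: dict[str, str] = field(default_factory=dict)  # document id -> server

    def __post_init__(self) -> None:
        if not self.servers:
            raise ValueError("at least one download server is required")
        self.load = {name: 0 for name in self.servers}

    def dispatch(self, doc_id: str) -> tuple[str, bool]:
        """Return (server, is_new); a repeated request reuses the first assignment."""
        if doc_id in self.assigned:
            return self.assigned[doc_id], False
        # min() returns the first minimal item, so ties go to the earliest-listed server.
        server = min(self.servers, key=lambda name: self.load[name])
        self.load[server] += 1
        self.assigned[doc_id] = server
        return server, True


if __name__ == "__main__":
    dispatcher = DownloadDispatcher(["dl-1", "dl-2", "dl-3"])  # assumed server names
    for doc in ["doc-a", "doc-b", "doc-a", "doc-c", "doc-d"]:
        print(doc, dispatcher.dispatch(doc))
```

In this sketch the second request for `doc-a` is answered from the registry instead of creating a new task, which is the kind of repeated download the single portal is meant to prevent. A real deployment would presumably persist the registry and track completed tasks as well as assigned ones.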
