Abstract
To meet enterprises' need to retrieve files stored across distributed locations, the system described in this paper connects to FTP servers and Windows shared folders over the FTP, SMB and SSH protocols, and uses a distributed task scheduling center to run file-crawling tasks either on a schedule or on demand. It analyzes each file's title, content and other attributes to build a unified file index, in which the MD5 digest of each file is stored as the basis for detecting whether the file's version has changed. On the front end, the system gives users a simple yet powerful search entry point: by entering keywords, users can find files scattered across different locations and preview them online or download them. The system can significantly improve the efficiency with which enterprise users retrieve documents and obtain information.
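The abstract states that each file's MD5 digest is stored and used to decide whether a file's version has changed. The Python sketch below illustrates that idea under assumptions of its own: the index is modeled as an in-memory dict keyed by file path, and the helper names `md5_of_stream` and `needs_reindex` and the example path are hypothetical, not taken from the paper. A file fetched over FTP, SMB or SSH is treated as a binary stream and hashed in chunks, so large files need not fit in memory.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def md5_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hex MD5 digest of a binary stream, reading it in chunks."""
    h = hashlib.md5()
    # iter() with a sentinel keeps reading until read() returns b"" (EOF)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def needs_reindex(path: str, stream: BinaryIO, index: Dict[str, str]) -> bool:
    """Hypothetical helper: return True if the file is new or its content changed.

    `index` (an assumption for illustration) maps a file path to the MD5 digest
    recorded on the previous crawl; it is updated in place when the digest differs.
    """
    digest = md5_of_stream(stream)
    if index.get(path) == digest:
        return False  # same digest as last crawl: skip re-analysis
    index[path] = digest  # new file or new version: store the new digest
    return True


if __name__ == "__main__":
    index: Dict[str, str] = {}
    path = "smb://fileserver/share/report.docx"  # hypothetical path
    print(needs_reindex(path, io.BytesIO(b"version 1"), index))  # True (new file)
    print(needs_reindex(path, io.BytesIO(b"version 1"), index))  # False (unchanged)
    print(needs_reindex(path, io.BytesIO(b"version 2"), index))  # True (changed)
```

Comparing content digests rather than modification times means a file whose timestamp changes but whose bytes do not is not re-analyzed. MD5 serves here only for change detection, not as a security measure.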
Authors
ZHOU Yang
XIONG Hao
YUE Shuai
ZHAO Jie
ZHU Wentao
(State Grid Jiangsu Electric Power Co., Ltd., Zhenjiang Power Supply Branch, Zhenjiang, Jiangsu 212211, China)
Source
Information & Computer
2021, No. 18, pp. 149-153 (5 pages)
Funding
Research on an assistant robot for regional dispatch business communication and handling (Project No. J2021094).
Keywords
enterprise intranet
file crawling
file retrieval
distributed task scheduling
MD5 digest