Abstract
To collect large volumes of real, reliable price data on Chinese medicinal materials, and thereby provide solid data support for price-prediction research, the Internet was chosen as the data source. After preliminary screening, 中药材天地网 (Zhongyaocai Tiandi Wang) was selected as the target website. Following an in-depth analysis of the target site's structure, a price-information crawler for Chinese medicinal materials was designed in Python. The paper describes in detail how the crawler simulates page requests, extracts the target information, stores it, and is deployed to start on a schedule. Practical tests show that the crawler runs stably and captures the target site's information entries efficiently, completely, and without errors.
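The stages named in the abstract (simulated request, information extraction, storage) can be illustrated with a minimal sketch. It is not the paper's implementation: the User-Agent string, the four-column table layout, the `prices` schema, and every function name below are assumptions made for illustration, and the sample HTML is invented rather than taken from the target website. Only the Python standard library is used.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

# Hypothetical browser-like header; the paper's actual request headers are not given here.
USER_AGENT = "Mozilla/5.0 (compatible; price-crawler-sketch)"


def fetch(url, timeout=10):
    """Request a page while presenting a browser-like User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class RowParser(HTMLParser):
    """Collect the <td> texts of each <tr> as one list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty and are dropped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_prices(html):
    """Return (name, spec, market, price) tuples from an assumed 4-column table."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        if len(row) != 4:
            continue
        name, spec, market, price = row
        try:
            records.append((name, spec, market, float(price)))
        except ValueError:
            continue  # skip rows whose price cell is not numeric
    return records


def store(conn, records):
    """Append the records to a SQLite table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "name TEXT, spec TEXT, market TEXT, price REAL, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO prices (name, spec, market, price) VALUES (?, ?, ?, ?)",
        records,
    )
    conn.commit()


# Invented sample page, used instead of a live request.
SAMPLE = """
<table>
<tr><th>name</th><th>spec</th><th>market</th><th>price</th></tr>
<tr><td>herb A</td><td>grade 1</td><td>market X</td><td>12.5</td></tr>
<tr><td>herb B</td><td>grade 2</td><td>market Y</td><td>n/a</td></tr>
</table>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, extract_prices(SAMPLE))
    print(conn.execute("SELECT name, spec, market, price FROM prices").fetchall())
```

In a real run, `fetch(url)` would supply the HTML in place of `SAMPLE`. Scheduled start-up is left outside the sketch; one option is to have a system scheduler such as cron invoke the script at fixed intervals.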
Author
ZHANG Xi-hong (张喜红)
Department of Intelligent Engineering, Bozhou Vocational and Technical College, Bozhou 236800, China
Source
Journal of Zaozhuang University (《枣庄学院学报》)
2019, No. 2, pp. 67-72 (6 pages)
Funding
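Anhui Province University Outstanding Young Talent Support Program (gxyq2018215)
Major Natural Science Research Project of Anhui Provincial Universities (KJ2016SD41)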
Keywords
Python
Chinese medicinal materials
crawler (spider)