Abstract
Existing agricultural vertical search engines cannot predict price trends for agricultural products, so they do not meet agricultural producers' need for market analysis. This paper designs a topic-specific search engine for agricultural product prices. First, a web crawler collects pages from comprehensive agricultural websites and applies transcoding, de-duplication, and content extraction. Next, the paper's topic relevance algorithm computes each page's relevance to the topic and a classifier classifies the pages; topic-relevant pages are parsed and stored. Finally, information on agricultural product prices and their influencing factors is extracted. The results show that the system can collect both agricultural product price information and information on the factors that affect those prices, providing data support for subsequent agricultural price forecasting.
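The abstract does not specify the topic relevance algorithm or the classifier, so the following is only a minimal sketch of the de-duplication and relevance-filtering steps under stated assumptions. The term weights in `TOPIC_TERMS`, the cosine-similarity scoring, the `threshold` value, and all function names are hypothetical and are not taken from the paper. A simple threshold stands in for the classifier, and the English tokenizer would need to be replaced by a word segmenter for Chinese pages.

```python
import hashlib
import math
import re
from collections import Counter

# Hypothetical topic vocabulary with weights; the paper's term list is not given.
TOPIC_TERMS = {"price": 3.0, "market": 2.0, "wholesale": 2.0,
               "vegetable": 1.0, "grain": 1.0}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (English-only assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def fingerprint(text):
    """Hash whitespace-normalized, lowercased text to detect exact duplicates."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def topic_relevance(text, topic_terms=TOPIC_TERMS):
    """Cosine similarity between the page's term counts and the topic weights."""
    counts = Counter(tokenize(text))
    dot = sum(counts[term] * weight for term, weight in topic_terms.items())
    norm_page = math.sqrt(sum(c * c for c in counts.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_terms.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def filter_pages(pages, threshold=0.2):
    """Drop duplicate pages, then keep those scoring at or above the threshold."""
    seen = set()
    kept = []
    for text in pages:
        fp = fingerprint(text)
        if fp in seen:
            continue  # same content already processed
        seen.add(fp)
        if topic_relevance(text) >= threshold:
            kept.append(text)
    return kept
```

Calling `filter_pages` on a list of extracted page texts returns the unique pages whose score meets the threshold; in a fuller pipeline those pages would then be parsed and stored, as the abstract describes.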
Source
Journal of Northeast Agricultural University (《东北农业大学学报》)
CAS
CSCD
PKU Core (北大核心)
2016, No. 9, pp. 64-71 (8 pages)
Funding
National Spark Program of China (2010GA670006)
Keywords
web crawler
information crawling
agricultural product prices
agricultural price prediction
agricultural search engine