Abstract
Real-time commodity prices at competing online stores are important data for Web shop owners, and Web data mining makes it practical to collect them. Based on a comparative study of regular-expression and word-segmentation algorithms, the paper presents a regular-expression-based commodity price extraction algorithm, together with word-segmentation-based algorithms for extracting a website's directory tree, the commodities on an HTML page, and commodity prices. Practice with the application system shows that the regular-expression algorithm achieves relatively low completeness ("full rate") and accuracy, whereas the word-segmentation algorithms exceed 99% on both measures. This fully meets the application's requirements and can also provide a basis for commodity market forecasting and analysis.
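To make the regular-expression approach and the two reported measures concrete, here is a minimal Python sketch. It is not the paper's algorithm: the pattern `PRICE_RE`, the helper names `extract_prices` and `recall_and_precision`, and the reading of "full rate" as recall and "accuracy rate" as precision are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): a currency
# marker (¥, ￥ or $) followed by an amount with optional thousands separators
# and up to two decimal places.
PRICE_RE = re.compile(r"[¥￥$]\s*(\d+(?:,\d{3})*(?:\.\d{1,2})?)")


def extract_prices(html):
    """Return every price-like amount found in the raw HTML text as floats."""
    return [float(amount.replace(",", "")) for amount in PRICE_RE.findall(html)]


def recall_and_precision(extracted, expected):
    """Assumed definitions: recall = correct / expected,
    precision = correct / extracted."""
    correct = len(set(extracted) & set(expected))
    recall = correct / len(expected) if expected else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


if __name__ == "__main__":
    page = (
        "<li>Tea <b>¥12.50</b></li>"
        "<li>Rice <span>￥1,299</span></li>"
        "<li>Hotline 400-123</li>"
    )
    prices = extract_prices(page)
    print(prices)
    print(recall_and_precision(prices, [12.5, 1299.0]))
```

A pattern like this only finds prices that carry an explicit currency marker, which is one plausible reason a purely regular-expression extractor could miss items on pages with inconsistent markup.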
Source
《微电子学与计算机》
CSCD
Peking University Core Journals
2011, Issue 10, pp. 168-172 (5 pages)
Microelectronics & Computer
Funding
Jiangsu Province Innovation Fund (BC2009 208)
Huai'an Industry-University-Research Cooperation Program (HAC201002)
Keywords
commodity price
data mining
regular expression
word segmentation
algorithm comparison