Comparison of Two Web Mining Algorithms for Commodity Price Data (cited by: 3)

Compare Two Web Mining Algorithm for Commodity Price

Abstract: The real-time prices of commodities in other online stores are important data for the owners of Web shops, and Web data mining makes it practical to collect them. Based on a comparative study of a regular expression algorithm and a word segmentation algorithm, the paper presents a commodity price extraction algorithm based on regular expressions, together with three algorithms based on word segmentation: one that extracts a website's directory tree, one that extracts commodities from HTML pages, and one that extracts commodity prices. Practice with the application system shows that the regular expression algorithm achieves relatively low completeness and accuracy rates, whereas the word segmentation algorithm exceeds 99% on both, which fully meets the needs of the application. The extracted data can also serve as a basis for commodity market forecasting and analysis.
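As a rough illustration of the regular expression approach mentioned in the abstract, the sketch below pulls price-like amounts out of raw HTML. It is not the authors' algorithm: the pattern, the `extract_prices` function, and the sample markup are all hypothetical, and the sketch assumes prices are written as a currency symbol followed by digits.

```python
import re
from decimal import Decimal
from typing import List

# Hypothetical pattern (an assumption, not taken from the paper):
# a currency symbol, then an amount with optional thousands separators
# and an optional fraction of one or two digits.
PRICE_PATTERN = re.compile(r"[¥￥$]\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?")


def extract_prices(html: str) -> List[Decimal]:
    """Return every price-like amount found in the raw HTML text."""
    prices = []
    for match in PRICE_PATTERN.finditer(html):
        integer_part = match.group(1).replace(",", "")  # drop thousands separators
        fraction_part = match.group(2) or ""            # keep ".00" if present
        prices.append(Decimal(integer_part + fraction_part))
    return prices


# Made-up sample markup for illustration only.
sample_html = (
    '<li><span class="name">Phone A</span><span class="price">¥1,299.00</span></li>'
    '<li><span class="name">Phone B</span><span class="price">￥899</span></li>'
    '<li><span class="name">Phone C</span><span>Price on request</span></li>'
)
print(extract_prices(sample_html))
```

A fixed pattern like this only finds prices written in the form it anticipates. An amount written as `1299元`, for example, has no leading currency symbol and would be skipped, which is one plausible way a regular expression extractor can lose completeness across differently formatted sites.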

Authors: Wang Hongyan (王红艳), Zhu Quanyin (朱全银), Yan Yunyang (严云洋), Qian Jin (钱进)

Affiliation: Faculty of Computer Engineering, Huaiyin Institute of Technology (淮阴工学院计算机工程学院)

Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 10, pp. 168-172 (5 pages)

Funding: Jiangsu Province Innovation Fund (BC2009 208); Huai'an Industry-University-Research Cooperation Program (HAC201002)

Keywords: commodity price; data mining; regular expression; word segmentation algorithm; comparison

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
