Research and Implementation of a Transaction Information Extraction Tool for C2C E-commerce Sites
Abstract: Taobao and Baidu Youa are two representative C2C e-commerce platforms in China. This paper studies how to extract sales records and user information from these two platforms. For the shop sales data on both sites, it designs a Web data extraction algorithm that is based on JerichoHtmlParser and uses HTML tags as landmarks. For the user information on both sites, it designs a second Web data extraction algorithm based on regular-expression matching. It then designs and implements a Web extraction system that can extract data from different sites according to different extraction rules. Finally, extraction experiments on real data from the two platforms verify the effectiveness of the design and show that the prototype system achieves relatively high recall and accuracy.
Source: Journal of Quanzhou Normal University, 2010, No. 4, pp. 12-17 (6 pages)
Keywords: Web data extraction; C2C e-commerce; regular expression
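The abstract above combines two extraction strategies driven by per-site rules: HTML-tag landmarks for shop sales records and regular expressions for user information. The following is a minimal sketch of that idea, not the paper's implementation. It uses Python's standard-library `html.parser` in place of JerichoHtmlParser (a Java library), and the `RULES` table, the site key `example-c2c`, the `sale-record` class, and the `Nickname:`/`Credit:` patterns are all hypothetical, chosen only for illustration and not taken from real Taobao or Baidu Youa pages.

```python
import re
from html.parser import HTMLParser

# Hypothetical per-site extraction rules (assumed for illustration only):
# the site key, landmark tag/class and regex patterns are NOT the paper's
# actual rules and do not reflect real Taobao or Baidu Youa markup.
RULES = {
    "example-c2c": {
        "record_tag": "tr",             # landmark tag that opens one sales record
        "record_class": "sale-record",  # class attribute that marks the landmark
        "user_patterns": {              # field name -> regex with one capture group
            "nickname": r"Nickname:\s*([^<\n]+)",
            "credit": r"Credit:\s*(\d+)",
        },
    },
}


class LandmarkExtractor(HTMLParser):
    """Landmark-based extraction with the stdlib parser (standing in for
    JerichoHtmlParser): a start tag matching the rule opens a record, and the
    text of each <td> inside it becomes one field. Records are assumed not
    to be nested inside one another."""

    def __init__(self, record_tag, record_class):
        super().__init__()
        self.record_tag = record_tag
        self.record_class = record_class
        self.in_record = False  # inside a landmark element
        self.in_cell = False    # inside a <td> of the current record
        self.records = []       # one list of cell texts per record

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.record_tag and self.record_class in classes:
            self.in_record = True
            self.records.append([])
        elif self.in_record and tag == "td":
            self.in_cell = True
            self.records[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == self.record_tag:
            self.in_record = False

    def handle_data(self, data):
        if self.in_record and self.in_cell:
            self.records[-1][-1] += data.strip()


def extract_user_info(page, patterns):
    """Regex-based extraction: the first capture group of each pattern is
    the field value, or None if the pattern does not match."""
    info = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, page)
        info[field] = match.group(1).strip() if match else None
    return info


def extract(site, sales_html, user_html):
    """Select the rule set for `site` and run both extractors."""
    rule = RULES[site]
    parser = LandmarkExtractor(rule["record_tag"], rule["record_class"])
    parser.feed(sales_html)
    parser.close()
    return parser.records, extract_user_info(user_html, rule["user_patterns"])


if __name__ == "__main__":
    sales = (
        '<table>'
        '<tr class="sale-record"><td>buyer_a</td><td>2</td><td>2010-03-01</td></tr>'
        '<tr class="header"><td>ignored</td></tr>'
        '<tr class="sale-record"><td>buyer_b</td><td>1</td><td>2010-03-02</td></tr>'
        '</table>'
    )
    user = "<div>Nickname: seller_x</div><div>Credit: 1520</div>"
    records, info = extract("example-c2c", sales, user)
    print(records)  # [['buyer_a', '2', '2010-03-01'], ['buyer_b', '1', '2010-03-02']]
    print(info)     # {'nickname': 'seller_x', 'credit': '1520'}
```

Keeping the landmark tags and regular expressions in a rule table rather than in code is one way to let a single extractor serve different sites, which is the role the abstract gives to its extraction rules; supporting a new site would then mean adding a new `RULES` entry.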

