
Automatic Extraction Method for Product Data Records

Cited by: 8
Abstract: This paper proposes an automatic extraction method for Product Data Records (PDRs) on the product list pages of e-commerce websites. Based on the characteristics of product records, the method computes a value for each node in the page's DOM tree from node type information such as text, images, and layout. It then groups the nodes by the similarity of their values and filters the groups to obtain the set of nodes that contain data records, thereby extracting the data records of the whole page. Experimental results show that the method is effective and that its extraction efficiency is high.
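Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2010, Issue 23, pp. 262-265 (4 pages)
Funding: National Natural Science Foundation of China (60970015); 2008 Jiangsu Province Major Science and Technology Support and Independent Innovation Fund (BE2008044); Jiangsu Province Basic Research Program, Enterprise Doctoral Innovation Fund (BK2009563)
Keywords: Web information extraction; data extraction; information integration; Product Data Record (PDR)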
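
The abstract names three steps (compute a value per DOM node from text, image, and layout information; group nodes by value similarity; filter the groups for data records) but does not give the value formula, the similarity measure, or the filter. The Python sketch below is therefore only an illustration under stated assumptions, not the paper's algorithm: the weights, the relative tolerance `tol`, the restriction to sibling nodes, and the "must contain both text and an image" filter are all hypothetical choices, as are the names `node_value`, `group_by_value`, and `extract_records`.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    """Minimal DOM node: tag name, parent, child elements, direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.texts = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.texts.append(data.strip())


def counts(node):
    """Return (text, image, layout) node counts for the subtree at node."""
    text = len(node.texts)
    image = 1 if node.tag == "img" else 0
    layout = 0 if node.tag == "img" else 1
    for child in node.children:
        t, i, l = counts(child)
        text, image, layout = text + t, image + i, layout + l
    return text, image, layout


def node_value(node, weights=(1.0, 5.0, 0.5)):
    """Weighted sum of the type counts (weights are an assumption)."""
    return sum(w * c for w, c in zip(weights, counts(node)))


def group_by_value(nodes, tol):
    """Group nodes whose values lie within a relative tolerance tol."""
    groups = []
    for value, index in sorted((node_value(n), i) for i, n in enumerate(nodes)):
        if groups and value - groups[-1][0] <= tol * value:
            groups[-1][1].append(index)
        else:
            groups.append((value, [index]))
    return [[nodes[i] for i in sorted(idx)] for _, idx in groups]


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def node_text(node):
    """Concatenate all text in the subtree, in document order."""
    parts = list(node.texts) + [node_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_records(html, tol=0.25, min_records=2):
    """Return the largest sibling group that looks like product records."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    for parent in walk(builder.root):
        for group in group_by_value(parent.children, tol):
            # Assumed filter: a product record has both text and an image.
            has_both = all(counts(n)[0] > 0 and counts(n)[1] > 0 for n in group)
            if len(group) >= min_records and has_both and len(group) > len(best):
                best = group
    return best


PAGE = """<html><body>
<div id="nav"><a href="/">Home</a><a href="/cart">Cart</a></div>
<ul>
<li><img src="a.png"><p>Phone A</p><span>$100</span></li>
<li><img src="b.png"><p>Phone B</p><span>$120</span></li>
<li><img src="c.png"><p>Phone C</p></li>
</ul>
</body></html>"""

for record in extract_records(PAGE):
    print(node_text(record))
```

On this toy page the navigation links form a similar-valued group but contain no image, so the filter discards them, while the three `li` items are kept even though the third lacks a price, because their values still fall within the tolerance.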

References (7)

  • [1] Liu Bing. Mining Data Records in Web Pages[C]//Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining. Washington D. C., USA: [s. n.], 2003: 601-606.
  • [2] Miao Gengxin, Tatemura J, Hsiung Wang-Pin, et al. Extracting Data Records from the Web Using Tag Path Clustering[C]//Proceedings of the 18th International Conference on the World Wide Web. Madrid, Spain: [s. n.], 2009: 981-990.
  • [3] Hu Renlong, Yuan Chunfeng, Wu Gangshan, Pu Xiaojia. Automatic Web Information Extraction Based on Repeated Patterns[J]. Computer Engineering, 2008, 34(22): 73-76. (in Chinese) Cited by: 8
  • [4] Zhai Yanhong, Liu Bing. Web Data Extraction Based on Partial Tree Alignment[C]//Proceedings of the 14th International Conference on the World Wide Web. Chiba, Japan: [s. n.], 2005: 76-85.
  • [5] Wang Jiying, Lochovsky F H. Data Extraction and Label Assignment for Web Databases[C]//Proceedings of the 12th International Conference on the World Wide Web. Budapest, Hungary: [s. n.], 2003: 187-196.
  • [6] Liu Bing, Zhai Yanhong. NET: System for Extracting Web Data from Flat and Nested Data Records[C]//Proceedings of the Conference on Web Information Systems Engineering. New York, USA: [s. n.], 2005: 487-495.
  • [7] Liu Wei, Meng Xiaofeng, Meng Weiyi. Vision-based Web Data Records Extraction[C]//Proceedings of the 9th Int'l Workshop on Web and Databases. New York, USA: ACM Press, 2006: 20-25.

Secondary references (6)

  • [1] Chang Chia-Hui, Kayed M, Girgis M R. A Survey of Web Information Extraction Systems[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10): 1411-1428.
  • [2] Crescenzi V, Mecca G, Merialdo P. RoadRunner: Towards Automatic Data Extraction from Large Web Sites[C]//Proc. of the 26th Int'l Conf. on Very Large Database Systems. Roma, Italy: [s. n.], 2001: 109-118.
  • [3] Chang Chia-Hui, Lui C. IEPAD: Information Extraction Based on Pattern Discovery[C]//Proceedings of the 10th International Conference on World Wide Web. Hong Kong, China: [s. n.], 2001: 681-688.
  • [4] Liu Bing, Grossman R, Zhai Yanhong. Mining Data Records in Web Pages[C]//Proceedings of KDD'03. Washington D. C., USA: [s. n.], 2003: 601-606.
  • [5] Phong L, Vuong B, Gao Xiaoying, et al. Data Extraction from Semi-structured Web Pages by Clustering[C]//Proceedings of WI'06. Hong Kong, China: [s. n.], 2006: 374-377.
  • [6] Wu Yang. Identifying Syntactic Differences Between Two Programs[J]. Software: Practice and Experience, 1991, 21(7): 739-755.

Co-citing documents: 7

Co-cited documents: 68

Citing documents: 8

Secondary citing documents: 27
