期刊文献+
共找到3篇文章
< 1 >
每页显示 20 50 100
Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks 被引量:4
1
作者 Sudhir Kumar Patnaik C.Narendra Babu Mukul Bhave 《Big Data Mining and Analytics》 EI 2021年第4期279-297,共19页
Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction e... Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction engines fail because they cannot adapt to the dynamic changes in website content.This study investigates an intelligent and adaptive web data extraction system with convolutional and Long Short-Term Memory(LSTM)networks to enable automated web page detection using the You only look once(Yolo)algorithm and Tesseract LSTM to extract product details,which are detected as images from web pages.This state-of-the-art system does not need a core data extraction engine,and thus can adapt to dynamic changes in website layout.Experiments conducted on real-world retail cases demonstrate an image detection(precision)and character extraction accuracy(precision)of 97%and 99%,respectively.In addition,a mean average precision of 74%,with an input dataset of 45 objects or images,is obtained. 展开更多
关键词 adaptive web scraping deep learning Long Short-Term Memory(LSTM) web data extraction You only look once(Yolo)
原文传递
An Efficient Mechanism for Product Data Extraction from E-Commerce Websites
2
作者 Malik Javed Akhtar Zahur Ahmad +3 位作者 Rashid Amin Sultan H.Almotiri Mohammed A.Al Ghamdi Hamza Aldabbas 《Computers, Materials & Continua》 SCIE EI 2020年第12期2639-2663,共25页
A large amount of data is present on the web which can be used for useful purposes like a product recommendation,price comparison and demand forecasting for a particular product.Websites are designed for human underst... A large amount of data is present on the web which can be used for useful purposes like a product recommendation,price comparison and demand forecasting for a particular product.Websites are designed for human understanding and not for machines.Therefore,to make data machine-readable,it requires techniques to grab data from web pages.Researchers have addressed the problem using two approaches,i.e.,knowledge engineering and machine learning.State of the art knowledge engineering approaches use the structure of documents,visual cues,clustering of attributes of data records and text processing techniques to identify data records on a web page.Machine learning approaches use annotated pages to learn rules.These rules are used to extract data from unseen web pages.The structure of web documents is continuously evolving.Therefore,new techniques are needed to handle the emerging requirements of web data extraction.In this paper,we have presented a novel,simple and efficient technique to extract data from web pages using visual styles and structure of documents.The proposed technique detects Rich Data Region(RDR)using query and correlative words of the query.RDR is then divided into data records using style similarity.Noisy elements are removed using a Common Tag Sequence(CTS)and formatting entropy.The system is implemented using JAVA and runs on the dataset of real-world working websites.The effectiveness of results is evaluated using precision,recall,and F-measure and compared with five existing systems.A comparison of the proposed technique to existing systems has shown encouraging results. 展开更多
关键词 Document object model rich data region common tag sequence web data extraction deep web mining
下载PDF
A Framework of Web Data Integrated LBS Middleware
3
作者 MENG Xiaofeng YIN Shaoyi XIAO Zhen 《Wuhan University Journal of Natural Sciences》 CAS 2006年第5期1187-1191,共5页
In this paper, we propose a flexible locationbased service (LBS) middleware framework to make the development and deployment of new location based applications much easier. Considering the World Wide Web as a huge d... In this paper, we propose a flexible locationbased service (LBS) middleware framework to make the development and deployment of new location based applications much easier. Considering the World Wide Web as a huge data source of location relative information, we integrate the common used web data extraction techniques into the middleware framework, exposing a unified web data interface for the upper applications to make them more attractive. Besides, the framework also emphasizes some common LBS issues, including positioning, location modeling, location-dependent query processing, privacy and secure management. 展开更多
关键词 location-based service (LBS) MIDDLEWARE web data extraction
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部