Analysis and recognition of ancient scripts is a challenging task as these scripts are inscribed on pillars,stones,or leaves.Optical recognition systems can help in preserving,sharing,and accelerate the study of the a...Analysis and recognition of ancient scripts is a challenging task as these scripts are inscribed on pillars,stones,or leaves.Optical recognition systems can help in preserving,sharing,and accelerate the study of the ancient scripts,but lack of standard dataset for such scripts is a major constraint.Although many scholars and researchers have captured and uploaded inscription images on various websites,manual searching,downloading and extraction of these images is tedious and error prone.Web search queries return a vast number of irrelevant results,and manually extracting images for a specific script is not scalable.This paper proposes a novelmultistage system to identify the specific set of script images from a large set of images downloaded from web sources.The proposed system combines the two most important pattern matching techniques-Scale Invariant Feature Transform(SIFT)and Template matching,in a sequential pipeline,and by using the key strengths of each technique,the system can discard irrelevant images while retaining a specific type of images.展开更多
Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction e...Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction engines fail because they cannot adapt to the dynamic changes in website content.This study investigates an intelligent and adaptive web data extraction system with convolutional and Long Short-Term Memory(LSTM)networks to enable automated web page detection using the You only look once(Yolo)algorithm and Tesseract LSTM to extract product details,which are detected as images from web pages.This state-of-the-art system does not need a core data extraction engine,and thus can adapt to dynamic changes in website layout.Experiments conducted on real-world retail cases demonstrate an image detection(precision)and character extraction accuracy(precision)of 97%and 99%,respectively.In addition,a mean average precision of 74%,with an input dataset of 45 objects or images,is obtained.展开更多
文摘Analysis and recognition of ancient scripts is a challenging task as these scripts are inscribed on pillars,stones,or leaves.Optical recognition systems can help in preserving,sharing,and accelerate the study of the ancient scripts,but lack of standard dataset for such scripts is a major constraint.Although many scholars and researchers have captured and uploaded inscription images on various websites,manual searching,downloading and extraction of these images is tedious and error prone.Web search queries return a vast number of irrelevant results,and manually extracting images for a specific script is not scalable.This paper proposes a novelmultistage system to identify the specific set of script images from a large set of images downloaded from web sources.The proposed system combines the two most important pattern matching techniques-Scale Invariant Feature Transform(SIFT)and Template matching,in a sequential pipeline,and by using the key strengths of each technique,the system can discard irrelevant images while retaining a specific type of images.
文摘Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction engines fail because they cannot adapt to the dynamic changes in website content.This study investigates an intelligent and adaptive web data extraction system with convolutional and Long Short-Term Memory(LSTM)networks to enable automated web page detection using the You only look once(Yolo)algorithm and Tesseract LSTM to extract product details,which are detected as images from web pages.This state-of-the-art system does not need a core data extraction engine,and thus can adapt to dynamic changes in website layout.Experiments conducted on real-world retail cases demonstrate an image detection(precision)and character extraction accuracy(precision)of 97%and 99%,respectively.In addition,a mean average precision of 74%,with an input dataset of 45 objects or images,is obtained.