Since the start of the 21st century, the rapid global development of the Internet has caused the amount of data to increase exponentially. Data help improve people's livelihoods, working conditions, and learning efficiency, so data extraction, analysis, and processing have become a hot topic for people from all walks of life. Traditional recommendation algorithms still suffer from problems such as inaccuracy, limited diversity, and low performance. To solve these problems and improve the accuracy and diversity of recommendation algorithms, this research combines convolutional neural networks (CNN) with an attention model to design a recommendation algorithm based on a neural network framework. Using a text convolutional network, the input layer of the CNN is transformed into two channels: a static channel and a non-static channel. Meanwhile, the self-attention mechanism focuses the model on the most informative features, so that data can be processed better and feature extraction becomes more accurate. The recommendation algorithm combines the CNN and the attention mechanism and divides the embedding layer into a user-information feature embedding and a data-name feature embedding. Data-name features are obtained through convolution kernels, and a max pooling layer then produces a fixed-length vector. The attention layer obtains the features of the data type. Experimental results show that the proposed algorithm, which combines CNN and attention, performs better in data extraction than the traditional CNN algorithm and other currently popular recommendation algorithms, showing excellent accuracy and robustness.
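The convolution-plus-pooling step described above can be sketched in pure Python. This is a minimal illustration of a generic text-CNN feature extractor, not the authors' model: the embedding values, filter weights, and function names (`conv1d_valid`, `max_over_time`, `text_cnn_features`) are hypothetical, and the static/non-static channels and the attention layer are omitted.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = embedding dimensions


def conv1d_valid(embeddings: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide a (window x dim) kernel over the token sequence ('valid' padding, ReLU)."""
    window = len(kernel)
    outputs = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            row = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], row))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


def max_over_time(feature_map: List[float]) -> float:
    """Max pooling over the whole sequence: one scalar per kernel."""
    return max(feature_map)


def text_cnn_features(embeddings: Matrix, kernels: List[Matrix]) -> List[float]:
    """Concatenate the pooled output of every kernel into a fixed-length vector."""
    return [max_over_time(conv1d_valid(embeddings, k)) for k in kernels]


# Hypothetical 4-token data name with 2-dimensional embeddings.
title = [[0.1, 0.2], [0.4, -0.1], [0.3, 0.5], [-0.2, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],               # window of 2 tokens
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # window of 3 tokens
]
print(text_cnn_features(title, kernels))
```

Whatever the sequence length, the output has one entry per kernel, which is why max pooling yields a fixed-length vector.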
Data, collected using advanced web scraping technologies, are crucial to the growth of e-commerce in today's world of demanding, hyper-personalized consumer experiences. However, core data extraction engines fail because they cannot adapt to dynamic changes in website content. This study investigates an intelligent, adaptive web data extraction system built on convolutional and Long Short-Term Memory (LSTM) networks: the You Only Look Once (YOLO) algorithm performs automated web page detection, and Tesseract LSTM extracts product details, which are detected as images on web pages. Because this state-of-the-art system does not need a core data extraction engine, it can adapt to dynamic changes in website layout. Experiments on real-world retail cases demonstrate an image detection precision of 97% and a character extraction precision of 99%. In addition, a mean average precision (mAP) of 74% is obtained on an input dataset of 45 objects or images.
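Detection precision figures like those above are usually computed by matching predicted boxes to ground-truth boxes via intersection over union (IoU). The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples and an IoU threshold of 0.5; both are conventional choices rather than details taken from the study, and the greedy matching is a simplification of full mAP evaluation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(predicted: List[Box], truth: List[Box], threshold: float = 0.5) -> float:
    """Fraction of predictions that match a not-yet-matched ground-truth box."""
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box can be matched only once
    return true_positives / len(predicted) if predicted else 0.0


preds = [(0, 0, 10, 10), (20, 20, 30, 30)]  # hypothetical detections
gt = [(1, 1, 10, 10)]                       # hypothetical ground truth
print(precision(preds, gt))  # one of the two predictions matches
```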
A large amount of data on the web can serve useful purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding, not for machines; therefore, making their data machine-readable requires techniques that grab data from web pages. Researchers have addressed this problem with two approaches: knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of data-record attributes, and text processing techniques to identify data records on a web page. Machine learning approaches learn rules from annotated pages and use these rules to extract data from unseen web pages. Because the structure of web documents is continuously evolving, new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique that extracts data from web pages using the visual styles and structure of documents. The proposed technique detects the Rich Data Region (RDR) using the query and words correlated with the query. The RDR is then divided into data records using style similarity, and noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and was run on a dataset of real-world, live websites. The effectiveness of the results is evaluated using precision, recall, and F-measure and compared with five existing systems; the comparison shows encouraging results.
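The evaluation measures named above have standard definitions. A minimal sketch, assuming extracted and ground-truth data records are compared as sets of hashable items (a simplification; the paper's exact record-matching criterion is not stated):

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(extracted: Set[Hashable], relevant: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision = TP/|extracted|, recall = TP/|relevant|, F = their harmonic mean."""
    tp = len(extracted & relevant)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical run: 4 records extracted, 3 of them correct, 5 records on the page.
p, r, f = precision_recall_f1({"r1", "r2", "r3", "noise"}, {"r1", "r2", "r3", "r4", "r5"})
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```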
The massive amount of web-based information has led to increasing demand for effective automatic retrieval of target information in web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract, and filter tabular data from HTML pages and to transform them into new web-based representations. The tool has been applied in an aquaculture web application platform to extract and generate aquatic product market information. Results show that the tool is very effective in extracting the required data from web pages.
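The locate-and-extract step for tabular data can be sketched with Python's standard `html.parser` module. This is a simplified illustration, not the tool described above: it ignores `rowspan`/`colspan`, nested tables, and the filtering algorithms, and the sample market-price table is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> and <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so " 12.5 " becomes "12.5".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Product</th><th>Price (CNY/kg)</th></tr>
  <tr><td>Grass carp</td><td> 12.5 </td></tr>
  <tr><td>Shrimp</td><td>48.0</td></tr>
</table>
"""
parser = TableExtractor()
parser.feed(html)
print(parser.tables[0])
```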
A semi-structured data extraction method is proposed that gets the useful information embedded in a group of relevant web pages and stores it using the Object Exchange Model (OEM). A data mining method is then adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge helps users understand the information structure of the web more deeply and thoroughly, and it also provides an effective schema for querying web information.
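To make the OEM representation concrete: an OEM object carries an object identifier, a label, and either an atomic value or a set of child objects. The sketch below assumes that simplified model and discovers schema knowledge by counting the distinct label paths, in the spirit of a DataGuide-style path summary; it is not the mining method used in the paper, and the sample pages are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class OEMObject:
    oid: str                                          # object identifier
    label: str                                        # label describing the object
    value: Optional[Union[str, int, float]] = None    # atomic value, if any
    children: List["OEMObject"] = field(default_factory=list)  # complex object


def label_paths(obj: OEMObject, prefix: str = "") -> Counter:
    """Count every root-to-node label path, e.g. 'paper.author.name'."""
    path = f"{prefix}.{obj.label}" if prefix else obj.label
    counts = Counter({path: 1})
    for child in obj.children:
        counts.update(label_paths(child, path))
    return counts


# Two hypothetical pages describing papers with slightly different structure.
page1 = OEMObject("&1", "paper", children=[
    OEMObject("&2", "title", "Web mining"),
    OEMObject("&3", "author", children=[OEMObject("&4", "name", "Li")]),
])
page2 = OEMObject("&5", "paper", children=[
    OEMObject("&6", "title", "OEM schemas"),
    OEMObject("&7", "year", 2001),
])

schema = label_paths(page1) + label_paths(page2)
for path, n in sorted(schema.items()):
    print(f"{path}: {n}")
```

Paths present on every page (here `paper.title`) are candidates for required schema elements; rarer paths suggest optional ones.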
Alteration is regarded as significant information for mineral exploration. In this study, ETM+ remote sensing data are used to recognize and extract alteration zones in northwestern Yunnan (云南), China. Principal component analysis (PCA) of ETM+ bands 1, 4, 5, and 7 was employed to extract OH⁻ alteration, and PCA of ETM+ bands 1, 3, 4, and 5 was used to extract Fe²⁺ (Fe³⁺) alteration. Interfering factors such as vegetation, snow, and shadows were masked. Alteration components were identified among the principal components (PCs) by the contributions of their diagnostic spectral bands. The alteration zones identified from remote sensing were analyzed in detail together with geological surveys and field verification. The results show that OH⁻ alteration is a main indicator of K-feldspar, phyllic, and propylitic alteration, which are closely related to porphyry copper deposits. Fe²⁺ (Fe³⁺) alteration indicates pyritization, which is mainly related to hydrothermal or skarn-type polymetallic deposits.
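The principal component step can be illustrated with a small pure-Python PCA. This sketch assumes per-pixel values for ETM+ bands 1, 4, 5, and 7 are already available as equal-length lists (the values below are synthetic, not real imagery). It computes the band covariance matrix, diagonalizes it with the Jacobi eigenvalue method, and picks the component whose loadings on bands 5 and 7 are large and of opposite sign, one common criterion for an OH⁻ component in feature-oriented PCA. Masking of vegetation, snow, and shadow is omitted.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def covariance_matrix(bands: Matrix) -> Matrix:
    """Sample covariance between every pair of bands (each band = list of pixels)."""
    n = len(bands[0])
    means = [sum(b) / n for b in bands]
    return [[sum((bi[p] - mi) * (bj[p] - mj) for p in range(n)) / (n - 1)
             for bj, mj in zip(bands, means)]
            for bi, mi in zip(bands, means)]


def jacobi_eigen(a: Matrix, sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-30:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def oh_component(v: Matrix, names: List[str]) -> int:
    """PC with the strongest opposite-sign loadings on bands 5 and 7 (-1 if none)."""
    i5, i7 = names.index("B5"), names.index("B7")
    best, best_score = -1, 0.0
    for pc in range(len(v)):
        l5, l7 = v[i5][pc], v[i7][pc]
        if l5 * l7 < 0 and abs(l5) + abs(l7) > best_score:
            best, best_score = pc, abs(l5) + abs(l7)
    return best


# Synthetic pixels: a shared brightness term plus a hypothetical clay signal
# that raises band 5 and lowers band 7 (absorption), mimicking OH- alteration.
rng = random.Random(0)
names = ["B1", "B4", "B5", "B7"]
brightness = [rng.uniform(50, 150) for _ in range(200)]
clay = [rng.uniform(0, 20) for _ in range(200)]
bands = [
    [0.8 * b + rng.gauss(0, 2) for b in brightness],
    [1.0 * b + rng.gauss(0, 2) for b in brightness],
    [1.1 * b + c + rng.gauss(0, 2) for b, c in zip(brightness, clay)],
    [0.9 * b - c + rng.gauss(0, 2) for b, c in zip(brightness, clay)],
]
cov = covariance_matrix(bands)
eigenvalues, vectors = jacobi_eigen(cov)
pc = oh_component(vectors, names)
print("eigenvalues:", [round(e, 1) for e in eigenvalues])
print("OH component:", pc, "loadings:", [round(vectors[i][pc], 3) for i in range(4)])
```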
Publicly accessible data banks hold quintillions of data points on deoxyribonucleic acid (DNA) and proteins, and that number is expanding at an exponential rate. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is a tough undertaking. The data must go through several processes, including mining, data processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and that can repeat the data extraction phases as many times as needed without human intervention. The software simulates data extraction from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed with command-line tools. It was evaluated by creating a novel database of 34 physicochemical parameters for 1,360 antimicrobial peptide (AMP) sequences (46,240 hits) from various MARVIN software panels, which can later be used to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, data collection from the web becomes faster and less expensive, with no need for manual data extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine learning and deep learning applications.
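The idea of repeating extraction phases without human intervention can be sketched as a resumable batch loop. This is an illustration only, not the study's software (which drives point-and-click interfaces): the `fetch` callable is a hypothetical stand-in supplied by the caller, and the URL pattern assumes the RCSB PDB file-download service (`https://files.rcsb.org/download/<ID>.pdb`).

```python
from typing import Callable, Dict, Iterable, Set


def pdb_url(pdb_id: str) -> str:
    """Download URL for one PDB entry (assumes the RCSB file-download service)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def run_batch(ids: Iterable[str], fetch: Callable[[str], str], done: Set[str],
              max_attempts: int = 3) -> Dict[str, str]:
    """Fetch every ID not yet in `done`, retrying transient failures.

    `done` acts as a checkpoint: rerunning the batch skips finished IDs, so the
    extraction phase can be repeated as often as needed without manual work.
    """
    results: Dict[str, str] = {}
    for pdb_id in ids:
        if pdb_id in done:
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                results[pdb_id] = fetch(pdb_url(pdb_id))
                done.add(pdb_id)
                break
            except OSError:
                if attempt == max_attempts:
                    print(f"giving up on {pdb_id} after {max_attempts} attempts")
    return results


# Hypothetical fetcher that fails once, then succeeds (stands in for real I/O).
calls = []


def flaky_fetch(url: str) -> str:
    calls.append(url)
    if len(calls) == 1:
        raise OSError("transient network error")
    return "ATOM records for " + url


completed = {"1CRN"}  # already extracted in an earlier run
new = run_batch(["1CRN", "4HHB"], flaky_fetch, completed)
print(sorted(new), sorted(completed))
```

Injecting `fetch` keeps the retry and checkpoint logic testable without network access; a real run would pass a function that performs the HTTP request.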
This paper is concerned with cooperative target stalking for a multi-unmanned surface vehicle (multi-USV) system. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a multi-USV target stalking (MUTS) algorithm is proposed. First, a V-type probabilistic data extraction method is proposed for the first time to overcome shortcomings of the MADDPG algorithm. The proposed method has two advantages: 1) it reduces the amount of data and shortens training time; 2) it filters out the more important data in the experience buffer for training. Second, to avoid collisions between USVs during the stalking process, an action constraint method called Safe DDPG is introduced. Finally, the MUTS algorithm is compared with several existing algorithms in cooperative target stalking scenarios. To demonstrate the effectiveness of the proposed MUTS algorithm in stalking tasks, mission operating scenarios and reward functions are carefully designed. The proposed MUTS algorithm helps the multi-USV system avoid internal collisions during mission execution. Moreover, compared with some existing algorithms, it provides a higher convergence speed and a narrower convergence domain.
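The abstract does not spell out the V-type probabilistic extraction rule, so the sketch below substitutes a standard alternative, proportional prioritized sampling, purely to illustrate what drawing the more important data from an experience buffer looks like in code: each transition is sampled with probability proportional to its priority raised to a power `alpha`. The buffer layout, the priorities, and `alpha` are assumptions, and Safe DDPG is not shown.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sampling_probabilities(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """P(i) = p_i**alpha / sum_k p_k**alpha (alpha = 0 gives uniform sampling)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(buffer: Sequence[T], priorities: Sequence[float], batch_size: int,
                 alpha: float = 0.6, rng: Optional[random.Random] = None) -> List[T]:
    """Draw a training batch, favouring transitions with higher priority."""
    if len(buffer) != len(priorities):
        raise ValueError("buffer and priorities must have the same length")
    rng = rng or random.Random()
    weights = sampling_probabilities(priorities, alpha)
    return rng.choices(list(buffer), weights=weights, k=batch_size)


# Hypothetical transitions (state, action, reward) with |TD error| priorities.
buffer = [("s0", "a0", 0.0), ("s1", "a1", 1.0), ("s2", "a2", -1.0)]
priorities = [0.1, 2.0, 0.5]
print(sampling_probabilities(priorities))
print(sample_batch(buffer, priorities, batch_size=4, rng=random.Random(0)))
```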
Image steganography is a technique for concealing confidential information within an image without dramatically changing its outward appearance. Vehicular ad hoc networks (VANETs), which enable vehicles to communicate with one another and with roadside infrastructure to enhance safety and traffic flow, are an essential component of modern smart transportation systems and provide a range of value-added services. Many authors have suggested VANET steganography for secure, reliable message transfer between terminals (hops) and for protecting messages from attacks to preserve privacy. This paper aims to determine whether steganography can improve data security and secrecy in VANET applications and to analyze effective steganography techniques for embedding data into images while minimizing loss of visual quality. According to simulations in the literature and real-world studies, image steganography has proved to be an effective method for secure communication in VANETs, even under difficult network conditions. We also explore a variety of steganography approaches for vehicular ad hoc network transportation systems, such as vector embedding, statistical methods, spatial domain (SD), transform domain (TD), distortion, masking, and filtering. This study may help researchers improve the ability of vehicle networks to communicate securely and open the door to innovative steganography methods.
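Spatial-domain steganography, one of the families listed above, is most simply illustrated by least-significant-bit (LSB) substitution. A minimal sketch, assuming an 8-bit grayscale image flattened into a list of pixel values and a message that fits (8 pixels per byte); real VANET schemes add encryption, capacity management, and robustness that this omits, and the cover image and message are hypothetical.

```python
from typing import List


def embed_lsb(pixels: List[int], message: bytes) -> List[int]:
    """Hide message bits, most significant first, in the LSBs of successive pixels."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover image")
    stego = pixels[:]
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & ~1) | bit  # changes each pixel by at most 1
    return stego


def extract_lsb(pixels: List[int], length: int) -> bytes:
    """Recover `length` bytes from the LSBs of the first 8 * length pixels."""
    out = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (pixels[b * 8 + i] & 1)
        out.append(byte)
    return bytes(out)


cover = [(37 * i) % 256 for i in range(64)]  # synthetic 8x8 grayscale image
stego = embed_lsb(cover, b"STOP")
print(extract_lsb(stego, 4))
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

Because each pixel changes by at most one intensity level, the visual-quality loss discussed above stays minimal, at the cost of fragility to compression or filtering.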
The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related work does not address how to make extracted information available to knowledge systems. This paper describes a novel approach to building a domain-specific knowledge service with data retrieved from the Hidden Web. An ontology is used to model the domain knowledge. The query forms of different websites are translated into a machine-understandable format defined by knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized as ontology-format knowledge. Experiments show that the algorithm achieves high accuracy and that the system greatly facilitates the construction of knowledge services.
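One way to picture translating query forms into a format defined by knowledge concepts is a per-site mapping from ontology concepts to form field names, which lets a single concept-level query be turned into each site's HTTP request. The site URLs, field names, and concepts below are hypothetical, and the sketch only builds GET URLs with the standard `urllib.parse.urlencode`; it neither submits them nor parses the result pages.

```python
from typing import Dict
from urllib.parse import urlencode

# Hypothetical form descriptions: ontology concept -> site-specific field name.
FORM_MAPPINGS: Dict[str, dict] = {
    "bookshop-a": {
        "action": "https://bookshop-a.example/search",
        "fields": {"Book.title": "q", "Book.author": "auth"},
    },
    "bookshop-b": {
        "action": "https://bookshop-b.example/find",
        "fields": {"Book.title": "title", "Book.author": "writer"},
    },
}


def build_query_url(site: str, concept_query: Dict[str, str]) -> str:
    """Translate a concept-level query into a site-specific GET request URL."""
    form = FORM_MAPPINGS[site]
    fields = form["fields"]
    # Concepts the site's form does not support are silently dropped.
    params = {fields[c]: v for c, v in concept_query.items() if c in fields}
    return f"{form['action']}?{urlencode(params)}"


query = {"Book.title": "semantic web", "Book.author": "Berners-Lee"}
for site in FORM_MAPPINGS:
    print(build_query_url(site, query))
```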
An improved self-organizing feature map (SOFM) neural network is presented that generates rectangular and hexagonal lattices with a normal vector attached to each vertex. After the neural network is trained, the whole set of scattered data is divided into sub-regions whose cluster centers are represented by the weight vectors of the neurons in the network's output layer. These weight vectors approximate the dense 3-D scattered points, so the point set can be reduced to a reasonable scale while the topological features of the whole point set are preserved.
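The data-reduction idea, replacing a dense point cloud with the weight vectors of a trained map, can be sketched with a basic Kohonen self-organizing map on a rectangular grid. This is the standard SOFM, not the paper's improved variant: the grid size and the learning-rate and neighbourhood schedules are assumptions, hexagonal lattices and vertex normals are omitted, and the point cloud is synthetic.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def train_som(points: List[Point], rows: int, cols: int, epochs: int = 20,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Return rows*cols weight vectors that approximate the 3-D point set."""
    rng = random.Random(seed)
    weights = [list(rng.choice(points)) for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2
    steps = epochs * len(points)
    t = 0
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x in order:
            frac = t / steps
            lr = lr0 * (1 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.5   # shrinking neighbourhood radius
            # Best-matching unit (BMU): neuron whose weight vector is closest to x.
            bmu = min(range(len(weights)),
                      key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))
            br, bc = divmod(bmu, cols)
            for k, w in enumerate(weights):
                r, c = divmod(k, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2   # squared distance on the grid
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(3):
                    w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights


# Synthetic dense cloud sampled from the surface z = x * y on the unit square.
rng = random.Random(1)
cloud = []
for _ in range(400):
    x, y = rng.random(), rng.random()
    cloud.append((x, y, x * y))
reduced = train_som(cloud, rows=5, cols=5, epochs=5)
print(len(cloud), "->", len(reduced))
```

Because neighbouring grid nodes are pulled toward similar points, the reduced set keeps the lattice connectivity, which is what preserves the surface topology.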
Discovering and identifying influential nodes in a complex network is an important problem and a significant factor in controlling the network. Through control of a network, information can be spread or stopped within a short span of time; both goals are achievable because an information network can be extended as well as disrupted. Consequently, information spread and community formation have become crucial issues in Social Network Analysis (SNA). In this work, the complex network of the Twitter social network is formalized and the results are analyzed using different network metrics. The network is visualized in its original form and then filtered at different percentages to eliminate less influential nodes and edges for better analysis. The network is analyzed according to centrality measures such as edge betweenness, betweenness centrality, closeness centrality, and eigenvector centrality. Influential nodes are detected and their impact on the network is observed. Communities are analyzed in terms of network coverage, considering the Minimum Spanning Tree, the shortest path distribution, and the network diameter. These are found to be very effective ways to find influential and central nodes in large social networks such as Facebook, Instagram, Twitter, and LinkedIn.
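Several of the measures listed above reduce to breadth-first search on an unweighted graph. A minimal sketch of closeness centrality, the shortest-path length distribution, and the diameter, assuming a connected, undirected graph stored as an adjacency dictionary (the toy graph is hypothetical; betweenness and eigenvector centrality are omitted for brevity):

```python
from collections import Counter, deque
from typing import Dict, List

Graph = Dict[str, List[str]]


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """(n - 1) / sum of distances to all other nodes (connected graph assumed)."""
    n = len(graph)
    result = {}
    for node in graph:
        total = sum(bfs_distances(graph, node).values())
        result[node] = (n - 1) / total if total else 0.0
    return result


def path_length_distribution(graph: Graph) -> Counter:
    """Number of ordered node pairs at each shortest-path length."""
    counts: Counter = Counter()
    for node in graph:
        for other, d in bfs_distances(graph, node).items():
            if other != node:
                counts[d] += 1
    return counts


# Toy undirected graph in which "hub" links the other accounts.
g = {
    "hub": ["a", "b", "c"],
    "a": ["hub", "a2"],
    "a2": ["a"],
    "b": ["hub"],
    "c": ["hub"],
}
closeness = closeness_centrality(g)
print(max(closeness, key=closeness.get))   # hub
print(max(path_length_distribution(g)))    # 3 (the network diameter)
```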
As a real-time and authoritative source, the official Web pages of organizations contain a large amount of information. The diversity of Web content and formats makes pre-processing essential for obtaining unified attributed data, which is valuable for organizational analysis and mining. Existing research is insufficient in handling multiple Web scenarios and in accuracy. This paper proposes a method to transform organizational official Web pages into data with attributes. After the active blocks in a Web page are located, structural and content features are proposed to classify the information with a specific model. Extraction methods based on a trigger lexicon and on LSTM (Long Short-Term Memory) are proposed to process the classified information efficiently and extract data that match the attributes. Together, these steps form an accurate and efficient method for classifying and extracting information from organizational official Web pages. Experimental results on a real dataset of organizational official Web pages show that our approach improves the performance indicators and exceeds the state of the art.
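The trigger-lexicon idea can be sketched with regular expressions: each target attribute is associated with a list of trigger words, and the text that follows a trigger in a classified block is taken as the attribute value. The attribute names, trigger words, and sample block below are hypothetical, and the LSTM-based extractor is not shown.

```python
import re
from typing import Dict, List

# Hypothetical lexicon: attribute -> trigger words that introduce its value.
TRIGGERS: Dict[str, List[str]] = {
    "phone": ["Tel", "Phone", "Telephone"],
    "address": ["Address", "Add"],
    "email": ["E-mail", "Email"],
}


def extract_by_triggers(block: str, triggers: Dict[str, List[str]]) -> Dict[str, str]:
    """Return, per attribute, the first value that follows any of its trigger words."""
    found: Dict[str, str] = {}
    for attribute, words in triggers.items():
        # Longest triggers first so that 'Telephone' wins over 'Tel'.
        alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
        # Accept ASCII or full-width colons; the value runs to a newline or ';'.
        match = re.search(rf"\b(?:{alternatives})\s*[:：]\s*([^\n;]+)", block)
        if match:
            found[attribute] = match.group(1).strip()
    return found


block = "Contact us\nAddress: 1 Example Road, Shanghai\nTelephone: +86 21 0000 0000; Email: info@example.org"
print(extract_by_triggers(block, TRIGGERS))
```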
A vision-based query interface annotation method is used to relate attributes and form elements in form-based web query interfaces; this method reaches an accuracy of 82%. A user participation method is then used to tune the result: users can answer "yes" or "no" to existing annotations or manually annotate form elements. Mass feedback is added to the annotation algorithm to produce more accurate results. With this approach, query interface annotation can reach perfect accuracy.
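A common visual heuristic for annotating form elements is to pair each input with the closest text label located to its left or above it. The sketch below assumes rendered bounding boxes are already available as `(x, y, width, height)` tuples and uses centre distance with a penalty for labels that are neither left of nor above the field. It illustrates the general idea only, not the annotation method evaluated above, and all names and coordinates are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def label_distance(label: Box, field: Box) -> float:
    """Distance between label and field centres, penalising implausible positions."""
    lx, ly = label[0] + label[2] / 2, label[1] + label[3] / 2
    fx, fy = field[0] + field[2] / 2, field[1] + field[3] / 2
    distance = ((lx - fx) ** 2 + (ly - fy) ** 2) ** 0.5
    left_of = label[0] + label[2] <= field[0]
    above = label[1] + label[3] <= field[1]
    return distance if (left_of or above) else distance * 10


def annotate(labels: Dict[str, Box], fields: Dict[str, Box]) -> Dict[str, str]:
    """Map each form element name to the text of its most plausible label."""
    return {name: min(labels, key=lambda text: label_distance(labels[text], box))
            for name, box in fields.items()}


labels = {"Author": (10, 10, 60, 20), "Title": (10, 50, 60, 20)}
fields = {"input_1": (80, 10, 150, 20), "input_2": (80, 50, 150, 20)}
print(annotate(labels, fields))
```

The user-feedback step described above would then let a person confirm or overwrite each proposed pairing.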
In this paper, we propose a flexible location-based service (LBS) middleware framework that makes the development and deployment of new location-based applications much easier. Treating the World Wide Web as a huge source of location-related information, we integrate commonly used web data extraction techniques into the middleware framework and expose a unified web data interface that makes upper-layer applications more attractive. The framework also addresses common LBS issues, including positioning, location modeling, location-dependent query processing, and privacy and security management.
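Location-dependent query processing often starts with a range query: return the points of interest (POIs) within a given radius of the user. A minimal sketch using the haversine great-circle distance on a spherical Earth with a mean radius of 6,371 km; the POI names and coordinates are hypothetical, and a production middleware would use a spatial index rather than this linear scan.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(user: Tuple[float, float], pois: Dict[str, Tuple[float, float]],
                  radius_km: float) -> List[str]:
    """Names of POIs within radius_km of the user, nearest first."""
    hits = [(haversine_km(user, loc), name) for name, loc in pois.items()]
    return [name for d, name in sorted(hits) if d <= radius_km]


user = (31.2304, 121.4737)  # hypothetical user position
pois = {
    "cafe": (31.2310, 121.4740),
    "museum": (31.2280, 121.4750),
    "airport": (31.1443, 121.8083),
}
print(within_radius(user, pois, 1.0))
```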
To extract structured data from a web page with customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used to guide the user through the labeling process, minimizing user effort, and are also used to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern must be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the document object model (DOM) tree into a simpler form while preserving useful properties such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show high performance of our approach in terms of precision, recall, and structural correctness.
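The use of common features of labeled elements can be illustrated with index-annotated tag paths: when a user labels two or more elements for the same attribute, the positions where their paths differ are generalized to a wildcard, and the resulting pattern retrieves the remaining values. This is a simplified stand-in, not the attribute-tree (suffix-tree) induction described above; the paths and helper names are hypothetical.

```python
import re
from typing import List

STEP = re.compile(r"^([a-z0-9]+)(?:\[(\d+)\])?$")  # e.g. 'li[3]' -> tag 'li', index 3


def generalize(paths: List[str]) -> str:
    """Merge equal-depth XPath-like paths; differing indices become [*]."""
    split = [p.split("/") for p in paths]
    if len({len(s) for s in split}) != 1:
        raise ValueError("paths must have the same depth")
    pattern = []
    for steps in zip(*split):
        if len(set(steps)) == 1:
            pattern.append(steps[0])
            continue
        tags = {STEP.match(s).group(1) for s in steps}
        if len(tags) != 1:
            raise ValueError(f"labeled elements differ in tag: {sorted(tags)}")
        pattern.append(f"{tags.pop()}[*]")
    return "/".join(pattern)


def matches(pattern: str, path: str) -> bool:
    """True if a concrete path is an instance of the generalized pattern."""
    regex = re.escape(pattern).replace(r"\[\*\]", r"\[\d+\]")
    return re.fullmatch(regex, path) is not None


labeled = ["html/body/div[2]/ul/li[1]/span", "html/body/div[2]/ul/li[3]/span"]
pattern = generalize(labeled)
page_paths = ["html/body/div[2]/ul/li[2]/span", "html/body/div[2]/ul/li[4]/span", "html/body/div[1]/p"]
print(pattern)
print([p for p in page_paths if matches(pattern, p)])
```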
This review summarizes the research outcomes and findings documented in 45 journal papers that used a shared tunnel boring machine (TBM) dataset for performance prediction and boring efficiency optimization with machine learning methods. The big dataset was collected during construction of the Yinsong water diversion project in China and covers the tunnel excavation of a 20 km section, with 199 monitoring metrics recorded at one-second intervals. The research papers resulted from a call for contributions during a TBM machine learning contest in 2019 and cover a variety of topics related to the intelligent construction of TBMs. This review comprises two parts. Part I is concerned with the data processing, feature extraction, and machine learning methods applied by the contributors. The review finds that the data-driven and knowledge-driven approaches used by various authors to extract important features are diverse, and further studies are required to reach commonly accepted criteria. The techniques adopted by the contributors for cleaning and amending the raw data are summarized, revealing several highlights: the importance of a sufficiently high data acquisition frequency (more than once per second), classification and standardization in data preprocessing, and appropriate selection of features within a boring cycle. The review finds that both supervised and unsupervised machine learning methods have been used, and that ensemble and deep learning methods have found wide application. Part I highlights the important features of the individual methods applied by the contributors, including algorithm structures, hyperparameter selection, and model validation approaches.
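Splitting the one-second monitoring records into boring cycles is a typical first preprocessing step. A minimal sketch, assuming a cycle is a run of consecutive samples whose advance rate exceeds a small threshold; the threshold, minimum cycle length, and sample values are hypothetical and are not taken from the Yinsong dataset or the reviewed papers.

```python
from typing import List, Tuple


def boring_cycles(advance_rate: List[float], threshold: float = 1.0,
                  min_length: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of boring cycles.

    A cycle is a maximal run of samples with advance_rate > threshold; runs
    shorter than min_length samples are treated as noise and discarded.
    """
    cycles = []
    start = None
    for i, value in enumerate(advance_rate + [float("-inf")]):  # sentinel closes a final run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_length:
                cycles.append((start, i))
            start = None
    return cycles


# Hypothetical 1 Hz advance-rate samples (mm/min); index 8 is a short spike.
rate = [0, 0, 5, 6, 7, 6, 0, 0, 4, 0, 0, 3, 5, 5, 4, 0]
cycles = boring_cycles(rate)
for s, e in cycles:
    print((s, e), "mean advance rate:", sum(rate[s:e]) / (e - s))
```

Per-cycle statistics such as this mean are the kind of features the contributors then select within each boring cycle.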
Since the late 2010s, artificial intelligence (AI), including machine learning boosted by deep learning, has boomed as a vital tool that leverages computer vision, natural language processing, and speech recognition to revolutionize zoological research. This review provides an overview of the primary tasks, core models, datasets, and applications of AI in zoological research, including animal classification, resource conservation, behavior, development, genetics and evolution, breeding and health, disease models, and paleontology. We also explore the challenges and future directions of integrating AI into this field. Based on numerous case studies, the review outlines various avenues for incorporating AI into zoological research and underscores its potential to deepen our understanding of the intricate relationships within the animal kingdom. As we build a bridge between the realms of beast and byte, this review serves as a resource for envisioning novel, as yet unexplored AI applications in zoological research.
Methods for extracting features from time series data using deep learning have been widely studied, but they still suffer from severe loss of feature information across network layers and from parameter redundancy. Therefore, a new time series feature extraction model (CNN-CBAM) that integrates convolutional neural networks (CNN) and the convolutional block attention module (CBAM) is proposed. First, the parameters of the CNN and BiGRU prediction models are optimized with the uniform design method. Next, the CNN extracts features from the time series data and outputs multiple feature maps. These feature maps are then re-extracted by the CBAM attention mechanism at both the spatial and channel levels. Finally, the feature maps are input into the BiGRU model for prediction. Experimental results show that after CNN-CBAM processing, the stability and accuracy of the BiGRU prediction model improved by 77.6% and 76.3%, respectively, outperforming other feature extraction methods. Meanwhile, the model's training time increased by only 7.1%, demonstrating excellent time efficiency.
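The channel half of CBAM can be sketched without a deep learning framework: each feature map is summarized by average and max pooling, both summaries pass through a shared two-layer perceptron, and the sigmoid of their sum rescales the channel. The weights below are arbitrary illustrative values (in a real model they are learned), the spatial-attention half and the BiGRU are omitted, and the feature maps are 1-D, as for time series.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def channel_attention(feature_maps: List[Vector], w1: Matrix, w2: Matrix) -> List[Vector]:
    """CBAM-style channel attention for C feature maps of length L.

    w1 (C/r x C) and w2 (C x C/r) form the shared MLP with reduction ratio r.
    """
    avg = [sum(f) / len(f) for f in feature_maps]   # average-pooled descriptor
    mx = [max(f) for f in feature_maps]             # max-pooled descriptor

    def mlp(v: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(w1, v)]  # ReLU
        return matvec(w2, hidden)

    logits = [a + b for a, b in zip(mlp(avg), mlp(mx))]
    scale = [1 / (1 + math.exp(-z)) for z in logits]   # sigmoid: one weight per channel
    return [[s * x for x in f] for s, f in zip(scale, feature_maps)]


maps = [[0.2, 0.8, 0.5, 0.1], [1.5, 1.2, 1.9, 1.4]]  # 2 channels, length 4
w1 = [[0.5, 0.5]]       # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]    # expands back to 2 channel logits
out = channel_attention(maps, w1, w2)
print([[round(x, 3) for x in f] for f in out])
```

Channels whose logits are strongly positive pass nearly unchanged while others are damped, which is how the module reweights feature maps before they reach the predictor.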
In this paper, a new method called L-tree match is presented for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with the object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from 100 GB of biological documents for China's first online integrated biological data warehouse.
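Biological flat files of the kind mentioned above show why optional, unordered, and repeated components complicate extraction. The sketch below is not L-tree match: it is a simple two-step parser for a hypothetical two-letter-code format loosely modelled on EMBL/UniProt-style entries (records end with `//`, each line starts with a field code, and the value begins at column 6). Records are split top-down, lines are grouped into fields bottom-up, repeated codes are joined, unknown codes are skipped as noise, and missing optional fields default to `None`.

```python
from typing import Dict, Iterator, List, Optional

FIELDS = {"ID": "identifier", "DE": "description", "OS": "organism", "KW": "keywords"}
REQUIRED = {"ID"}


def split_records(text: str) -> Iterator[List[str]]:
    """Top-down step: cut the stream into records terminated by '//'."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "//":
            if record:
                yield record
            record = []
        elif line.strip():
            record.append(line)
    if record:  # tolerate a missing final terminator
        yield record


def parse_record(lines: List[str]) -> Dict[str, Optional[str]]:
    """Bottom-up step: group lines by code; order-independent, repeats joined."""
    parts: Dict[str, List[str]] = {}
    for line in lines:
        code, value = line[:2], line[5:].strip()
        if code in FIELDS:  # unknown codes are treated as noise
            parts.setdefault(code, []).append(value)
    missing = REQUIRED - set(parts)
    if missing:
        raise ValueError(f"record lacks required field(s): {sorted(missing)}")
    return {name: (" ".join(parts[code]) if code in parts else None)
            for code, name in FIELDS.items()}


sample = """ID   ENTRY_A
DE   Hypothetical protein,
DE   fragment.
OS   Homo sapiens.
//
OS   Mus musculus.
ID   ENTRY_B
XX   ignored noise line
//
"""
records = [parse_record(r) for r in split_records(sample)]
for rec in records:
    print(rec)
```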
Funding (web-based tabular data extraction tool for aquaculture): Supported by the Shanghai Education Committee (No. 06KZ016).
Funding (remote sensing alteration extraction in northwestern Yunnan): Supported by the project "Remote Sensing Alteration Abnormity Extraction from Geological Survey in Northwestern Yunnan, China" from the China Geological Survey.
Funding: This work was funded by the Graduate Scientific Research School at Yarmouk University under Grant Number 82/2020.
Abstract: Publicly accessible data banks hold quintillions of data items on deoxyribonucleic acid (DNA) and proteins, and that number is growing at an exponential rate. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is a tough undertaking. The data must go through several processes, including mining, processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and can repeat the data extraction phases as many times as needed without human intervention. The software simulates data extraction from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed with command-line tools. The software was evaluated by creating a novel database of 34 parameters for 1360 physicochemical properties of antimicrobial peptide (AMP) sequences (46240 hits) from various MARVIN software panels, which can later be used to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, collecting data from the web becomes faster and less expensive, with no need for manual data extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine learning and deep learning applications.
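The idea of repeating extraction phases without human intervention can be illustrated with a batch downloader that retries failures and resumes where a previous run stopped. This is a minimal sketch under stated assumptions. It downloads PDB entries over HTTP from the RCSB file server (`https://files.rcsb.org/download/<ID>.pdb`) rather than automating a point-and-click GUI as the paper's software does. The retry count, back-off, and `failed.json` report are illustrative choices.

```python
import json
import time
import urllib.request
from pathlib import Path

PDB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"

def http_fetch(pdb_id):
    """Download one PDB entry as text."""
    with urllib.request.urlopen(PDB_URL.format(pdb_id=pdb_id), timeout=30) as resp:
        return resp.read().decode("utf-8")

def batch_extract(ids, out_dir, fetch=http_fetch, retries=3, delay=1.0):
    """Fetch each ID once; rerunning skips files that already exist.

    Returns {id: error message} for entries that failed after all retries.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = {}
    for pdb_id in ids:
        target = out / f"{pdb_id}.pdb"
        if target.exists():              # extracted in an earlier run
            continue
        for attempt in range(1, retries + 1):
            try:
                target.write_text(fetch(pdb_id))
                break
            except Exception as exc:     # network or HTTP error
                if attempt == retries:
                    failed[pdb_id] = str(exc)
                else:
                    time.sleep(delay * attempt)  # linear back-off
    (out / "failed.json").write_text(json.dumps(failed, indent=2))
    return failed
```

Passing `fetch` as a parameter keeps the loop independent of the data source, so the same resumable loop could drive any other extraction step.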
Funding: Supported in part by the National Natural Science Foundation of China (61873335, 61833011, 62173164), the Project of Science and Technology Commission of Shanghai Municipality, China (20ZR1420200, 21SQBS01600, 22JC1401400, 19510750300, 21190780300), and the Natural Science Foundation of Jiangsu Province of China (BK20201451).
Abstract: This paper is concerned with cooperative target stalking for a multi-unmanned surface vehicle (multi-USV) system. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a multi-USV target stalking (MUTS) algorithm is proposed. First, a V-type probabilistic data extraction method is proposed for the first time to overcome shortcomings of the MADDPG algorithm. The proposed method has two advantages: 1) it reduces the amount of data and shortens training time; 2) it selects the more important data in the experience buffer for training. Second, to avoid collisions between USVs during stalking, an action constraint method called Safe DDPG is introduced. Finally, the MUTS algorithm is compared with several existing algorithms in cooperative target stalking scenarios. To demonstrate the effectiveness of the proposed MUTS algorithm in stalking tasks, mission operating scenarios and reward functions are carefully designed. The MUTS algorithm helps the multi-USV system avoid internal collisions during mission execution. Moreover, compared with existing algorithms, it provides a higher convergence speed and a narrower convergence domain.
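The abstract does not define the V-type sampling distribution. As a generic illustration of probabilistic data selection from an experience buffer, the sketch below uses standard proportional prioritized replay, where P(i) is proportional to p_i^alpha. This is a swapped-in technique, not the paper's V-type method. The importance-sampling correction of full prioritized replay is omitted, and `alpha` and `eps` are assumed defaults.

```python
import random

class PrioritizedBuffer:
    """Ring buffer that samples transitions with P(i) ~ (p_i + eps) ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0                     # next slot to overwrite when full
        self.rng = random.Random(seed)

    def add(self, transition, priority=None):
        # New transitions default to the current maximum priority.
        p = max(self.priorities, default=1.0) if priority is None else priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:                            # overwrite the oldest entry
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k):
        """Draw k indices (with replacement) weighted by priority."""
        weights = [(p + self.eps) ** self.alpha for p in self.priorities]
        idx = self.rng.choices(range(len(self.data)), weights=weights, k=k)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        """Set priorities from the magnitude of new TD errors."""
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err)
```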
Funding: Dr. Arshiya Sajid Ansari would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R-2023-910.
Abstract: Image steganography is a technique for concealing confidential information within an image without noticeably changing its appearance. Vehicular ad hoc networks (VANETs) enable vehicles to communicate with one another and with roadside infrastructure to enhance safety and traffic flow; they provide a range of value-added services and are an essential component of modern smart transportation systems. Many authors have suggested VANET steganography for secure, reliable message transfer from terminal to terminal (hop to hop) and for protecting messages from attacks, preserving privacy. This paper aims to determine whether steganography can improve data security and secrecy in VANET applications, and to analyze effective steganography techniques for embedding data into images while minimizing the loss of visual quality. According to simulations in the literature and real-world studies, image steganography has proved to be an effective method for secure communication in VANETs, even under difficult network conditions. This research also explores a variety of steganography approaches for vehicular ad hoc network transportation systems, such as vector embedding, statistical methods, spatial domain (SD), transform domain (TD), distortion, masking, and filtering. This study may help researchers improve the ability of vehicle networks to communicate securely and pave the way for innovative steganography methods.
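The spatial-domain (SD) family mentioned above is commonly illustrated by least-significant-bit (LSB) embedding. In this minimal sketch, the cover image is assumed to be a flat sequence of 8-bit pixel values, and the 32-bit length header is an illustrative framing choice. Each pixel changes by at most 1, which is why visual quality loss is small.

```python
def embed_lsb(pixels, message):
    """Hide message bytes in the least significant bit of 8-bit pixel values."""
    payload = len(message).to_bytes(4, "big") + message   # 32-bit length header
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit                 # clear LSB, then set it
    return stego

def extract_lsb(pixels):
    """Recover a message embedded by embed_lsb."""
    def read_bytes(start, n):
        out = bytearray()
        for b in range(n):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start + 8 * b + i] & 1)
            out.append(value)
        return bytes(out)
    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)                          # payload follows the 32 header bits
```

Plain LSB embedding is easy to detect statistically. Transform-domain methods trade capacity for robustness against such analysis.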
Funding: This project is supported by the Major International Cooperation Program of NSFC, Grant 60221120145, Chinese Folk Music Digital Library.
Abstract: The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related work does not address how to make extracted information available to knowledge systems. This paper describes a novel approach to building a domain-specific knowledge service with data retrieved from the Hidden Web. An ontology is used to model the domain knowledge. The query forms of different Web sites are translated into a machine-understandable format based on the defined knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized as ontology-format knowledge. Experiments show that the algorithm achieves high accuracy and that the system greatly facilitates the construction of knowledge services.
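Translating site-specific query forms into concept-level access can be sketched as follows: map each form field to an ontology concept, then fill any site's form from concept values. The concept names, synonym sets, and example URL are hypothetical, and simple synonym lookup stands in for whatever matching the paper uses.

```python
from urllib.parse import urlencode

# Hypothetical ontology concepts (folk-music domain) with field-name synonyms.
CONCEPT_SYNONYMS = {
    "Title":  {"title", "song", "song_name", "qtitle"},
    "Region": {"region", "province", "area"},
    "Genre":  {"genre", "style", "type"},
}

def annotate_form(field_names):
    """Map each form field name to an ontology concept (None if unknown)."""
    mapping = {}
    for name in field_names:
        key = name.strip().lower()
        mapping[name] = next(
            (c for c, syns in CONCEPT_SYNONYMS.items() if key in syns), None)
    return mapping

def build_query(action_url, mapping, concept_values):
    """Fill a site's form from concept-level values and return a GET URL."""
    params = {fld: concept_values[c] for fld, c in mapping.items()
              if c is not None and c in concept_values}
    return action_url + "?" + urlencode(params)
```

Once a form is annotated, the same concept-level query, such as `{"Title": ..., "Region": ...}`, can be issued to every site without per-site code.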
Funding: Supported by the Science Foundation of Zhejiang (No. 599008) and the ZUCC Science Research Foundation.
Abstract: An improved self-organizing feature map (SOFM) neural network is presented to generate rectangular and hexagonal lattices with a normal vector attached to each vertex. After the network is trained, the scattered data are divided into sub-regions whose classified cores are represented by the weight vectors of the neurons in the output layer. These weight vectors approximate the dense 3-D scattered points, so the point set can be reduced to a reasonable scale while the topological features of the whole point set are preserved.
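A plain (not improved) SOFM on a rectangular lattice already shows the reduction idea. In this minimal sketch, neuron weights are fitted to 3-D points, and each point is then assigned to its best-matching neuron to form sub-regions. The hexagonal lattice and per-vertex normals from the abstract are omitted. The linear learning-rate decay and Gaussian neighbourhood are standard but assumed choices.

```python
import math
import random

def train_som(points, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Fit a rows x cols SOM to 3-D points; returns {(r, c): weight vector}."""
    rng = random.Random(seed)
    weights = {(r, c): list(rng.choice(points)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    steps, t = epochs * len(points), 0
    for _ in range(epochs):
        for p in rng.sample(points, len(points)):        # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1.0 - frac)                       # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)       # shrinking neighbourhood radius
            bmu = min(weights, key=lambda n: math.dist(weights[n], p))
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2   # lattice distance to the BMU
                h = math.exp(-d2 / (2 * sigma * sigma))      # Gaussian neighbourhood
                for k in range(3):
                    w[k] += lr * h * (p[k] - w[k])
            t += 1
    return weights

def partition(points, weights):
    """Assign each point to its best-matching neuron, giving the sub-regions."""
    regions = {n: [] for n in weights}
    for p in points:
        bmu = min(weights, key=lambda n: math.dist(weights[n], p))
        regions[bmu].append(p)
    return regions
```

The reduced point set is simply `list(weights.values())`. Because neighbouring lattice nodes are pulled together, the reduced set keeps the spatial ordering of the original cloud.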
Abstract: Discovering and identifying the influential nodes in a complex network is an important issue, because these nodes are key to controlling the network. By controlling a network, information can be spread or stopped within a short span of time; both goals are achievable, since an information network can be extended as well as destroyed. Information spread and community formation have therefore become some of the most crucial issues in SNA (Social Network Analysis). In this work, the Twitter social network is formalized as a complex network, and the results are analyzed using different network metrics. The network is visualized in its original form, and then different percentages of it are filtered out to eliminate the less impactful nodes and edges for better analysis. The network is analyzed according to different centrality measures, such as edge betweenness, betweenness centrality, closeness centrality, and eigenvector centrality. Influential nodes are detected and their impact on the network is observed. The communities are analyzed in terms of network coverage, considering the Minimum Spanning Tree, the shortest path distribution, and the network diameter. These measures prove to be very effective ways to find influential and central nodes in big social networks such as Facebook, Instagram, Twitter, and LinkedIn.
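Two of the centrality measures listed above can be computed with the standard library alone, as the sketch below shows. Betweenness uses Brandes' algorithm, and closeness is computed over the nodes reachable from the source. The graph is assumed to be unweighted and undirected, given as a symmetric adjacency dict.

```python
from collections import deque

def brandes_betweenness(adj):
    """Betweenness centrality of an undirected graph {node: set(neighbours)}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                        # BFS counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                        # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}   # each undirected pair was counted twice

def closeness(adj, s):
    """(reachable nodes - 1) / sum of shortest-path distances from s."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

In a star graph, the hub lies on every shortest path between two leaves, so its betweenness equals the number of leaf pairs.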
Funding: This work was supported by the National Key Research and Development Program of China (Nos. 2016QY03D0501, 2017YFB0803300), the National Natural Science Foundation of China (Nos. 61601146, 61732022), and the Sichuan Science and Technology Program (No. 2019YFSY0049).
Abstract: As a real-time and authoritative source, the official Web pages of organizations contain a large amount of information. The diversity of Web content and formats makes pre-processing essential for obtaining unified attributed data, which is valuable for organizational analysis and mining. Existing research on handling multiple Web scenarios is insufficient, and its accuracy is limited. This paper proposes a method to transform organizational official Web pages into attributed data. After the active blocks in the Web pages are located, structural and content features are used to classify information with a specific model. Extraction methods based on a trigger lexicon and on LSTM (Long Short-Term Memory) are proposed; they efficiently process the classified information and extract data that match the attributes. The result is an accurate and efficient method for classifying and extracting information from organizational official Web pages. Experimental results on a real data set of organizational official Web pages show that the approach improves performance indicators and exceeds the state of the art.
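The trigger-lexicon half of the extraction step can be sketched with regular expressions: a trigger phrase signals an attribute, and the text after it, up to a delimiter, is taken as the value. The lexicon below is hypothetical, and the LSTM-based extractor is not shown. Both ASCII and full-width colons are accepted as separators.

```python
import re

# Hypothetical trigger lexicon: attribute -> phrases that usually precede its value.
TRIGGERS = {
    "phone":   ["tel", "telephone", "phone"],
    "address": ["address", "addr", "located at"],
    "email":   ["e-mail", "email"],
}

def extract_by_triggers(text, triggers=TRIGGERS):
    """Return {attribute: value} using the first trigger hit for each attribute."""
    record = {}
    for attr, words in triggers.items():
        # Longest phrases first so "telephone" wins over "tel".
        alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
        m = re.search(rf"\b(?:{alternation})\b\s*[:：]?\s*([^;\n]+)", text, re.IGNORECASE)
        if m:
            record[attr] = m.group(1).strip()
    return record
```

In practice, the lexicon approach handles the regular, label-prefixed fields, while a learned model is needed for free-text attributes that lack reliable triggers.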
Funding: Supported by the National Natural Science Foundation of China (60573091, 60273018)
Abstract: A vision-based query interface annotation method is used to relate attributes to form elements in form-based web query interfaces; this method reaches an accuracy of 82%. A user participation method is then used to tune the result: users can answer "yes" or "no" to existing annotations, or manually annotate form elements. Mass feedback is added to the annotation algorithm to produce more accurate results. With this approach, query interface annotation can reach perfect accuracy.
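A vision-based annotation step relates each rendered form element to a nearby text label, and user feedback then corrects the result. This is a minimal sketch. It assumes bounding boxes in rendered-page coordinates are already available, and the scoring rule is a hypothetical heuristic: prefer a label on the same row to the left, else one directly above, with `max_gap` and the 1.5 penalty for labels above both chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Box:
    text: str   # label text, or the form element's name
    x: float    # left edge in rendered-page coordinates
    y: float    # top edge
    w: float
    h: float

def nearest_label(element, labels, max_gap=200.0):
    """Closest label left of the element on the same row, else directly above."""
    best, best_cost = None, max_gap
    ey = element.y + element.h / 2
    for lab in labels:
        ly = lab.y + lab.h / 2
        left_gap = element.x - (lab.x + lab.w)    # label ends before the element starts
        above_gap = element.y - (lab.y + lab.h)   # label ends above the element
        if left_gap >= 0 and abs(ly - ey) <= element.h:
            cost = left_gap                        # same visual row
        elif above_gap >= 0 and abs(lab.x - element.x) <= element.w:
            cost = above_gap * 1.5                 # hypothetical penalty: prefer left labels
        else:
            continue
        if cost < best_cost:
            best, best_cost = lab.text, cost
    return best

def annotate(elements, labels, feedback=None):
    """Automatic annotation, then user feedback: 'yes' keeps, 'no' clears,
    any other string replaces the annotation manually."""
    result = {e.text: nearest_label(e, labels) for e in elements}
    for name, answer in (feedback or {}).items():
        if answer == "no":
            result[name] = None
        elif answer != "yes":
            result[name] = answer
    return result
```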
Funding: Supported by the National Natural Science Foundation of China (60573091, 60273018), the National Basic Research and Development Program of China (2003CB317000), the Key Project of Ministry of Education of China (03044), and the Program for New Century Excellent Talents in University (NCET).
Abstract: In this paper, we propose a flexible location-based service (LBS) middleware framework that makes new location-based applications much easier to develop and deploy. Treating the World Wide Web as a huge source of location-related information, we integrate commonly used web data extraction techniques into the middleware framework and expose a unified web data interface that makes upper-layer applications more attractive. The framework also addresses common LBS issues, including positioning, location modeling, location-dependent query processing, privacy, and security management.
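Location-dependent query processing can be illustrated with a range query over points of interest, such as POIs obtained through web data extraction. This minimal sketch uses the haversine great-circle distance on a spherical Earth with a mean radius of 6371 km. The `(name, lat, lon)` tuple format is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(user_lat, user_lon, pois, radius_km):
    """Range query: (distance_km, name) of POIs within radius_km, nearest first."""
    hits = []
    for name, lat, lon in pois:
        d = haversine_km(user_lat, user_lon, lat, lon)
        if d <= radius_km:
            hits.append((round(d, 2), name))
    return sorted(hits)
```

A linear scan is fine for small POI sets. A middleware serving many users would normally put a spatial index in front of this distance test.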
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2009AA01Z141), the National Natural Science Foundation of China (No. 60573117), and the Beijing Natural Science Foundation (No. 4131001).
Abstract: To extract structured data from a web page according to customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used to guide the user through the labeling process, minimizing user effort, and also to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern must be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the document object model (DOM) tree into a simpler form while preserving useful properties such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show high performance of the approach in terms of precision, recall, and structural correctness.
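The step "use the labeled elements' common features to retrieve attribute values" can be sketched as follows. The common features are assumed to be the root-to-node tag path and, if shared, the `class` attribute. Records are then built by zipping attribute columns in page order, which is a naive stand-in for the paper's attribute-tree pattern induction. It also assumes well-formed XHTML that `xml.etree.ElementTree` can parse.

```python
import xml.etree.ElementTree as ET

def tag_paths(root):
    """Map every element to its root-to-node tag path, in document order."""
    paths = {}
    def walk(el, prefix):
        path = prefix + (el.tag,)
        paths[el] = path
        for child in el:
            walk(child, path)
    walk(root, ())
    return paths

def same_attribute(root, labeled):
    """All elements sharing the labeled elements' tag path (and class, if common)."""
    paths = tag_paths(root)
    wanted = {paths[el] for el in labeled}
    classes = {el.get("class") for el in labeled}
    common_class = classes.pop() if len(classes) == 1 else None
    return [el for el, p in paths.items()
            if p in wanted and (common_class is None or el.get("class") == common_class)]

def to_records(root, labels):
    """labels: {attribute: [labeled elements]} -> list of row dicts.

    Naive alignment: assumes every record has every attribute, in page order.
    """
    columns = {name: [e.text for e in same_attribute(root, els)] for name, els in labels.items()}
    return [dict(zip(columns, row)) for row in zip(*columns.values())]
```

The zip alignment breaks as soon as a record omits an attribute. That case is exactly what inducing an order-preserving attribute pattern is meant to handle.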
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFB1702504), the National Natural Science Foundation of China (Grant Nos. 52179121, 51879284), the State Key Laboratory of Simulations and Regulation of Water Cycle in River Basin, China (Grant No. SKL2022ZD05), the IWHR Research & Development Support Program, China (Grant No. GE0145B012021), the Natural Science Foundation of Shaanxi Province, China (Grant No. 2021JLM-50), and the National Key R&D Program of China (Grant No. 2022YFE0200400).
Abstract: This review summarizes the research outcomes and findings documented in 45 journal papers that used a shared tunnel boring machine (TBM) dataset for performance prediction and boring efficiency optimization with machine learning methods. The big dataset was collected during construction of the Yinsong water diversion project in China and covers the excavation of a 20 km tunnel section, with 199 monitoring metrics recorded at an interval of one second. The papers resulted from a call for contributions during a TBM machine learning contest in 2019 and cover a variety of topics related to intelligent TBM construction. This review comprises two parts. Part I is concerned with the data processing, feature extraction, and machine learning methods applied by the contributors. The review finds that the data-driven and knowledge-driven approaches that various authors applied to extract important features are diverse, and further studies are needed to reach commonly accepted criteria. The techniques that contributors adopted for cleaning and amending the raw data are summarized. Highlights include the importance of a sufficiently high data acquisition frequency (a sampling interval shorter than 1 second), classification and standardization in data preprocessing, and appropriate selection of features within a boring cycle. Both supervised and unsupervised machine learning methods have been used by various researchers, and ensemble and deep learning methods have found wide application. Part I highlights the important features of the individual methods applied by the contributors, including algorithm structures, hyperparameter selection, and model validation approaches.
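Selecting features within a boring cycle first requires splitting the 1 Hz monitoring series into cycles. In this minimal sketch, a cycle is assumed to be a maximal run of samples with positive advance rate lasting at least `min_len` samples. The threshold, minimum length, and 20% ramp-up skip are illustrative assumptions, not criteria taken from the reviewed papers.

```python
def boring_cycles(advance_rate, min_len=30, threshold=0.0):
    """Split a 1 Hz advance-rate series into (start, end) index pairs.

    Runs above threshold shorter than min_len samples are treated as noise.
    """
    cycles, start = [], None
    for i, v in enumerate(advance_rate):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                cycles.append((start, i))
            start = None
    if start is not None and len(advance_rate) - start >= min_len:
        cycles.append((start, len(advance_rate)))   # cycle still running at the end
    return cycles

def cycle_means(series, cycles, skip=0.2):
    """Mean of a monitored metric per cycle, skipping the first `skip` fraction (ramp-up)."""
    means = []
    for s, e in cycles:
        window = series[s + int((e - s) * skip):e]
        means.append(sum(window) / len(window))
    return means
```

Per-cycle statistics such as these, computed over the stable phase of each cycle, are one simple way to turn a second-by-second series into a feature vector per boring cycle.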
Funding: Supported by the National Natural Science Foundation of China (31871274), the Natural Science Foundation of Chongqing, China (CSTB2022NSCQ-MSX0650), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202100508), the Team Project of Innovation Leading Talent in Chongqing (CQYC20210309536), and the “Contract System” Project of Chongqing Talent Plan (cstc2022ycjh-bgzxm0147).
Abstract: Since the late 2010s, artificial intelligence (AI), including machine learning boosted by deep learning, has boomed as a vital tool that leverages computer vision, natural language processing, and speech recognition to revolutionize zoological research. This review provides an overview of the primary tasks, core models, datasets, and applications of AI in zoological research, including animal classification, resource conservation, behavior, development, genetics and evolution, breeding and health, disease models, and paleontology. We also explore the challenges and future directions of integrating AI into this field. Based on numerous case studies, this review outlines various avenues for incorporating AI into zoological research and underscores its potential to enhance our understanding of the intricate relationships within the animal kingdom. As we build a bridge between the realms of beast and byte, this review serves as a resource for envisioning novel AI applications in zoological research that have not yet been explored.
Abstract: Methods for extracting features from time series data using deep learning have been widely studied, but they still suffer from severe loss of feature information across network layers and from parameter redundancy. Therefore, a new time series feature extraction model (CNN-CBAM) that integrates convolutional neural networks (CNN) and the convolutional block attention module (CBAM) is proposed. First, the parameters of the CNN and BiGRU prediction models are optimized with uniform design methods. Next, the CNN extracts features from the time series data and outputs multiple feature maps. These feature maps are then re-extracted by the CBAM attention mechanism at both the spatial and channel levels. Finally, the feature maps are fed into the BiGRU model for prediction. Experimental results show that after CNN-CBAM processing, the stability and accuracy of the BiGRU prediction model improve by 77.6% and 76.3%, respectively, outperforming other feature extraction methods. Meanwhile, the model's training time increases by only 7.1%, demonstrating excellent time efficiency.
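The CBAM re-extraction step (channel attention, then spatial attention) can be written out as a forward pass over the (channels, length) feature map of a 1-D CNN. This NumPy sketch is under stated assumptions. It uses random, untrained weights and omits biases. The "spatial" attention runs along the time axis. The CNN and BiGRU stages are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_1d(feat, w1, w2, kernel):
    """CBAM forward pass on a (C, L) feature map.

    w1: (C//r, C) and w2: (C, C//r) are the shared-MLP weights for channel
    attention; kernel: (2, k) weights of the spatial-attention conv (k odd).
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)        # ReLU between the two layers

    # Channel attention from average- and max-pooled channel descriptors.
    mc = sigmoid(mlp(feat.mean(axis=1)) + mlp(feat.max(axis=1)))      # shape (C,)
    refined = feat * mc[:, None]

    # Spatial (temporal) attention: conv over [avg; max] pooled across channels.
    pooled = np.stack([refined.mean(axis=0), refined.max(axis=0)])     # shape (2, L)
    k = kernel.shape[1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2)))                # zero "same" padding
    ms = sigmoid(np.array([np.sum(padded[:, t:t + k] * kernel)
                           for t in range(feat.shape[1])]))            # shape (L,)
    return refined * ms[None, :]
```

Both attention maps lie in (0, 1), so CBAM only re-weights the CNN features and does not change their shape. This is why the module can be inserted before the BiGRU at little extra cost.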
Abstract: In this paper, a new method named L-tree match is presented for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with the object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from 100 GB of biological documents for the first online integrated biological data warehouse in China.
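Extracting records with optional and unordered fields from a huge text stream and populating a database can be illustrated with a streaming parser. The input is assumed to be a hypothetical EMBL-like line-coded format (two-letter tag, value from column 6, `//` ending a record), and the rows are loaded into SQLite. This is a generic illustration, not the L-tree match algorithm. The tag map and table schema are assumptions.

```python
import sqlite3

FIELDS = {"ID": "accession", "DE": "description", "OS": "organism"}  # hypothetical tag map

def records(lines):
    """Yield one dict per record from a line-coded stream; '//' ends a record.

    Fields may be missing or appear in any order; repeated tags are joined;
    unknown tags are ignored as noise.
    """
    rec = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if rec:
                yield rec
            rec = {}
            continue
        tag, value = line[:2], line[5:].strip()
        if tag in FIELDS:
            col = FIELDS[tag]
            rec[col] = f"{rec[col]} {value}" if col in rec else value
    if rec:                                   # tolerate a missing final terminator
        yield rec

def load(lines, conn):
    """Populate an SQLite table from the stream without holding it in memory."""
    conn.execute("CREATE TABLE IF NOT EXISTS entry "
                 "(accession TEXT PRIMARY KEY, description TEXT, organism TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO entry VALUES (:accession, :description, :organism)",
        ({c: r.get(c) for c in FIELDS.values()} for r in records(lines) if "accession" in r),
    )
    conn.commit()
```

Because `records` is a generator, a file object can be passed directly as `lines`, so memory use stays flat regardless of input size.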