With the rapid development of the Internet globally since the 21st century, the amount of data information has increased exponentially. Data helps improve people's livelihood and working conditions, as well as learning efficiency. Therefore, data extraction, analysis, and processing have become a hot issue for people from all walks of life. Traditional recommendation algorithms still have some problems, such as inaccuracy, low diversity, and poor performance. To solve these problems and improve the accuracy and diversity of recommendation algorithms, this research combines convolutional neural networks (CNN) and the attention model to design a recommendation algorithm based on a neural network framework. Through the text convolutional network, the input layer of the CNN is transformed into two channels: a static one and a non-static one. Meanwhile, the self-attention mechanism focuses on the salient parts of the input so that data can be processed better and feature extraction becomes more accurate. The recommendation algorithm combines the CNN and the attention mechanism, and divides the embedding layer into user information feature embedding and data name feature extraction embedding. It obtains data name features through a convolution kernel. Finally, the top pooling layer obtains a fixed-length vector, and the attention layer obtains the characteristics of the data type. Experimental results show that the proposed recommendation algorithm combining the CNN and the attention mechanism performs better in data extraction than the traditional CNN algorithm and other currently popular recommendation algorithms, showing excellent accuracy and robustness.
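The paper's code is not given here, so the following PyTorch sketch is only one plausible reading of the described architecture: a frozen (static) and a trainable (non-static) embedding channel, self-attention over the concatenated token features, and max-over-time pooling that yields the fixed-length vector. All layer sizes and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoChannelTextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, n_filters=64, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # Static channel: embeddings kept frozen (e.g., pretrained vectors).
        self.static_emb = nn.Embedding(vocab_size, embed_dim)
        self.static_emb.weight.requires_grad = False
        # Non-static channel: embeddings fine-tuned during training.
        self.nonstatic_emb = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.MultiheadAttention(2 * embed_dim, num_heads=4, batch_first=True)
        self.convs = nn.ModuleList(
            [nn.Conv1d(2 * embed_dim, n_filters, k) for k in kernel_sizes]
        )

    def forward(self, tokens):  # tokens: (batch, seq_len) integer ids
        x = torch.cat([self.static_emb(tokens), self.nonstatic_emb(tokens)], dim=-1)
        x, _ = self.attn(x, x, x)          # self-attention re-weights token features
        x = x.transpose(1, 2)              # Conv1d expects (batch, channels, seq_len)
        # Max-over-time pooling produces the fixed-length vector for the recommender head.
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)
```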
There are quintillions of data points on deoxyribonucleic acid (DNA) and proteins in publicly accessible data banks, and that number is expanding at an exponential rate. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is a tough undertaking. These data must go through several processes, including mining, processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically, with the particular ability to repeat the data extraction phases as many times as needed without human intervention. The software simulates the extraction of data from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed using command-line tools. It was evaluated by creating a novel database of 34 parameters for 1360 physicochemical properties of antimicrobial peptide (AMP) sequences (46,240 hits) from various MARVIN software panels, which can later be utilized to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, data collection from the web becomes faster and less expensive, with no need for manual extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine and deep learning applications.
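The paper's software itself is not reproduced here; a minimal Selenium sketch of the same idea, scripted point-and-click extraction that can repeat without human intervention, might look as follows. The URL and element locators are hypothetical placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def extract_batch(sequences):
    driver = webdriver.Firefox()
    results = []
    try:
        for seq in sequences:  # repeat the extraction phase without human input
            driver.get("https://example.org/property-calculator")  # hypothetical GUI
            driver.find_element(By.ID, "sequence-input").send_keys(seq)
            driver.find_element(By.ID, "run-button").click()
            # Real pages would need explicit waits for the result to render.
            value = driver.find_element(By.ID, "result-panel").text
            results.append((seq, value))
    finally:
        driver.quit()
    return results
```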
Image steganography is a technique for concealing confidential information within an image without dramatically changing its outward appearance. Vehicular ad hoc networks (VANETs), which enable vehicles to communicate with one another and with roadside infrastructure to enhance safety and traffic flow, provide a range of value-added services and are an essential component of modern smart transportation systems. Steganography in VANETs has been suggested by many authors for secure, reliable message transfer from hop to hop and terminal to terminal, and for protecting privacy against attacks. This paper aims to determine whether steganography can improve data security and secrecy in VANET applications, and to analyze effective steganography techniques for embedding data into images while minimizing visual quality loss. According to simulations in the literature and real-world studies, image steganography has proved to be an effective method for secure communication in VANETs, even under difficult network conditions. We also survey a variety of steganography approaches for vehicular ad hoc network transportation systems, including vector embedding, statistical methods, spatial domain (SD), transform domain (TD), distortion, masking, and filtering. This study should help researchers improve the ability of vehicle networks to communicate securely and open the door to innovative steganography methods.
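As a concrete instance of the spatial-domain (SD) family mentioned above, a minimal least-significant-bit (LSB) embedding in NumPy is sketched below; it is a generic illustration, not a VANET-specific scheme from the survey.

```python
import numpy as np

def embed_lsb(cover: np.ndarray, message: bytes) -> np.ndarray:
    """Hide `message` in the lowest bit of each pixel of a uint8 image."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.flatten().copy()
    if bits.size > flat.size:
        raise ValueError("message too long for this cover image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits   # overwrite the lowest bit
    return flat.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bytes: int) -> bytes:
    bits = stego.flatten()[:n_bytes * 8] & 1              # read the lowest bits back
    return np.packbits(bits).tobytes()
```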
An improved self-organizing feature map (SOFM) neural network is presented to generate rectangular and hexagonal lattices with a normal vector attached to each vertex. After the neural network is trained, the whole set of scattered data is divided into sub-regions whose classified cores are represented by the weight vectors of the neurons at the output layer of the network. The weight vectors of the neurons are used to approximate the dense 3-D scattered points, so the dense scattered points can be reduced to a reasonable scale while the topological features of the whole point set are retained.
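For reference, a condensed version of the classical SOFM training loop that this work improves on is sketched below: each neuron's weight vector is pulled toward nearby samples, so the trained weights become representatives of the dense 3-D scatter. Grid size, learning-rate, and neighborhood schedules are illustrative assumptions.

```python
import numpy as np

def train_som(points, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
    """points: (N, 3) array of scattered 3-D samples."""
    rng = np.random.default_rng(0)
    weights = rng.uniform(points.min(0), points.max(0), (grid[0] * grid[1], 3))
    coords = np.indices(grid).reshape(2, -1).T           # neuron positions on the lattice
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                      # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3         # shrinking neighborhood
        for p in rng.permutation(points):
            bmu = np.argmin(((weights - p) ** 2).sum(1)) # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(1)    # lattice distance to the BMU
            h = np.exp(-d2 / (2 * sigma ** 2))[:, None]  # neighborhood kernel
            weights += lr * h * (p - weights)            # pull neurons toward the sample
    return weights.reshape(grid[0], grid[1], 3)
```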
A vision-based query interface annotation method is used to relate attributes and form elements in form-based web query interfaces; this method reaches an accuracy of 82%. A user participation method is then used to tune the result: users can answer "yes" or "no" to existing annotations, or manually annotate form elements. Mass feedback is added to the annotation algorithm to produce more accurate results. With this approach, query interface annotation can reach perfect accuracy.
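A toy illustration of the mass-feedback step, under the assumption that it amounts to majority voting over user responses (the paper may aggregate differently):

```python
from collections import Counter

def tune_annotation(auto_label, yes_votes, no_votes, manual_labels):
    """Keep the automatic annotation while feedback supports it; once 'no'
    votes dominate, fall back to the most common manual annotation."""
    if yes_votes >= no_votes:
        return auto_label
    counts = Counter(manual_labels)
    return counts.most_common(1)[0][0] if counts else auto_label

# e.g. three users rejected the automatic label and two typed their own:
print(tune_annotation("author", 1, 3, ["title", "title"]))   # -> "title"
```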
In this paper, we propose a flexible location-based service (LBS) middleware framework that makes the development and deployment of new location-based applications much easier. Considering the World Wide Web as a huge source of location-related information, we integrate commonly used web data extraction techniques into the middleware framework, exposing a unified web data interface to upper-layer applications to make them more attractive. The framework also addresses common LBS issues, including positioning, location modeling, location-dependent query processing, and privacy and security management.
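One plausible shape for the "unified web data interface" is a single query method that hides each site-specific extractor; the class and method names below are hypothetical.

```python
from abc import ABC, abstractmethod

class LocationDataSource(ABC):
    @abstractmethod
    def query(self, lat: float, lon: float, radius_m: float) -> list[dict]:
        """Return location-relative records near (lat, lon)."""

class RestaurantScraper(LocationDataSource):
    def query(self, lat, lon, radius_m):
        # Site-specific web extraction logic would run here.
        return [{"name": "example diner", "lat": lat, "lon": lon}]

def unified_query(sources, lat, lon, radius_m=500.0):
    results = []
    for src in sources:          # upper applications see only this one interface
        results.extend(src.query(lat, lon, radius_m))
    return results
```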
A large amount of data is present on the web that can be used for useful purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding, not for machines; therefore, making data machine-readable requires techniques for grabbing data from web pages. Researchers have addressed the problem using two approaches: knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of the attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving, so new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique for extracting data from web pages using the visual styles and structure of documents. The proposed technique detects a Rich Data Region (RDR) using a query and the query's correlative words. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and runs on a dataset of real-world working websites. The effectiveness of the results is evaluated using precision, recall, and F-measure, and compared with five existing systems, showing encouraging results.
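The paper does not define formatting entropy here, but one plausible reading is the Shannon entropy of the tag (or style) distribution inside a candidate region, so that mixed-format noise such as navigation bars scores high:

```python
import math
from collections import Counter

def formatting_entropy(tags):
    """Shannon entropy of the tag distribution in a candidate region."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A repetitive data record is low-entropy; a mixed navigation block is high.
record = ["div", "span", "span", "div", "span", "span"]
navbar = ["a", "img", "form", "ul", "script", "div"]
print(formatting_entropy(record) < formatting_entropy(navbar))  # True
```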
To extract structured data from a web page according to customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are utilized both to guide the user through the labeling process, minimizing user effort, and to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern must be induced. For this purpose, a space-optimized suffix tree called the attribute tree is built to transform the Document Object Model (DOM) tree into a simpler form while preserving useful properties such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show the high performance of our approach in terms of precision, recall, and structural correctness.
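As a much-simplified stand-in for the bottom-up pattern induction (the actual attribute tree copes with irregular orderings that this does not), one can look for the shortest repeating attribute sequence and fold labeled values into records:

```python
def induce_pattern(labels):
    """Return the shortest sequence whose repetition reproduces `labels`."""
    n = len(labels)
    for k in range(1, n + 1):                  # try the shortest period first
        if n % k == 0 and labels == labels[:k] * (n // k):
            return labels[:k]
    return labels

labels = ["title", "price", "rating", "title", "price", "rating"]
values = ["Book A", "$10", "4.5", "Book B", "$12", "4.1"]
pattern = induce_pattern(labels)
records = [dict(zip(pattern, values[i:i + len(pattern)]))
           for i in range(0, len(values), len(pattern))]
# -> [{'title': 'Book A', 'price': '$10', 'rating': '4.5'}, ...]
```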
The massive web-based information resources have led to increasing demand for the effective automatic retrieval of target information for web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract, and filter tabular data from HTML pages and to transform it into new web-based representations. The tool has been applied in an aquaculture web application platform for extracting and generating aquatic product market information. Results show that the tool is very effective in extracting the required data from web pages.
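The tool itself is not shown, but the core locate-and-extract step for tabular data can be illustrated with BeautifulSoup:

```python
from bs4 import BeautifulSoup

def extract_tables(html):
    """Return every <table> in the page as a list of text rows."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = [[cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
                for tr in table.find_all("tr")]
        tables.append([r for r in rows if r])    # drop empty rows
    return tables
```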
Alteration is regarded as significant information for mineral exploration. In this study, ETM+ remote sensing data are used to recognize and extract alteration zones in northwestern Yunnan (云南), China. Principal component analysis (PCA) of ETM+ bands 1, 4, 5, and 7 was employed to extract OH⁻ alteration, and PCA of ETM+ bands 1, 3, 4, and 5 was used to extract Fe²⁺ (Fe³⁺) alteration. Interfering factors, such as vegetation, snow, and shadows, were masked. Alteration components were identified among the principal components (PCs) by the contributions of their diagnostic spectral bands. The alteration zones identified from remote sensing were analyzed in detail alongside geological surveys and field verification. The results show that the OH⁻ alteration is a main indicator of K-feldspar, phyllic, and propylitic alteration, which are closely related to porphyry copper deposits. The Fe²⁺ (Fe³⁺) alteration indicates pyritization, which is mainly related to hydrothermal or skarn-type polymetallic deposits.
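A schematic of this PCA step (often called the Crosta technique) in Python: stack the four bands, run PCA, and select the component whose loadings have opposite signs on the diagnostic bands. The band arrays and the selection heuristic are simplified assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def alteration_component(b1, b4, b5, b7):
    """b1, b4, b5, b7: 2-D arrays of ETM+ band reflectance (same shape)."""
    stack = np.stack([b.ravel() for b in (b1, b4, b5, b7)], axis=1)
    pca = PCA(n_components=4).fit(stack)
    # OH-bearing minerals reflect in band 5 and absorb in band 7, so the
    # alteration PC has loadings of opposite sign on those two bands
    # (columns 2 and 3 of the stack).
    loadings = pca.components_                       # (4 components, 4 bands)
    idx = np.argmax(np.abs(loadings[:, 2] - loadings[:, 3]))
    scores = pca.transform(stack)[:, idx]
    return scores.reshape(b1.shape)                  # per-pixel alteration score
```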
A semi-structured data extraction method is proposed to get the useful information embedded in a group of relevant web pages and store it with the Object Exchange Model (OEM). A data mining method is then adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge can help users understand the information structure of the web more deeply and thoroughly, and it also provides an effective schema for querying web information.
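OEM models semi-structured data as labeled, possibly nested and repeating objects; plain Python dicts and lists mirror that closely. A hypothetical extracted record might be stored as:

```python
# An OEM-style object: labeled, nested, with repeating sub-objects.
paper = {
    "oid": "&p1",                       # OEM object identifier
    "title": "Web data extraction survey",
    "authors": [{"name": "A. Author"}, {"name": "B. Author"}],
    "venue": {"name": "Some Journal", "year": 1999},
}
```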
Since the late 2010s, Artificial Intelligence (AI), including machine learning boosted by deep learning, has boomed as a vital tool for leveraging computer vision, natural language processing, and speech recognition in revolutionizing zoological research. This review provides an overview of the primary tasks, core models, datasets, and applications of AI in zoological research, including animal classification, resource conservation, behavior, development, genetics and evolution, breeding and health, disease models, and paleontology. Additionally, we explore the challenges and future directions of integrating AI into this field. Based on numerous case studies, this review outlines various avenues for incorporating AI into zoological research and underscores its potential to enhance our understanding of the intricate relationships within the animal kingdom. As we build a bridge between the beast and byte realms, this review serves as a resource for envisioning novel AI applications in zoological research that have not yet been explored.
This paper is concerned with cooperative target stalking by a multi-unmanned surface vehicle (multi-USV) system. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a multi-USV target stalking (MUTS) algorithm is proposed. First, a V-type probabilistic data extraction method is proposed for the first time to overcome shortcomings of the MADDPG algorithm. The advantages of the proposed method are twofold: 1) it reduces the amount of data and shortens training time; 2) it filters out the more important data in the experience buffer for training. Second, to avoid collisions among USVs during the stalking process, an action constraint method called Safe DDPG is introduced. Finally, the MUTS algorithm is compared with existing algorithms in cooperative target stalking scenarios. To demonstrate the effectiveness of the proposed MUTS algorithm in stalking tasks, mission operating scenarios and reward functions are carefully designed. The proposed MUTS algorithm helps the multi-USV system avoid internal collisions during mission execution; moreover, compared with existing algorithms, it provides a higher convergence speed and a narrower convergence domain.
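The abstract does not define the V-type weighting, so the sketch below shows only the generic mechanism it serves: non-uniform sampling from the experience buffer so that more important transitions are trained on more often.

```python
import numpy as np

def sample_batch(buffer, priorities, batch_size=64):
    """Draw a training batch with probability proportional to priority."""
    p = np.asarray(priorities, dtype=float)
    p /= p.sum()                                     # normalize to a distribution
    idx = np.random.choice(len(buffer), size=batch_size, p=p, replace=False)
    return [buffer[i] for i in idx]
```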
Natural hazard-triggered technological accidents (Natechs) are accidents involving releases of hazardous materials (hazmat) triggered by natural hazards. Natechs cause huge economic losses as well as human health and environmental problems, so learning from previous Natechs is critical for risk management. However, due to data scarcity and the high uncertainty surrounding such hazards, detecting Natechs in large databases, such as the National Response Center (NRC) database, is a serious challenge for risk managers. As the largest database of hazmat release incidents, the NRC database receives hazmat release reports from citizens in the United States; however, callers often have incomplete details about the incidents they are reporting, leaving many records with incomplete information. Consequently, it is quite difficult to identify and extract Natechs accurately and efficiently. In this study, we introduce machine learning into Natech retrieval research and propose a Semi-Intelligent Natech Identification Framework (SINIF) to solve the problem. We tested the suitability of two supervised machine learning algorithms, the Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN), and selected the former for the development of the SINIF. According to the results, the SINIF is efficient (a total of 826,078 records were analyzed) and accurate (accuracy above 0.90), and 32,841 Natech reports between 1990 and 2017 were extracted from the NRC database. Furthermore, the majority of those reports (97.85%) were related to meteorological phenomena, with hurricanes (24.41%), heavy rains (19.27%), and storms (18.29%) the main causes. Overall, this study suggests that risk managers can benefit immensely from the SINIF when analyzing Natech data from large databases.
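A bare-bones version of such an LSTM report classifier in Keras (hyperparameters are illustrative, not the paper's):

```python
import tensorflow as tf

def build_classifier(vocab_size=20000):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 128),      # tokenized report text
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(report is a Natech)
    ])

model = build_classifier()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```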
Due to continuous cutting tool usage, tool supervision is essential for improving the metal cutting industry. In the metal removal process, tool supervision is carried out either by an operator or by an online tool supervision system. Tool supervision helps assess tool condition, dimensional accuracy, and surface quality. The main reasons for downtime in the metal cutting industry are tool breakage and excessive wear, so it is necessary to supervise tools, which yields better tool life and enhances productivity. This paper reviews the different conventional and artificial intelligence techniques for tool supervision in machining processes that have been described in the literature.
The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related works do not address how to make the extracted information available to a knowledge system. This paper describes a novel approach to building a domain-specific knowledge service from data retrieved from the Hidden Web. An ontology serves to model the domain knowledge. The query forms of different Web sites are translated into a machine-understandable format of defined knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized as ontology-format knowledge. Experiments show that the algorithm achieves high accuracy and that the system greatly facilitates the construction of knowledge services.
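A toy version of the form-to-ontology translation, where a per-site mapping lets an ontology-level query be rewritten into that site's form fields; the field names and concept identifiers are invented for illustration:

```python
FORM_TO_ONTOLOGY = {                # learned or curated per Web site
    "q_author": "music:composer",
    "q_title": "music:workTitle",
}

def build_form_request(concept_values):
    """Map ontology-level query terms onto this site's form fields."""
    reverse = {v: k for k, v in FORM_TO_ONTOLOGY.items()}
    return {reverse[c]: val for c, val in concept_values.items() if c in reverse}

params = build_form_request({"music:composer": "Bach"})
# -> {'q_author': 'Bach'}, ready to submit to the site's search form
```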
This review summarizes the research outcomes and findings documented in 45 journal papers that used a shared tunnel boring machine (TBM) dataset for performance prediction and boring efficiency optimization with machine learning methods. The big dataset was collected during construction of the Yinsong water diversion project in China, covering the excavation of a 20 km tunnel section with 199 monitoring metrics sampled at one-second intervals. The papers resulted from a call for contributions during a TBM machine learning contest in 2019 and covered a variety of topics related to the intelligent construction of TBMs. This review comprises two parts. Part I is concerned with the data processing, feature extraction, and machine learning methods applied by the contributors. The review finds that the data-driven and knowledge-driven approaches to extracting important features were diversified, and further studies are required to achieve commonly accepted criteria. The techniques adopted by the contributors for cleaning and amending the raw data are summarized, with highlights including the importance of a sufficiently high data acquisition frequency (sampling interval of 1 s or less), classification and standardization in the data preprocessing process, and the appropriate selection of features within a boring cycle. Both supervised and unsupervised machine learning methods were utilized, with ensemble and deep learning methods finding wide application. Part I highlights the important features of the individual methods applied by the contributors, including the structures of the algorithms, the selection of hyperparameters, and the model validation approaches.
As real-time and authoritative sources, the official Web pages of organizations contain a large amount of information. The diversity of Web content and formats makes pre-processing essential for obtaining unified attributed data, which is valuable for organizational analysis and mining. Existing research is insufficient in handling multiple Web scenarios and in accuracy. This paper proposes a method for transforming organizational official Web pages into data with attributes. After locating the active blocks in a Web page, structural and content features are proposed to classify the information with a specific model. Extraction methods based on a trigger lexicon and LSTM (Long Short-Term Memory) are proposed, which efficiently process the classified information and extract the data matching the attributes. Together, these form an accurate and efficient method for classifying and extracting information from organizational official Web pages. Experimental results show that our approach improves the performance indicators and exceeds the state of the art on a real data set of organizational official Web pages.
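A minimal take on the trigger-lexicon step: scan a classified text block for cue phrases that signal an attribute value nearby. The lexicon entries and patterns are invented; in the paper this is paired with the LSTM extractor.

```python
import re

TRIGGERS = {  # hypothetical attribute -> cue-phrase pattern
    "founded": r"(?:founded|established)\s+in\s+(\d{4})",
    "president": r"(?:president|director)\s*[:,]?\s+([A-Z][\w.\s]+)",
}

def extract_attributes(text):
    found = {}
    for attr, pattern in TRIGGERS.items():
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            found[attr] = m.group(1).strip()
    return found

print(extract_attributes("The university was established in 1952. President: Jane Doe"))
# -> {'founded': '1952', 'president': 'Jane Doe'}
```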
Discovering and identifying the influential nodes in any complex network is an important issue, and a significant factor in gaining control over the network. Through control of a network, information can be spread or stopped within a short span of time; both goals are achievable, since a network of information can be extended as well as destroyed. Information spread and community formation have thus become two of the most crucial issues in SNA (Social Network Analysis). In this work, the complex network of the Twitter social network is formalized and the results are analyzed. For this purpose, different network metrics are utilized. The network is visualized in its original form and then filtered (at different percentages) to eliminate the less influential nodes and edges for better analysis. The network is analyzed according to different centrality measures: edge betweenness, betweenness centrality, closeness centrality, and eigenvector centrality. Influential nodes are detected and their impact on the network is observed. The communities are analyzed in terms of network coverage, considering the Minimum Spanning Tree, the shortest-path distribution, and the network diameter. These prove to be very effective ways to find influential and central nodes in big social networks such as Facebook, Instagram, Twitter, and LinkedIn.
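The four centrality measures named above are standard and can be reproduced with NetworkX; the graph below is a stand-in for the Twitter crawl, which would be built from follower or retweet edges.

```python
import networkx as nx

G = nx.karate_club_graph()                 # stand-in for the Twitter network
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
eigenvector = nx.eigenvector_centrality(G)
edge_betweenness = nx.edge_betweenness_centrality(G)

top5 = sorted(betweenness, key=betweenness.get, reverse=True)[:5]
print("most influential nodes:", top5)
mst = nx.minimum_spanning_tree(G)          # basis for the coverage analysis
print("network diameter:", nx.diameter(G))
```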
Coherent pulse stacking (CPS) is a new time-domain coherent addition technique that stacks several optical pulses into a single output pulse, enabling high pulse energy and high average power. A Z-domain model of the pulsed laser is assembled to describe the optical interference process. An algorithm that extracts the cavity phase and pulse phases from limited data, where only the pulse intensity is available, is developed to diagnose optical cavity resonators. We also implement the algorithm on a cascaded system of multiple optical cavities, achieving phase errors of less than 1.0° (root mean square), which ensures the stability of CPS.