In response to the lack of reliable physical parameters in the process simulation of butadiene extraction, a large amount of phase equilibrium data were collected in the context of the actual process of butadiene production by acetonitrile. The accuracy of five prediction methods, UNIFAC (UNIQUAC Functional-group Activity Coefficients), UNIFAC-LL, UNIFAC-LBY, UNIFAC-DMD, and COSMO-RS, applied to the butadiene extraction process was verified using partial phase equilibrium data. The results showed that the UNIFAC-DMD method had the highest accuracy in predicting phase equilibrium data for the missing systems. COSMO-RS predictions for multiple systems also showed good accuracy, and a large number of missing phase equilibrium data were estimated using the UNIFAC-DMD and COSMO-RS methods. The predicted phase equilibrium data were checked for consistency. The NRTL-RK (Non-Random Two-Liquid model with the Redlich-Kwong equation of state) and UNIQUAC thermodynamic models were used to correlate the phase equilibrium data. Industrial device simulations were used to verify the accuracy of the thermodynamic model applied to the butadiene extraction process. The simulation results showed that the average deviations of the results obtained with the correlated thermodynamic model from the actual plant values were less than 2%, whereas the deviations of simulations using only the built-in database of the commercial simulator Aspen Plus exceeded 10%, indicating that the obtained phase equilibrium data are highly accurate and reliable. The best phase equilibrium data and thermodynamic model parameters for butadiene extraction are provided. This improves the accuracy and reliability of the design, optimization, and control of the process, and provides a basis and guarantee for developing a more environmentally friendly and economical butadiene extraction process.
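As a hedged illustration of the kind of activity-coefficient calculation that underlies the NRTL correlation mentioned above, the sketch below evaluates the standard binary NRTL equations. The parameter values used in the example are hypothetical, not fitted values from this work.

```python
import math

def nrtl_binary(x1, tau12, tau21, alpha=0.3):
    """Activity coefficients (gamma1, gamma2) of a binary mixture from
    the NRTL model.  tau12/tau21 are dimensionless interaction
    parameters; alpha is the non-randomness factor."""
    x2 = 1.0 - x1
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_g1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                     + tau12 * G12 / (x2 + x1 * G12)**2)
    ln_g2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                     + tau21 * G21 / (x1 + x2 * G21)**2)
    return math.exp(ln_g1), math.exp(ln_g2)

# Hypothetical parameters, e.g. at equimolar composition:
g1, g2 = nrtl_binary(0.5, tau12=0.5, tau21=0.8)
```

A sanity check: at the pure-component limit (x1 = 1) the activity coefficient of component 1 is exactly 1, and at infinite dilution ln(gamma1) reduces to tau21 + tau12*exp(-alpha*tau12).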
Target maneuver recognition is a prerequisite for air combat situation awareness, trajectory prediction, threat assessment, and maneuver decision-making. To remove the dependence of current target maneuver recognition methods on empirical criteria and sample data, and to complete the task of extracting target maneuver patterns automatically and adaptively, this paper proposes an air combat maneuver pattern extraction method based on time-series segmentation and clustering analysis, combining an autoencoder, the G-G clustering algorithm, and a selective ensemble clustering algorithm. First, the autoencoder is used to extract the key features of the maneuvering trajectory, removing the impact of redundant variables and reducing the data dimension. Then, taking temporal information into account, the maneuver characteristic time series is segmented with the improved FSTS-AEGG algorithm, and a large number of maneuver primitives are extracted. Finally, the maneuver primitives are grouped into categories using the selective ensemble multiple-time-series clustering algorithm, such that each class represents a maneuver action. The maneuver pattern extraction method is applied to small-scale air combat trajectories and can recognize and correctly partition at least 71.3% of maneuver actions, indicating that the method is effective and satisfies engineering accuracy requirements. In addition, the method can provide data support for the various target maneuver recognition methods proposed in the literature, greatly reducing the workload and improving recognition accuracy.
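To make the segmentation idea concrete, here is a minimal sketch that cuts a 1-D maneuver feature series into primitive segments wherever the trend reverses. This is a deliberately simplified stand-in for the improved FSTS-AEGG algorithm, whose actual criteria are more sophisticated.

```python
def segment_series(x):
    """Cut a feature series into primitive segments at trend reversals
    (a simplified stand-in for FSTS-AEGG segmentation).
    Returns (start, end) index pairs."""
    def sign(v):
        return (v > 0) - (v < 0)
    d = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    bounds = [0]
    for i in range(1, len(d)):
        if d[i] != d[i - 1] and d[i] != 0:
            bounds.append(i)
    bounds.append(len(x) - 1)
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

For an altitude-like series that climbs, descends, then climbs again, the function returns three primitives, one per monotone leg.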
In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of semantic interaction information between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model's performance on domain-specific relational triple extraction tasks. We first propose an innovative attention interaction module, which significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to reevaluate challenging samples, thereby improving the model's adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets that alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several respects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
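One plausible shape for the LLM/SLM voting strategy is a confidence gate: trust the fine-tuned SLM when it is confident, and defer challenging (low-confidence) samples to the LLM for re-evaluation. The threshold and the stub predictors below are illustrative assumptions, not the paper's exact strategy.

```python
def reevaluate(samples, slm_predict, llm_predict, conf_threshold=0.8):
    """Confidence-gated voting sketch.
    slm_predict(sample) -> (label, confidence); llm_predict(sample) -> label.
    Samples the SLM is confident about keep their SLM label; the rest
    are re-labeled by the LLM."""
    labels = []
    for s in samples:
        label, conf = slm_predict(s)
        labels.append(label if conf >= conf_threshold else llm_predict(s))
    return labels
```

In practice the two predictors would wrap a fine-tuned extraction model and an LLM prompt; here they are placeholders.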
One of the biggest dangers to society today is terrorism, where attacks have become one of the most significant risks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) have become the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management, medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related, initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terrorist attacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database (GTD) can influence the accuracy of the model's classification of terrorist attacks, where each part of the data can provide vital information to enrich the ability of classifier learning. Each data point in a multiclass taxonomy has one or more tags attached to it, referred to as "related tags." We applied machine learning classifiers to classify terrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts and learns contextual features from text attributes to acquire more information from text data. The extracted contextual features are combined with the "key features" of the dataset and used to perform the final classification. The study explored different experimental setups with various classifiers to evaluate the model's performance. The experimental results show that the proposed framework outperforms the latest techniques for classifying terrorist attacks, with an accuracy of 98.7% using a combined feature set and an extreme gradient boosting classifier.
With the rapid development of the Internet globally since the 21st century, the amount of data and information has increased exponentially. Data help improve people's livelihood and working conditions, as well as learning efficiency. Therefore, data extraction, analysis, and processing have become a hot issue for people from all walks of life. Traditional recommendation algorithms still have problems such as inaccuracy, low diversity, and low performance. To solve these problems and improve the accuracy and variety of recommendation algorithms, this research combines convolutional neural networks (CNN) and an attention model to design a recommendation algorithm based on a neural network framework. Through the text convolutional network, the input layer of the CNN is transformed into two channels: a static one and a non-static one. Meanwhile, a self-attention mechanism is introduced so that the data can be better processed and the accuracy of feature extraction becomes higher. The recommendation algorithm combines the CNN and the attention mechanism and divides the embedding layer into user information feature embedding and data name feature extraction embedding. It obtains data name features through a convolution kernel. Finally, the top pooling layer obtains the length vector, and the attention layer obtains the characteristics of the data type. Experimental results show that the proposed recommendation algorithm combining CNN and attention performs better in data extraction than the traditional CNN algorithm and other recommendation algorithms popular at the present stage. The proposed algorithm shows excellent accuracy and robustness.
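The self-attention step can be sketched in a few lines of numpy. This is the generic scaled dot-product form with the learned projection matrices omitted for brevity, an assumption on our part, since the abstract does not give the exact formulation.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X (seq_len x dim).
    Returns the re-weighted sequence and the attention weight matrix."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # pairwise similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)     # row-wise softmax
    return w @ X, w
```

Each output row is a convex combination of the input rows, which is what lets the model focus on the most informative positions of a feature sequence.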
An algorithm named DPP is presented. In it, a new model based on the concept of irregularity degree is established to evaluate the regularity of cells. The algorithm derives the structural regularity of cells by exploiting the signal flow of the circuit, and then converts the bit-slice structure into parallel constraints to enable the Q-place algorithm. The design flow and the main algorithms are introduced. Finally, satisfactory experimental results of the tool, compared with the Cadence placement tool SE, are discussed.
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and, in turn, suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
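The two-phase idea can be illustrated with a toy sketch: phase 1 filters out negative spans with a binary decision, so the phase-2 type classifier never faces the heavily imbalanced negative class. The stub predicates here are placeholders for the paper's learned span classifiers.

```python
def two_phase_extract(spans, is_entity, entity_type):
    """Two-phase paradigm sketch.
    Phase 1: is_entity(span) -> bool filters negative spans.
    Phase 2: entity_type(span) -> label types only the survivors."""
    kept = [s for s in spans if is_entity(s)]          # phase 1
    return {s: entity_type(s) for s in kept}           # phase 2
```

The same split applies to relation candidates: first decide whether a pair is related at all, then decide the relation type.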
More and more web pages are applying AJAX (Asynchronous JavaScript and XML) due to its rich interactivity and incremental communication. It is observed that AJAX contents, which cannot be seen by traditional crawlers, are generally well structured and belong to a specific domain. Extracting structured data from AJAX contents and annotating their semantics are very significant for further applications. In this paper, a structured AJAX data extraction method for the agricultural domain based on an agricultural ontology is proposed. First, Crawljax, an open AJAX crawling tool, was extended to explore and retrieve AJAX contents. Second, the retrieved contents were partitioned into items and then classified against the agricultural ontology; HTML tags and punctuation were used to segment the retrieved contents into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed that the proposed approach is effective in resource exploration, entity extraction, and semantic annotation.
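The tag-and-punctuation segmentation step can be sketched with the standard-library HTML parser. This is only a sketch of the segmentation idea; the ontology-based classification and clustering that follow in the paper are omitted.

```python
from html.parser import HTMLParser
import re

class ItemSegmenter(HTMLParser):
    """Segment page content into candidate entity items at tag
    boundaries and punctuation."""
    def __init__(self):
        super().__init__()
        self.items = []

    def handle_data(self, data):
        # Tag boundaries already split the text; punctuation splits further.
        for piece in re.split(r"[,;:]", data):
            piece = piece.strip()
            if piece:
                self.items.append(piece)

def segment(html):
    parser = ItemSegmenter()
    parser.feed(html)
    return parser.items
```

For example, a list fragment such as `<li>rice, wheat</li><li>corn</li>` yields the three items "rice", "wheat", and "corn".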
The development of cloud computing and virtualization technology has brought great challenges to the reliability of data center services. Data centers typically contain a large number of compute and storage nodes which may fail and affect the quality of service. Failure prediction is an important means of ensuring service availability. Predicting node failure in cloud-based data centers is challenging because the observed failure symptoms have complex characteristics, and the distribution imbalance between failure samples and normal samples is widespread, resulting in inaccurate failure prediction. Targeting these challenges, this paper proposes a novel failure prediction method, FP-STE (Failure Prediction based on Spatio-Temporal feature Extraction). First, an improved recurrent neural network, HW-GRU (improved GRU based on a HighWay network), and a convolutional neural network (CNN) are used to extract the temporal and spatial features of multivariate data, respectively, to increase the discrimination of different types of failure symptoms and improve prediction accuracy. Then the intermediate results of the two models are added as features into SCS-XGBoost to predict the possibility and the precise type of node failure in the future. SCS-XGBoost is an ensemble learning model improved by an integrated strategy of oversampling and cost-sensitive learning. Experimental results based on real data sets confirm the effectiveness and superiority of FP-STE.
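One half of the integrated sampling strategy, oversampling, can be sketched as naive random duplication of minority-class samples until classes are balanced. The paper's strategy also involves cost-sensitive learning, which is not shown here.

```python
import random

def oversample(samples, labels, seed=0):
    """Naive random oversampling: duplicate minority-class samples until
    every class is as frequent as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels
```

Balancing the failure class this way keeps the downstream boosted-tree classifier from simply predicting "normal" for every node.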
A 16 kV/20 A power supply was developed for the extraction grid of the prototype radio frequency (RF) ion source of a neutral beam injector. To acquire the state signals of the extraction grid power supply (EGPS) and control its operation, a data acquisition and control system has been developed. This system mainly consists of an interlock protection circuit board, a photoelectric conversion circuit, optical fibers, an industrial Compact Peripheral Component Interconnect (CPCI) computer, and a host computer. The human-machine interface of the host computer delivers commands and data to the program of the CPCI computer, and offers a convenient client for setting parameters and displaying EGPS status. The CPCI computer acquires the status of the power supply, and the system can turn off the EGPS quickly when EGPS faults occur. The system has been applied to the EGPS of the prototype RF ion source. Test results show that the data acquisition and control system meets the requirements of the operation of the prototype RF ion source.
In recent years, biometric sensors have been applied to identify important individual information and control access using various identifiers, including characteristics such as fingerprints, palm prints, and iris patterns. However, the precise identification of human features remains physically challenging because a person's appearance and features vary over their lifetime. In response to these challenges, a novel Multimodal Biometric Feature Extraction (MBFE) model is proposed to extract features from noisy sensor data using a modified Ranking-based Deep Convolutional Neural Network (RDCNN). The proposed MBFE model enables feature extraction from different biometric images, including iris, palm print, and lip images, which are preprocessed initially for further processing. The extracted features are validated after optimal extraction by the RDCNN by splitting the datasets to train the feature extraction model and then testing the model with different sets of input images. The simulation is performed in MATLAB to test the efficacy of the model over multimodal datasets, and the simulation results show that the proposed method achieves higher accuracy, precision, recall, and F1 score than existing deep learning feature extraction methods. The performance improvement of the MBFE algorithm in terms of accuracy, precision, recall, and F1 score is 0.126%, 0.152%, 0.184%, and 0.38% over the existing Back Propagation Neural Network (BPNN), Human Identification Using Wavelet Transform (HIUWT), Segmentation Methodology for Non-cooperative Recognition (SMNR), and Daugman Iris Localization Algorithm (DILA) feature extraction techniques, respectively.
Web data extraction is the process of obtaining valuable data from the tremendous information resources of the World Wide Web according to a pre-defined pattern, processing and classifying the data on the Web. A formalization of the procedure of Web data extraction is presented, together with a description of the crawling and extraction algorithms. Based on this formalization, an XML-based page structure description language, TIDL, is proposed, including the object model, the HTML object reference model, and the definition of tags. Finally, a Web data gathering and querying application based on Internet agent technology, named the Web Integration Services Kit (WISK), is described.
Satellite remote sensing data are usually used to analyze the spatial distribution pattern of geological structures and generally serve as a significant means for the identification of alteration zones. Based on Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques were presented to fulfill alteration information extraction: data preparation, vegetation indices and band ratios, and expert classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by Wuhan Zondy Cyber Technology Co., Ltd., China. In the study area application of extracting alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found after satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful approaches applicable to a wide range of gold-mineralized alteration information extraction.
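The vegetation-index and band-ratio step can be sketched with numpy. The band arrays in the example are synthetic placeholders, not actual ETM+ digital numbers; the 5/7 ratio is the commonly used enhancement for OH-bearing alteration minerals.

```python
import numpy as np

def band_ratio(num, den, eps=1e-9):
    """Per-pixel band ratio, a standard enhancement for alteration
    mapping (e.g. ETM+ band 5 / band 7 for hydroxyl-bearing minerals)."""
    num = np.asarray(num, float)
    den = np.asarray(den, float)
    return num / (den + eps)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, used here to mask
    vegetated pixels before alteration extraction."""
    nir = np.asarray(nir, float)
    red = np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-9)
```

Pixels with high NDVI (dense vegetation) would be excluded before the expert classifier evaluates the ratio images.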
In this paper, we present an open Python procedure, with a Jupyter notebook, for data extraction and vectorization of geophysical exploration profiles. Constrained by observation routes and traffic conditions, geophysical exploration profiles tend to follow curved roads for ease of observation; however, they must be projected onto a straight line for data processing and analysis. After projection, the true position of the obtained crustal structure is unknown. Nonetheless, when the results are used as an initial constraint for other geophysical inversions, such as gravity inversion, we need to know the true position of the data rather than the distance to the starting point. We solve this problem by profile vectorization and reprojection. The method can be used to extract data from various geophysical exploration profiles, such as seismic reflection profiles and gravity profiles.
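The core projection step can be written compactly with numpy: each observation point on the curved route is mapped to its along-profile distance and its foot of perpendicular on the straight profile line. This sketch is our reading of the procedure, not the notebook's actual code.

```python
import numpy as np

def project_to_line(points, p0, p1):
    """Project observation points (N x 2) onto the straight profile line
    through p0 and p1.  Returns the along-profile distances and the
    foot-of-perpendicular coordinates (used when reprojecting results
    back to true positions)."""
    d = np.asarray(p1, float) - np.asarray(p0, float)
    u = d / np.linalg.norm(d)                    # unit vector along profile
    rel = np.asarray(points, float) - np.asarray(p0, float)
    t = rel @ u                                  # distance along the profile
    feet = np.asarray(p0, float) + np.outer(t, u)
    return t, feet
```

Keeping both `t` (what the processing software sees) and `feet` (where that distance actually sits in map coordinates) is exactly what makes later reuse as an inversion constraint possible.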
Alteration is regarded as significant information for mineral exploration. In this study, ETM+ remote sensing data are used for recognizing and extracting alteration zones in northwestern Yunnan (云南), China. The principal component analysis (PCA) of ETM+ bands 1, 4, 5, and 7 was employed for OH^- alteration extraction, and the PCA of ETM+ bands 1, 3, 4, and 5 was used for extracting Fe^2+ (Fe^3+) alterations. Interfering factors, such as vegetation, snow, and shadows, were masked. Alteration components were identified in the principal components (PCs) by the contributions of their diagnostic spectral bands. The zones of alteration identified from remote sensing were analyzed in detail along with geological surveys and field verification. The results show that the OH^- alteration is a main indicator of K-feldspar, phyllic, and propylitic alterations. These alterations are closely related to porphyry copper deposits. The Fe^2+ (Fe^3+) alteration indicates pyritization, which is mainly related to hydrothermal or skarn-type polymetallic deposits.
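The PCA step, often called the Crosta technique in alteration mapping, reduces to an eigen-decomposition of the band covariance matrix; the component whose loadings have opposite signs on the diagnostic absorption and reflection bands is the one inspected for alteration. A minimal numpy sketch:

```python
import numpy as np

def pca_components(bands):
    """PCA of stacked image bands (pixels x bands): returns eigenvalues
    (descending) and the matching eigenvectors (loadings) of the band
    covariance matrix."""
    X = bands - bands.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

For perfectly correlated synthetic bands, all variance collapses into the first component, which is the degenerate limit of the effect exploited here.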
The massive web-based information resources have led to an increasing demand for effective automatic retrieval of target information for web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract, and filter tabular data from HTML pages and to transform them into new web-based representations. The tool has been applied in an aquaculture web application platform for extracting and generating aquatic product market information. Results prove that this tool is very effective in extracting the required data from web pages.
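A minimal standard-library sketch of the locate-and-extract step for tabular data is shown below; the actual tool's locating and filtering algorithms are considerably more involved.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr>/<td>/<th> into rows."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

def extract_table(html):
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows
```

The extracted rows can then be filtered and re-rendered into whatever representation the platform needs, e.g. market price listings.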
A large amount of data is present on the web and can be used for useful purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding and not for machines; therefore, making data machine-readable requires techniques to grab data from web pages. Researchers have addressed this problem using two approaches, i.e., knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving; therefore, new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique to extract data from web pages using visual styles and the structure of documents. The proposed technique detects a Rich Data Region (RDR) using a query and the correlative words of the query. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and runs on a dataset of real-world working websites. The effectiveness of the results is evaluated using precision, recall, and F-measure and compared with five existing systems. The comparison of the proposed technique to existing systems has shown encouraging results.
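The style-similarity splitting of an RDR into records can be illustrated with a small sketch. The style signature and the assumption that each record starts with an element styled like the first one are simplifications of the paper's method, chosen only to make the idea concrete.

```python
def split_records(elements, style_of):
    """Divide a data region's child elements into records by style
    similarity: a new record starts at every element whose style
    signature matches that of the first element."""
    delim = style_of(elements[0])
    records, current = [], []
    for e in elements:
        if style_of(e) == delim and current:
            records.append(current)
            current = []
        current.append(e)
    records.append(current)
    return records
```

With (tag, text) pairs standing in for DOM elements and the tag as the style signature, a title/description stream splits cleanly into per-product records.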
A semi-structured data extraction method is proposed to obtain the useful information embedded in a group of relevant web pages and store it with the OEM (Object Exchange Model). Then, a data mining method is adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge can help users understand the information structure on the web more deeply and thoroughly. At the same time, it can also provide an effective schema for querying web information.
A method of 3D model reconstruction based on scattered point data in reverse engineering is presented. The topological relationship of the scattered points is established first, and the data set is then triangulated to reconstruct a mesh surface model. The curvatures of the cloud data are calculated based on the mesh surface, and the point data are segmented by an edge-based method. Each patch of data is fitted with a quadric or freeform surface, with the type of quadric surface determined automatically from the parameters, and finally the whole CAD model is created. A mouse model example is employed to confirm the effectiveness of the algorithm.
In order to extract the boundaries of rural habitation, an extraction method based on polygon aggregation of geographic name data and basic geographic information data is proposed. It can extract the boundaries of three levels of rural habitation: town, administrative village, and natural village. The method first extracts the boundary of a natural village by aggregating the resident polygons, then extracts the boundary of an administrative village by aggregating the boundaries of its natural villages, and finally extracts the boundary of a town by aggregating the boundaries of its administrative villages. The methods for extracting the boundaries of these three levels of rural habitation are described in detail in an experiment with basic geographic information data and geographic name data. Experimental results show that the method can serve as a reference for boundary extraction of rural habitation.
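As a hedged illustration of aggregation, the sketch below merges a set of resident points into a single outer boundary using Andrew's monotone-chain convex hull. Real habitation boundaries would use a more faithful aggregation (buffering, concave hulls, or polygon union), so this is only a stand-in for the idea of collapsing many small footprints into one settlement outline.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull of 2-D points.
    Returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

Applied level by level (resident polygons to natural village, natural villages to administrative village, and so on), such an aggregation mirrors the three-step procedure described above.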
Funding (butadiene extraction study): supported by the National Natural Science Foundation of China (22178190).
Funding (target maneuver recognition study): supported by the National Natural Science Foundation of China (Project No. 72301293).
Funding (relational triple extraction study): supported by the Science and Technology Innovation 2030 Major Project on "New Generation Artificial Intelligence" granted by the Ministry of Science and Technology, Grant Number 2020AAA0109300.
Abstract: In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of the semantic interaction information between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model's performance on domain-specific relational triple extraction tasks. We first propose an innovative attention interaction module, which significantly enhances the semantic interaction between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to re-evaluate challenging samples, thereby improving the model's adaptability to specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets that alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several respects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Abstract: One of the biggest dangers to society today is terrorism, whose attacks have become one of the most significant risks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) have become the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management, medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related, initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terrorist attacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database (GTD) can influence the accuracy of the model's classification of terrorist attacks, as each part of the data can provide vital information to enrich classifier learning. Each data point in a multiclass taxonomy has one or more tags attached to it, referred to as "related tags." We applied machine learning classifiers to classify terrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts and learns contextual features from the text attributes to acquire more information from the text data. The extracted contextual features are combined with the "key features" of the dataset and used to perform the final classification. The study explored different experimental setups with various classifiers to evaluate the model's performance. The experimental results show that the proposed framework outperforms the latest techniques for classifying terrorist attacks, achieving an accuracy of 98.7% using the combined feature set and an extreme gradient boosting classifier.
Abstract: With the rapid global development of the Internet since the 21st century, the amount of data has increased exponentially. Data help improve people's livelihood and working conditions, as well as learning efficiency. Therefore, data extraction, analysis, and processing have become a hot issue for people from all walks of life. Traditional recommendation algorithms still suffer from problems such as inaccuracy, low diversity, and low performance. To solve these problems and improve the accuracy and variety of recommendation algorithms, this research combines convolutional neural networks (CNN) and the attention model to design a recommendation algorithm based on a neural network framework. Through the text convolutional network, the input layer of the CNN is split into two channels: a static one and a non-static one. Meanwhile, the self-attention mechanism focuses on the salient parts of the input so that the data can be better processed and the accuracy of feature extraction becomes higher. The recommendation algorithm combines the CNN and the attention mechanism, and divides the embedding layer into user information feature embedding and data name feature extraction embedding. It obtains data name features through a convolution kernel; the top pooling layer then obtains a fixed-length vector, and the attention layer obtains the characteristics of the data type. Experimental results show that the proposed recommendation algorithm combining the CNN and the attention mechanism performs better in data extraction than the traditional CNN algorithm and other currently popular recommendation algorithms, showing excellent accuracy and robustness.
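A self-attention layer of the kind this abstract describes can be illustrated with a minimal, weight-free sketch in which queries, keys, and values are all the raw feature vectors; this is an illustration of the mechanism only, not the paper's actual network, and the input vectors are invented:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.
    Each output is a softmax-weighted mixture of all input vectors."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(w[j] * seq[j][i] for j in range(len(seq))) for i in range(d)])
    return out

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy item feature vectors
attended = self_attention(feats)
```

Because the attention weights for each position sum to one, every output vector is a convex combination of the inputs, which is what lets the layer re-weight features without changing their scale.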
Abstract: An algorithm named DPP is presented. In it, a new model based on the concept of irregularity degree is introduced to evaluate the regularity of cells. The model derives the structural regularity of cells by exploiting the signal flow of the circuit, and then converts the bit-slice structure into parallel constraints to enable the Q place algorithm. The design flow and the main algorithms are introduced. Finally, the satisfactory experimental results of the tool, compared with the Cadence placement tool SE, are discussed.
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training, which are essential but result in grossly imbalanced data distributions and, in turn, suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting their types in the second phase. The two-phase paradigm enables our model to significantly reduce the data-distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences and the National High-Tech R&D Program of China (2008BAK49B05).
Abstract: More and more web pages are applying AJAX (Asynchronous JavaScript and XML) for its rich interactivity and incremental communication. It is observed that AJAX content, which cannot be seen by a traditional crawler, is generally well structured and belongs to one specific domain. Extracting the structured data from AJAX content and annotating its semantics are very significant for further applications. In this paper, a structured AJAX data extraction method for the agricultural domain based on an agricultural ontology is proposed. First, Crawljax, an open AJAX crawling tool, was extended to explore and retrieve the AJAX content. Second, the retrieved content was partitioned into items and classified with the help of the agricultural ontology; HTML tags and punctuation were used to segment the retrieved content into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed the proposed approach to be effective in resource exploration, entity extraction, and semantic annotation.
Funding: Supported in part by the National Key Research and Development Program of China (2019YFB2103200), NSFC (61672108), the Open Subject Funds of the Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (SKX182010049), the Fundamental Research Funds for the Central Universities (5004193192019PTB-019), and the Industrial Internet Innovation and Development Project 2018 of China.
Abstract: The development of cloud computing and virtualization technology has brought great challenges to the reliability of data center services. Data centers typically contain a large number of compute and storage nodes that may fail and affect the quality of service. Failure prediction is an important means of ensuring service availability. Predicting node failure in cloud-based data centers is challenging because the observed failure symptoms have complex characteristics, and the distribution imbalance between failure samples and normal samples is widespread, resulting in inaccurate failure prediction. Targeting these challenges, this paper proposes a novel failure prediction method, FP-STE (Failure Prediction based on Spatio-Temporal feature Extraction). First, an improved recurrent neural network, HW-GRU (improved GRU based on the HighWay network), and a convolutional neural network (CNN) are used to extract the temporal and spatial features of multivariate data, respectively, to better discriminate between different types of failure symptoms and thereby improve prediction accuracy. Then the intermediate results of the two models are added as features into SCS-XGBoost to predict the possibility and the precise type of node failure in the future. SCS-XGBoost is an ensemble learning model improved by an integrated strategy of oversampling and cost-sensitive learning. Experimental results based on real data sets confirm the effectiveness and superiority of FP-STE.
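The cost-sensitive side of such a strategy typically amounts to weighting rare failure samples more heavily than normal ones. The following is a minimal sketch of inverse-frequency class weights of the kind that could be passed to a booster's per-sample weight parameter; it is an illustration of the general idea, not the paper's SCS-XGBoost, and the 8:2 sample split is invented:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: each class's weight is
    n / (k * count), so rare (failure) classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Imbalanced toy sample: 8 normal nodes, 2 failing nodes.
labels = ["normal"] * 8 + ["fail"] * 2
w = class_weights(labels)            # {"normal": 0.625, "fail": 2.5}
sample_w = [w[l] for l in labels]    # per-sample weights for training
```

With these weights, misclassifying a failure sample costs four times as much as misclassifying a normal one, which counteracts the distribution imbalance the abstract describes.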
Funding: Supported by the National Natural Science Foundation of China (Contract Nos. 11505225 & 11675216), the Foundation of ASIPP (Contract No. DSJJ-15-GC03), and the Key Program of Research and Development of the Hefei Science Center, CAS (2016HSC-KPRD002).
Abstract: A 16 kV/20 A power supply was developed for the extraction grid of the prototype radio frequency (RF) ion source of a neutral beam injector. To acquire the state signals of the extraction grid power supply (EGPS) and control its operation, a data acquisition and control system has been developed. The system mainly consists of an interlock protection circuit board, a photoelectric conversion circuit, optical fibers, an industrial Compact Peripheral Component Interconnect (CPCI) computer, and a host computer. The human-machine interface on the host computer delivers commands and data to the program on the CPCI computer, and offers a convenient client for setting parameters and displaying EGPS status. The CPCI computer acquires the status of the power supply, and the system can turn off the EGPS quickly when EGPS faults occur. The system has been applied to the EGPS of the prototype RF ion source, and test results show that the data acquisition and control system meets the requirements for operating the prototype RF ion source.
Abstract: In recent years, biometric sensors have been applied to identify important individual information and control access using various identifiers, including characteristics such as fingerprints, palm prints, iris recognition, and so on. However, precisely identifying human features remains physically challenging, because a person's appearance and features vary over a lifetime. In response to these challenges, a novel Multimodal Biometric Feature Extraction (MBFE) model is proposed to extract features from noisy sensor data using a modified Ranking-based Deep Convolutional Neural Network (RDCNN). The proposed MBFE model enables feature extraction from different biometric images, including iris, palm print, and lip, where the images are first preprocessed for further processing. The extracted features are validated after optimal extraction by the RDCNN by splitting the datasets to train the feature extraction model and then testing the model with different sets of input images. The simulation is performed in MATLAB to test the efficacy of the model over multimodal datasets, and the simulation results show that the proposed method achieves higher accuracy, precision, recall, and F1 score than existing deep learning feature extraction methods. The performance improvement of the MBFE algorithm in terms of accuracy, precision, recall, and F1 score is 0.126%, 0.152%, 0.184%, and 0.38% over the existing Back Propagation Neural Network (BPNN), Human Identification Using Wavelet Transform (HIUWT), Segmentation Methodology for Non-cooperative Recognition (SMNR), and Daugman Iris Localization Algorithm (DILA) feature extraction techniques, respectively.
Funding: Contents discussed in this paper are part of a key project, No. 2000-A31-01-04, sponsored by the Ministry of Science and Technology of P.R. China.
Abstract: Web data extraction obtains valuable data from the tremendous information resource of the World Wide Web according to a pre-defined pattern, processing and classifying the data on the Web. A formalization of the Web data extraction procedure is presented, together with a description of the crawling and extraction algorithms. Based on this formalization, an XML-based page structure description language, TIDL, is introduced, covering the object model, the HTML object reference model, and the definition of tags. Finally, a Web data gathering and querying application based on Internet agent technology, named the Web Integration Services Kit (WISK), is described.
Funding: The paper is supported by the Research Foundation for Outstanding Young Teachers, China University of Geosciences (Wuhan) (Nos. CUGQNL0628, CUGQNL0640), the National High-Tech Research and Development Program (863 Program) (No. 2001AA135170), and the Postdoctoral Foundation of the Shandong Zhaojin Group Co. (No. 20050262120).
Abstract: Satellite remote sensing data are usually used to analyze the spatial distribution pattern of geological structures and generally serve as a significant means for identifying alteration zones. Based on Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques were presented to fulfill alteration information extraction: data preparation, vegetation indices and band ratios, and expert-classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by the Wuhan Zondy Cyber Technology Co., Ltd, China. In the study-area application of extracting alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found after satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful approaches and are applicable to a wide range of gold-mineralization alteration information extraction.
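Band ratios and vegetation indices of the kind listed above reduce to pixel-wise arithmetic on pairs of spectral bands. A minimal sketch follows; the band values and the 0.3 vegetation threshold are invented for illustration and are not taken from the paper:

```python
def band_ratio(num_band, den_band, eps=1e-6):
    """Pixel-wise ratio of two bands (equal-length lists of reflectances);
    eps guards against division by zero in dark pixels."""
    return [n / (d + eps) for n, d in zip(num_band, den_band)]

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, commonly used to mask
    vegetated pixels before alteration extraction."""
    return [(n - r) / (n + r + eps) for n, r in zip(nir, red)]

nir = [0.6, 0.1]  # near-infrared reflectance for two toy pixels
red = [0.2, 0.3]  # red reflectance for the same pixels
veg_mask = [v > 0.3 for v in ndvi(nir, red)]  # True where vegetated
```

A mask like `veg_mask` is what lets later steps exclude vegetation (and, with other indices, snow and shadow) from the alteration classification.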
Funding: Supported by Science for Earthquake Resilience of the China Earthquake Administration (XH17022), the National Natural Science Foundation of China (Grant Nos. U1939204 and 41204014), the Scientific Research Fund of the Institute of Seismology and the Institute of Crustal Dynamics, China Earthquake Administration (Grant No. IS20146141), and the National Key Research and Development Plan (Grant No. 2017YFC1500204).
Abstract: In this paper, we present an open Python procedure, with a Jupyter notebook, for the data extraction and vectorization of geophysical exploration profiles. Constrained by observation routes and traffic conditions, geophysical exploration profiles tend to follow curved roads for ease of observation; however, the profile must be projected onto a straight line for data processing and analysis. After projection, the true position of the obtained crustal structure is no longer known. Nonetheless, when the results are used as an initial constraint for other geophysical inversions, such as gravity inversion, we need the true position of the data rather than the distance to the starting point. We solve this problem by profile vectorization and reprojection. The method can be used to extract data from various geophysical exploration profiles, such as seismic reflection profiles and gravity profiles.
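The projection step can be illustrated with an orthogonal-regression line fit that keeps each sample's true coordinates alongside its along-line distance; this is a hedged sketch of the idea, not the notebook's actual code, and the test profile is invented:

```python
import math

def project_profile(points):
    """Project (x, y) survey points onto their least-squares best-fit line
    (orthogonal regression via the principal axis) and return
    (distance_along_line, true_point) pairs, so a curved profile can be
    analysed on a straight axis while retaining each sample's true map
    position for later reprojection."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal-axis direction
    ux, uy = math.cos(theta), math.sin(theta)       # unit vector along the line
    return [((x - mx) * ux + (y - my) * uy, (x, y)) for x, y in points]

profile = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # an already-straight test profile
projected = project_profile(profile)
```

Because each output pairs the along-line distance with the original coordinates, an inversion result indexed by distance can be mapped back to its true position, which is the reprojection the abstract calls for.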
Funding: Supported by the project "Remote Sensing Alteration Abnormity Extraction from Geological Survey in Northwestern Yunnan, China" from the China Geological Survey.
Abstract: Alteration is regarded as significant information for mineral exploration. In this study, ETM+ remote sensing data are used to recognize and extract alteration zones in northwestern Yunnan (云南), China. The principal component analysis (PCA) of ETM+ bands 1, 4, 5, and 7 was employed for OH^- alteration extraction, and the PCA of ETM+ bands 1, 3, 4, and 5 was used for extracting Fe^2+ (Fe^3+) alterations. Interfering factors, such as vegetation, snow, and shadows, were masked. Alteration components were identified in the principal components (PCs) by the contributions of their diagnostic spectral bands. The alteration zones identified from remote sensing were analyzed in detail along with geological surveys and field verification. The results show that the OH^- alteration is a main indicator of K-feldspar, phyllic, and propylitic alterations, which are closely related to porphyry copper deposits. The Fe^2+ (Fe^3+) alteration indicates pyritization, which is mainly related to hydrothermal or skarn-type polymetallic deposits.
Funding: Supported by the Shanghai Education Committee (No. 06KZ016).
Abstract: The massive web-based information resources have led to an increasing demand for the effective automatic retrieval of target information for web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract, and filter tabular data from HTML pages and to transform them into new web-based representations. The tool has been applied in an aquaculture web application platform for extracting and generating aquatic product market information. Results show that the tool is very effective in extracting the required data from web pages.
Abstract: A large amount of data is present on the web and can be used for useful purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding, not for machines; therefore, making data machine-readable requires techniques for grabbing data from web pages. Researchers have addressed the problem using two approaches, i.e., knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of the attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving, so new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique to extract data from web pages using the visual styles and structure of documents. The proposed technique detects the Rich Data Region (RDR) using a query and the correlative words of the query. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and runs on a dataset of real-world working websites. The effectiveness of the results is evaluated using precision, recall, and F-measure and compared with five existing systems. The comparison of the proposed technique with the existing systems has shown encouraging results.
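The formatting-entropy idea can be sketched as the Shannon entropy of an element's child-tag sequence; how the paper thresholds it is not specified here, so the interpretation in the comments (repetitive regions score low, heterogeneous noise scores high) and the example tag sequences are illustrative assumptions:

```python
import math
from collections import Counter

def formatting_entropy(tags):
    """Shannon entropy (in bits) of a tag sequence: a repetitive,
    record-like region scores low, a heterogeneous block scores high."""
    counts = Counter(tags)
    total = len(tags)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

record_like = formatting_entropy(["td", "td", "td", "td"])     # one repeated tag
noisy_block = formatting_entropy(["a", "span", "img", "div"])  # four distinct tags
```

Here `record_like` is 0 bits and `noisy_block` is 2 bits, so a simple threshold on this score can separate uniform data records from mixed navigation or advertising markup.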
Abstract: A semi-structured data extraction method is proposed to get the useful information embedded in a group of relevant web pages and store it with OEM (Object Exchange Model). A data mining method is then adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge helps users understand the information structure on the web more deeply and thoroughly, and can also provide an effective schema for querying web information.
Abstract: A method of 3D model reconstruction based on scattered point data in reverse engineering is presented. The topological relationship of the scattered points is established first, and the data set is then triangulated to reconstruct the mesh surface model. The curvatures of the cloud data are calculated based on the mesh surface, and the point data are segmented by an edge-based method. Every patch of data is fitted by a quadric or freeform surface, with the type of quadric surface decided automatically from its parameters, and finally the whole CAD model is created. An example of a mouse model is employed to confirm the effectiveness of the algorithm.
Abstract: In order to extract the boundaries of rural habitation from geographic name data and basic geographic information data, an extraction method based on polygon aggregation is proposed. It can extract the boundaries of three levels of rural habitation: town, administrative village, and natural village. The method first extracts the boundary of a natural village by aggregating the resident polygons, then extracts the boundary of an administrative village by aggregating the boundaries of its natural villages, and finally extracts the boundary of a town by aggregating the boundaries of its administrative villages. The methods for extracting the boundaries of these three levels of rural habitation are given in detail in an experiment with basic geographic information data and geographic name data. Experimental results show that the method can serve as a reference for the boundary extraction of rural habitation.
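Aggregating several settlement polygons into one enclosing boundary can be sketched with a convex hull over all their vertices; real cartographic aggregation usually preserves concavities, so this stand-in (with invented square "village" polygons) only illustrates the merge step, not the paper's actual operator:

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def aggregate(polygons):
    """Merge several settlement polygons into one enclosing boundary."""
    return convex_hull([pt for poly in polygons for pt in poly])

village_a = [(0, 0), (1, 0), (1, 1), (0, 1)]  # two hypothetical resident polygons
village_b = [(2, 0), (3, 0), (3, 1), (2, 1)]
boundary = aggregate([village_a, village_b])
```

Applying the same `aggregate` step to natural-village boundaries yields an administrative-village boundary, and again to administrative villages yields a town boundary, mirroring the three-level hierarchy the abstract describes.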