Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularity of topics or codes over time. Although this approach is simple and effective, the taxonomies are difficult to maintain because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Building on recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from Korean R&D reports. Taking previous research into account, we use the reports directly and analyze the difficulties of handling them with QA-style Information Extraction (IE) developed for sentence-level benchmark datasets. After investigating the characteristics of Korean R&D reports, we propose a model that handles multiple and repeated appearances of targets in the reports. The model includes an algorithm with two novel modules and a prompt. The proposed methodology focuses on reformulating a question without a static template or pre-defined knowledge. We show the effectiveness of the proposed model on a Korean R&D report dataset that we constructed, and we present an in-depth analysis of the benefits of the multi-turn QA model.
Because of the developed economy and lush vegetation in southern China, remote sensing land surface classification there faces the following difficulties: 1) diverse surface composition types; 2) undulating terrain; 3) small, fragmented land parcels; 4) indistinguishable shadows of surface objects. Our top priority is to clarify how big data concepts (data mining technology) and various new technologies and methods can make remote sensing information extraction for complex surfaces more automated, refined and intelligent. To achieve these objectives, the paper takes data from China's Gaofen-2 satellite as the data source and complex-surface remote sensing information extraction technology as the research object, and analyzes the remote sensing information of complex surfaces after data collection and preprocessing. The specific extraction methods are as follows: 1) extraction of fractal texture features of Brownian motion; 2) extraction of color features; 3) extraction of the vegetation index; 4) construction of feature vectors and the corresponding classification. Fractal texture features, color features, vegetation features and spectral features of remote sensing images are combined into a single feature vector, which raises the feature dimension, increases the separability of remote sensing features, and thus improves the classification accuracy of remote sensing images. The method is suitable for remote sensing information extraction of complex surfaces in southern China and can be extended to other complex surface areas in the future.
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been notable growth in research activity. This rapid uptake reflects increasing interest in the field and raises critical questions about ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. We employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and identify relevant studies. Our review reveals ChatGPT's significant potential for enhancing various NLP tasks. Its adaptability in information extraction, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as complementary mechanisms, empowering users to guide the model and enhance overall accuracy. Despite this promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and more diverse data structures. Moreover, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigation to address these issues.
Purpose: This study is a comprehensive review of the existing annotated corpora for event extraction, which are limited in number but essential for training and improving event extraction algorithms. In addition to this primary goal, the study provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora are mainly focused on the biomedical and general domains. Despite this scarcity, several high-quality corpora are available and widely used as benchmark datasets. However, access to some of these corpora may be limited by closed-access policies, or they may have become inaccessible through broken links after maintenance was discontinued following the initial release. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. It places a strong emphasis on English corpora because English, as a global lingua franca, is more widely understood than other languages. Practical implications: We believe that this study provides valuable knowledge that can serve as a guiding framework for preparing and accurately annotating events in text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for event extraction tasks across various domains. Originality/value: This study comprehensively compiles information on the existing annotated corpora for event extraction tasks and provides preparation guidelines.
Joint entity relation extraction models that integrate the semantic information of relations are favored by researchers because they handle overlapping entities effectively. Among them, methods that manually define a semantic template for each relation achieve particularly strong extraction results because they capture the deep semantic information of relations. However, such methods rely on expert experience and have poor portability. Inspired by rule-based entity relation extraction, this paper proposes a joint entity relation extraction model based on automatically constructed relation semantic templates, abbreviated as RSTAC. The model derives extraction rules for relation semantic templates from a relation corpus through dependency parsing and thereby constructs the templates automatically. The relation semantic templates then constrain relation classification and triplet extraction, and finally the entity relation triplets are obtained. Experimental results on three major Chinese datasets (DuIE, SanWen, and FinRE) show that the RSTAC model obtains rich deep semantics of relations and improves the extraction of entity relation triplets, with F1 scores increased by an average of 0.96% compared with classical joint extraction models such as CasRel, TPLinker, and RFBFN.
Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. Abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, much less effort has gone into Chinese NER research, especially in the scientific domain, because Chinese NER datasets are scarce. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, which are classified into six types. Compared to English scientific NER datasets, SciCN is larger and more diverse: it contains more paper abstracts, and these abstracts are drawn from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models and evaluate them on SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have shown that lexicons effectively enhance Chinese NER models. Motivated by this, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers larger performance gains than lexicons from other domains. We hope that the SciCN dataset and the lexicon will make it possible to benchmark the NER task in the Chinese scientific domain and support future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
The development of precision agriculture demands high accuracy and efficiency in cultivated land information extraction. As a new means of ground monitoring in recent years, unmanned aerial vehicle (UAV) low-altitude remote sensing, which is flexible, efficient, low-cost and high-resolution, is widely applied to investigating various resources. On this basis, a novel extraction method for cultivated land information based on Deep Convolutional Neural Network and Transfer Learning (DTCLE) was proposed. First, linear features (roads, ridges, etc.) were excluded using a Deep Convolutional Neural Network (DCNN). Next, the feature extraction method learned by the DCNN was applied to cultivated land information extraction by introducing a transfer learning mechanism. Last, cultivated land information extraction results were produced by both DTCLE and eCognition for cultivated land information extraction (ECLE). Pengzhou County and Guanghan County, Sichuan Province, were selected as the experimental areas. The experimental results showed that the overall precision of cultivated land extraction for experimental images 1, 2 and 3 was 91.7%, 88.1% and 88.2%, respectively, with the DTCLE method, and 90.7%, 90.5% and 87.0%, respectively, with ECLE. The accuracy of DTCLE was equivalent to that of ECLE, and DTCLE outperformed ECLE in terms of integrity and continuity.
Web information extraction is viewed as a classification process, and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through competition among fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise. Far fewer annotated samples are needed than with rule-based methods, so the method is highly portable. Experiments show that the method performs well and is superior to the DOM-based method in information extraction.
Key words: information extraction; competing classification; feature extraction; wrapper induction
CLC number: TP 311
Foundation item: Supported by the National Natural Science Foundation of China (60303024)
Biography: LI Xiang-yang (1974-), male, Ph.D. candidate, research direction: information extraction, natural language processing.
Information extraction plays a vital role in natural language processing by extracting named entities and events from unstructured data. Because of the exponential growth of data in the agricultural sector, extracting significant information has become a challenging task. Although existing deep learning-based techniques have been applied in smart agriculture for crop cultivation, crop disease detection, weed removal, and yield production, it is still difficult to find the semantics between extracted items because of the unswerving effects of weather, soil, pest, and fertilizer data. This paper consists of two parts. The first phase proposes a data preprocessing technique for removing ambiguity in the input corpora, and the second phase proposes a novel deep learning-based long short-term memory model with a rectified Adam optimizer and a multilayer perceptron to find agricultural named entities, events, and the relations between them. The proposed algorithm has been trained and tested on four input corpora, i.e., agriculture, weather, soil, and pest & fertilizers. The experimental results have been compared with existing techniques, and it was observed that the proposed algorithm outperforms Weighted-SOM, LSTM+RAO, PLR-DBN, KNN, and Naïve Bayes on standard parameters such as accuracy, sensitivity, and specificity.
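The evaluation parameters named in the abstract above (accuracy, sensitivity, and specificity) have standard definitions in terms of confusion-matrix counts. The Python sketch below computes them for a binary case. The function name `binary_metrics` and the example counts are illustrative assumptions, not values or code from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    Hypothetical helper for illustration only; not the paper's evaluation code.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # share of all predictions that are correct
        "accuracy": (tp + tn) / total,
        # true positive rate: share of actual positives that were found
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # true negative rate: share of actual negatives that were rejected
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts, purely to show the call
    print(binary_metrics(tp=40, fp=5, tn=45, fn=10))
```

For multi-class entity or relation labels, the same counts would typically be computed per class and then averaged; how the paper aggregates them is not stated in the abstract.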
Two phenomena, similar objects with different spectra and different objects with similar spectra, often make it difficult to separate and identify all types of geographical objects using spectral information alone. Therefore, the spatial structure and spatial association properties of object surfaces need to be incorporated into image processing to improve the classification accuracy of remotely sensed imagery. In this article, a new method based on the principle of multiple-point statistics is proposed for combining spectral and spatial information in image classification. The method was validated in a case study on road extraction from a Landsat TM image taken over the Chinese Yellow River delta on August 8, 1999. The classification results show that the new method provides better overall results than traditional methods such as the maximum likelihood classifier (MLC).
The study of induced polarization (IP) information extraction from magnetotelluric (MT) sounding data is of great practical significance to the exploitation of deep mineral, oil and gas resources. The linear inversion method, which has been given priority in previous research on IP information extraction, has three main problems: 1) dependency on the initial model, 2) a tendency to fall into local minima, and 3) serious non-uniqueness of solutions. Taking the nonlinearity and nonconvexity of IP information extraction into consideration, a two-stage CO-PSO minimum structure inversion method using the Compute Unified Device Architecture (CUDA) is proposed. On the one hand, a novel Cauchy oscillation particle swarm optimization (CO-PSO) algorithm is applied to extract nonlinear IP information from MT sounding data and is implemented as a parallel algorithm within the CUDA computing architecture; on the other hand, the impact of the polarizability on the observation data is strengthened by introducing a second-stage inversion process, and a regularization parameter is applied in the fitness function of the PSO algorithm to address the multiplicity of solutions in inversion. Inversion simulation results for polarization layers in different strata of various geoelectric models show that the proposed algorithm obtains smooth models of resistivity and IP parameters, and that its results are relatively stable and accurate. Experiments with added noise indicate that the method is robust to Gaussian white noise. Compared with traditional PSO and genetic algorithm (GA) approaches, the proposed algorithm is more efficient and gives better inversion results.
Synthetic aperture radar (SAR) provides a large amount of image data for the observation and study of oceanic eddies. Using SAR images to automatically depict the shape of eddies and extract eddy information is of great significance to the study of oceanic eddies and the application of SAR eddy images. In this paper, a method of automatic shape depiction and information extraction for oceanic eddies in SAR images is proposed, targeting spiral eddies. First, a skeleton image is obtained by skeletonizing the SAR image. Second, the logarithmic spirals detected in the skeleton image are drawn on the SAR image to depict the shape of the oceanic eddies. Finally, the eddy information is extracted from the shape depiction results. Sentinel-1 SAR eddy images of the Black Sea area were used for the experiment. The experimental results show that the proposed method can automatically depict the shape of eddies and extract the eddy information. The shape depiction results are consistent with the actual shape of the eddies, and the extracted eddy information is consistent with the reference information extracted manually, which verifies the validity of the method.
Because China's coastal zones need rapid and sustainable development, high-resolution information theory based on data mining technology has become an urgent research focus. However, traditional pixel-based image analysis methods cannot meet the needs of this development trend. The paper presents an information extraction approach for high-resolution remote sensing images that uses image segmentation based on an object-oriented algorithm. One aim of the authors' research is to establish a "pixel-primitive-object" identification system. Through extraction and combination of micro-scale coastal zone features, objects such as tidal flats, water lines, sea walls, and mariculture ponds are classified or recognized. First, the authors extract various internal features of relatively homogeneous primitive objects using an image segmentation algorithm based on both spectral and shape information. Second, the features of those primitives are analyzed to determine an optimal object by applying certain feature rules. The results indicate that the model is practical to implement and that the extraction accuracy of coastal information is significantly improved compared with traditional approaches. Therefore, this study provides a potential way to support the monitoring, management, development and utilization of highly dynamic coastal zones.
To explore how more transport information can be extracted from current fluctuations, a theoretical extraction scheme is presented for a single barrier structure based on exclusion models, namely the counter-flows model and the tunnel model. The first four cumulants of these two exclusion models are computed for a single barrier structure, and their characteristics are obtained. A scheme using the first three cumulants is devised to check whether a transport process follows the counter-flows model, the tunnel model, or neither of them. Time series generated by Monte Carlo techniques are used to validate the extraction procedure, and the result is reasonable.
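As a rough illustration of what "the first four cumulants" of a time series are, the Python sketch below estimates them from samples using the textbook relations between cumulants and central moments (κ1 is the mean, κ2 and κ3 equal the second and third central moments, and κ4 = μ4 − 3μ2²). It is a plain sample-moment estimator written under that assumption. The name `first_four_cumulants` is hypothetical, and the code does not reproduce the model-specific cumulant expressions derived in the paper.

```python
from typing import Sequence, Tuple


def first_four_cumulants(samples: Sequence[float]) -> Tuple[float, float, float, float]:
    """Estimate the first four cumulants of a sample via central moments.

    Uses population (1/n) moments; hypothetical helper for illustration.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n

    def central_moment(order: int) -> float:
        # order-th central moment with 1/n normalization
        return sum((x - mean) ** order for x in samples) / n

    m2 = central_moment(2)
    m3 = central_moment(3)
    m4 = central_moment(4)
    # kappa1 = mean, kappa2 = m2, kappa3 = m3, kappa4 = m4 - 3 * m2^2
    return mean, m2, m3, m4 - 3.0 * m2 ** 2


if __name__ == "__main__":
    # Toy series standing in for simulated counting data
    print(first_four_cumulants([0.0, 0.0, 1.0, 1.0]))
```

A check of the kind described in the abstract would compare such estimates, taken from a simulated series, against the values each exclusion model predicts.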
Because of higher demands on product diversity, flexibly switching one piece of equipment between the production of different products has become a popular solution, which results in multiple operation modes within a single process. To handle such multi-mode processes, a novel double-layer structure is proposed, in which the original data are decomposed into common and specific characteristics according to the relationships between variables in each mode. In addition, both low-order and high-order information are considered in each layer. The common and specific information within each mode can be captured and separated into several subspaces according to the order of the information. The performance of the proposed method is validated through a numerical example and the Tennessee Eastman (TE) benchmark. Compared with previous methods, the proposed method achieves better monitoring results.
Key information extraction can reduce dimensionality effects while evaluating the correct preferences of users during semantic data analysis. Currently, classifiers are used to maximize the performance of web-page recommendation in terms of precision and satisfaction. A recent method robustly disambiguates contextual sentiment using conceptual prediction; however, the conceptual prediction method cannot yield the optimal solution. Context-dependent terms are primarily evaluated by constructing a linear space of context features, presuming that if terms occur together in certain consumer-related reviews, they are semantically dependent. Moreover, the more frequently they co-occur, the greater the semantic dependency. However, the influence of co-occurring terms can be part of the frequency of the terms of their semantic dependence, as they are non-integrative and their individual meaning cannot be derived. In this work, we treat the strength of a term and the influence of a term as a combinatorial optimization problem, called Combinatorial Optimized Linear Space Knapsack for Information Retrieval (COLSK-IR). COLSK-IR is formulated as a knapsack problem in which the total weight is the "term influence" and the total value is the "term frequency" for semantic data analysis. The method by which term influence and term frequency are considered together to identify optimal solutions is called combinatorial optimization. Thus, we use the knapsack formulation as an integer programming problem and perform multiple experiments in the linear space through combinatorial optimization to identify possible optimal solutions. Our experimental results show that COLSK-IR provides better results than previous methods in detecting strongly dependent snippets with minimum ambiguity that are related to inter-sentential context during semantic data analysis.
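The knapsack framing in the abstract above (term influence as weight, term frequency as value) can be illustrated with the classic 0/1 knapsack dynamic program. The Python sketch below is a generic textbook solver, not the COLSK-IR algorithm itself. The function name `knapsack_select`, the integer-valued influences, and the capacity parameter are assumptions made for the example.

```python
from typing import List, Tuple


def knapsack_select(
    frequencies: List[int], influences: List[int], capacity: int
) -> Tuple[int, List[int]]:
    """0/1 knapsack: maximize total frequency with total influence <= capacity.

    Hypothetical illustration; assumes non-negative integer influences (weights).
    Returns (best total frequency, indices of the selected terms).
    """
    n = len(frequencies)
    # best[i][w] = best value using the first i terms within weight budget w
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = frequencies[i - 1], influences[i - 1]
        for w in range(capacity + 1):
            best[i][w] = best[i - 1][w]  # option 1: skip term i-1
            if weight <= w:
                # option 2: take term i-1 if that improves the value
                best[i][w] = max(best[i][w], best[i - 1][w - weight] + value)
    # Walk back through the table to recover which terms were taken
    chosen: List[int] = []
    w = capacity
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= influences[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    # Made-up term statistics, purely to show the call
    print(knapsack_select([60, 100, 120], [10, 20, 30], capacity=50))
```

The table-based solver runs in O(n × capacity) time, which is practical only when influences can be scaled to small integers. An integer programming solver would be the more general choice.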
Information extraction on the Web is a current research hotspot. Many information extraction techniques based on different principles have appeared, each with different capabilities. We classify the existing techniques by their extraction principle and analyze how each approach handles semantic information adding, schema definition, rule expression, semantic item location and object location. Based on this survey and analysis, several open problems are discussed.
A two-step information extraction method is presented to capture specific index-related information more accurately. In the first step, the process variables are separated into two sets based on the Pearson correlation coefficient: variables strongly related to the specific index and variables weakly related to it. Performing principal component analysis (PCA) on the two sets changes the directions of the latent variables. In other words, the correlation between the specific index and latent variables in the strongly correlated set may become weaker, while the correlation between the specific index and latent variables in the weakly correlated set may be enhanced. In the second step, each of the two sets is further divided, from the perspective of latent variables and again using the Pearson correlation coefficient, into a subset strongly related to the specific index and a subset weakly related to it. The two strongly related subsets form a new subspace related to the specific index. Then, a hybrid monitoring strategy that combines a specific index predicted by partial least squares (PLS) with a T² statistics-based method is proposed for specific index-related process monitoring using comprehensive information. The predicted specific index reflects real-time information about the index, and the T² statistics are used to monitor specific index-related information. Finally, the proposed method is applied to the Tennessee Eastman (TE) benchmark. The results indicate the effectiveness of the proposed method.
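The first step described above, splitting process variables by their Pearson correlation with the specific index, can be sketched in a few lines of Python. The function names, the 0.5 threshold, and the toy data are assumptions for illustration. The abstract does not state the threshold used, and the later PCA, PLS and T² stages are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def split_by_correlation(
    variables: Dict[str, Sequence[float]],
    index: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split variable names into (strongly related, weakly related) to the index."""
    strong: List[str] = []
    weak: List[str] = []
    for name, values in variables.items():
        if abs(pearson(values, index)) >= threshold:
            strong.append(name)
        else:
            weak.append(name)
    return strong, weak


if __name__ == "__main__":
    # Toy process data, purely to show the call
    quality = [1.0, 2.0, 3.0, 4.0, 5.0]
    data = {
        "a": [2.0, 4.0, 6.0, 8.0, 10.0],
        "b": [1.0, -1.0, 0.0, -1.0, 1.0],
        "c": [5.0, 4.0, 3.0, 2.0, 1.0],
    }
    print(split_by_correlation(data, quality))
```

The absolute value is used so that strongly negative correlations also count as strongly related. Whether the paper treats sign this way is an assumption.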
Traditional pattern representations in information extraction lack the ability to represent domain-specific concepts and are therefore inflexible. To overcome these restrictions, an enhanced pattern representation is designed that includes ontological concepts, neighboring-tree structures and soft constraints. An information-(extraction) inference engine based on hypothesis generation and conflict resolution is implemented. The proposed technique is successfully applied to an information extraction system for the Chinese-language query front-end of a job-recruitment search engine.
Satellite remote sensing data are commonly used to analyze the spatial distribution pattern of geological structures and serve as a significant means of identifying alteration zones. Based on Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques are presented for alteration information extraction: data preparation, vegetation indices and band ratios, and expert classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by Wuhan Zondy Cyber Technology Co., Ltd, China. When the techniques were applied to extract alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found through satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful and applicable to a wide range of gold-mineralized alteration information extraction.
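The "vegetation indices and band ratios" step mentioned above can be illustrated with two per-pixel formulas. The Python sketch below assumes the common normalized difference vegetation index, (NIR − Red) / (NIR + Red), which for ETM+ uses band 4 (near-infrared) and band 3 (red), plus a generic ratio of two bands. The function names are hypothetical, and the abstract does not say which index or which band pairs the authors actually used.

```python
from typing import List, Sequence


def normalized_difference(nir: Sequence[float], red: Sequence[float]) -> List[float]:
    """Per-pixel NDVI-style index: (nir - red) / (nir + red).

    Hypothetical helper; pixels whose denominator is zero are set to 0.0.
    """
    result: List[float] = []
    for n, r in zip(nir, red):
        denom = n + r
        result.append((n - r) / denom if denom != 0 else 0.0)
    return result


def band_ratio(numerator: Sequence[float], denominator: Sequence[float]) -> List[float]:
    """Per-pixel ratio of two bands; zero-denominator pixels are set to 0.0."""
    return [a / b if b != 0 else 0.0 for a, b in zip(numerator, denominator)]


if __name__ == "__main__":
    # Made-up reflectance values for three pixels, purely to show the calls
    band4_nir = [0.50, 0.30, 0.00]
    band3_red = [0.10, 0.30, 0.00]
    print(normalized_difference(band4_nir, band3_red))
    print(band_ratio(band4_nir, [0.25, 0.60, 0.00]))
```

In a workflow like the one described, a vegetation index is often used to mask densely vegetated pixels before band ratios are interpreted for alteration minerals. That ordering is an assumption here, not a detail from the abstract.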
Funding: the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1G1A1003312); the Ministry of Education (NRF-2021R1I1A3052815).
文摘Analyzing Research and Development(R&D)trends is important because it can influence future decisions regarding R&D direction.In typical trend analysis,topic or technology taxonomies are employed to compute the popularities of the topics or codes over time.Although it is simple and effective,the taxonomies are difficult to manage because new technologies are introduced rapidly.Therefore,recent studies exploit deep learning to extract pre-defined targets such as problems and solutions.Based on the recent advances in question answering(QA)using deep learning,we adopt a multi-turn QA model to extract problems and solutions from Korean R&D reports.With the previous research,we use the reports directly and analyze the difficulties in handling them using QA style on Information Extraction(IE)for sentence-level benchmark dataset.After investigating the characteristics of Korean R&D,we propose a model to deal with multiple and repeated appearances of targets in the reports.Accordingly,we propose a model that includes an algorithm with two novel modules and a prompt.A newly proposed methodology focuses on reformulating a question without a static template or pre-defined knowledge.We show the effectiveness of the proposed model using a Korean R&D report dataset that we constructed and presented an in-depth analysis of the benefits of the multi-turn QA model.
Abstract: Because of the developed economy and lush vegetation in southern China, the following difficulties exist in remote sensing land surface classification: 1) diverse surface composition types; 2) undulating terrain; 3) small, fragmented land parcels; 4) indistinguishable shadows of surface objects. Our top priority is to clarify how to use the concept of big data (data mining technology) and various new technologies and methods to move complex-surface remote sensing information extraction toward automation, refinement and intelligence. To achieve these research objectives, the paper takes Gaofen-2 satellite data produced in China as the data source and complex-surface remote sensing information extraction technology as the research object, and intelligently analyzes the remote sensing information of complex surfaces after completing data collection and preprocessing. The specific extraction methods are as follows: 1) extraction of fractal texture features based on Brownian motion; 2) extraction of color features; 3) extraction of vegetation indices; 4) construction of feature vectors and the corresponding classification. In this paper, fractal texture features, color features, vegetation features and spectral features of remote sensing images are combined to form a combined feature vector. This raises the dimension of the features and increases the separability of remote sensing features, which makes them easier to classify and thus improves the classification accuracy of remote sensing images. The method is suitable for remote sensing information extraction of complex surfaces in southern China and can be extended to other complex surface areas in the future.
Abstract: As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and raises critical questions about ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as complementary mechanisms, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. In addition, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Abstract: Purpose: This study serves as a comprehensive review of the existing annotated corpora. It aims to provide information on the existing annotated corpora for event extraction, which are limited but essential for training and improving the existing event extraction algorithms. In addition to this primary goal, it provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora are mainly focused on the biomedical and general domains. Despite this scarcity, several high-quality corpora are available and widely used as benchmark datasets. However, access to some of these corpora might be limited owing to closed-access policies, or they may have become inaccessible through broken links after maintenance was discontinued following their initial release. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. Nevertheless, this study places a strong emphasis on the English corpora because English, as a global lingua franca, is more widely understood than other languages. Practical implications: We believe that this study provides valuable knowledge that can serve as a guiding framework for preparing and accurately annotating events from text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for event extraction tasks across various domains. Originality/value: This study comprehensively compiles information on the existing annotated corpora for event extraction tasks and provides preparation guidelines.
Funding: Supported by the National Natural Science Foundation of China (Nos. U1804263, U1736214, 62172435) and the Zhongyuan Science and Technology Innovation Leading Talent Project (No. 214200510019).
Abstract: Joint entity relation extraction models that integrate the semantic information of relations are favored by researchers because they handle overlapping entities effectively. Among them, methods that manually define relation semantic templates achieve particularly strong extraction results because they capture the deep semantic information of relations. However, this approach relies on expert experience and has poor portability. Inspired by rule-based entity relation extraction, this paper proposes a joint entity relation extraction model based on automatically constructed relation semantic templates, abbreviated as RSTAC. The model refines the extraction rules of relation semantic templates from a relation corpus through dependency parsing and thereby constructs relation semantic templates automatically. The relation semantic templates then constrain the processes of relation classification and triplet extraction, from which the entity relation triplets are finally obtained. Experimental results on three major Chinese datasets, DuIE, SanWen, and FinRE, show that the RSTAC model successfully obtains rich deep semantics of relations and improves the extraction of entity relation triplets, with F1 scores increased by an average of 0.96% compared with classical joint extraction models such as CasRel, TPLinker, and RFBFN.
Funding: This research was supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. The abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but also draws these abstracts from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task in the Chinese scientific domain and support progress in future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Funding: Supported by the Fundamental Research Funds for the Central Universities of China (Grant No. 2013SCU11006), the Key Laboratory of Digital Mapping and Land Information Application of the National Administration of Surveying, Mapping and Geoinformation of China (Grant No. DM2014SC02), and the Key Laboratory of Geospatial Information Technology, Ministry of Land and Resources of China (Grant No. KLGSIT201504).
Abstract: The development of precision agriculture demands high accuracy and efficiency in cultivated land information extraction. As a new means of ground monitoring in recent years, unmanned aerial vehicle (UAV) low-altitude remote sensing, which is flexible, efficient, low in cost and high in resolution, is widely applied to surveying various resources. On this basis, a novel extraction method for cultivated land information based on a Deep Convolutional Neural Network and Transfer Learning (DTCLE) was proposed. First, linear features (roads, ridges, etc.) were excluded using a Deep Convolutional Neural Network (DCNN). Next, the feature extraction method learned by the DCNN was applied to cultivated land information extraction by introducing a transfer learning mechanism. Last, cultivated land information was extracted with both DTCLE and eCognition for cultivated land information extraction (ECLE). Pengzhou County and Guanghan County, Sichuan Province, were selected as the experimental areas. The experimental results showed that the overall precision of cultivated land extraction for experimental images 1, 2 and 3 was 91.7%, 88.1% and 88.2%, respectively, with the DTCLE method, and 90.7%, 90.5% and 87.0%, respectively, with ECLE. The accuracy of DTCLE was equivalent to that of ECLE, and DTCLE outperformed ECLE in terms of integrity and continuity.
Abstract: Web information extraction is viewed as a classification process, and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through competition among fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise. Far fewer annotated samples are needed than with rule-based methods, so the method is highly portable. Experiments show that the method performs well and is superior to the DOM-based method in information extraction. Key words: information extraction; competing classification; feature extraction; wrapper induction. CLC number: TP 311. Foundation item: Supported by the National Natural Science Foundation of China (60303024). Biography: LI Xiang-yang (1974-), male, Ph.D. candidate; research direction: information extraction, natural language processing.
Funding: This work was supported by the Deanship of Scientific Research at King Khalid University through a General Research Project under Grant Number GRP/41/42.
Abstract: Information extraction plays a vital role in natural language processing by extracting named entities and events from unstructured data. Due to the exponential growth of data in the agricultural sector, extracting significant information has become a challenging task. Although existing deep learning-based techniques have been applied in smart agriculture for crop cultivation, crop disease detection, weed removal, and yield production, it is still difficult to find the semantics between extracted information due to the unswerving effects of weather, soil, pest, and fertilizer data. This paper consists of two parts. The first phase proposes a data preprocessing technique for removing ambiguity in the input corpora, and the second phase proposes a novel deep learning-based long short-term memory model with rectification in the Adam optimizer and a multilayer perceptron to identify agriculture-related named entities, events, and the relations between them. The proposed algorithm has been trained and tested on four input corpora, i.e., agriculture, weather, soil, and pest & fertilizers. The experimental results have been compared with existing techniques, and it was observed that the proposed algorithm outperforms Weighted-SOM, LSTM+RAO, PLR-DBN, KNN, and Naïve Bayes on standard measures such as accuracy, sensitivity, and specificity.
Funding: Supported by the National Natural Science Foundation of China (No. 40671136) and the National High Technology Research and Development Program of China (Nos. 2006AA06Z115, 2006AA120106).
Abstract: Two phenomena, similar objects with different spectra and different objects with similar spectra, often make it difficult to separate and identify all types of geographical objects using spectral information alone. Therefore, there is a need to incorporate the spatial structural and spatial association properties of object surfaces into image processing to improve the classification accuracy of remotely sensed imagery. In the current article, a new method is proposed on the basis of the principle of multiple-point statistics for combining spectral information and spatial information for image classification. The method was validated by applying it to a case study on road extraction based on a Landsat TM image taken over the Chinese Yellow River delta on August 8, 1999. The classification results show that this new method provides better overall results than traditional methods such as the maximum likelihood classifier (MLC).
Funding: Projects (41604117, 41204054) supported by the National Natural Science Foundation of China; Projects (20110490149, 2015M580700) supported by the Research Fund for the Doctoral Program of Higher Education, China; Project (2015zzts064) supported by the Fundamental Research Funds for the Central Universities, China; Project (16B147) supported by the Scientific Research Fund of Hunan Provincial Education Department, China.
Abstract: The study of induced polarization (IP) information extraction from magnetotelluric (MT) sounding data is of great practical significance to the exploitation of deep mineral, oil and gas resources. The linear inversion method, which has been given priority in previous research on IP information extraction, has three main problems: 1) dependency on the initial model, 2) a tendency to fall into local minima, and 3) serious non-uniqueness of solutions. Taking the nonlinearity and nonconvexity of IP information extraction into consideration, a two-stage CO-PSO minimum structure inversion method using the compute unified device architecture (CUDA) is proposed. On the one hand, a novel Cauchy oscillation particle swarm optimization (CO-PSO) algorithm is applied to extract nonlinear IP information from MT sounding data, and it is implemented as a parallel algorithm within the CUDA computing architecture; on the other hand, the impact of the polarizability on the observation data is strengthened by introducing a second-stage inversion process, and a regularization parameter is applied in the fitness function of the PSO algorithm to address the multi-solution problem in inversion. Inversion simulation results for polarization layers in different strata of various geoelectric models show that the proposed algorithm obtains smooth models of resistivity and IP parameters, and that its results are relatively stable and accurate. Experiments with added noise indicate that the method is robust to Gaussian white noise. Compared with the traditional PSO and GA algorithms, the proposed algorithm is more efficient and produces better inversion results.
Abstract: Synthetic aperture radar (SAR) provides a large amount of image data for the observation and study of oceanic eddies. Using SAR images to automatically depict the shape of eddies and extract eddy information is of great significance to the study of oceanic eddies and the application of SAR eddy images. In this paper, a method of automatic shape depiction and information extraction for oceanic eddies in SAR images is proposed, targeting spiral eddies. Firstly, the skeleton image is obtained by skeletonizing the SAR image. Secondly, the logarithmic spirals detected in the skeleton image are drawn on the SAR image to depict the shape of the oceanic eddies. Finally, the eddy information is extracted based on the results of shape depiction. Sentinel-1 SAR eddy images of the Black Sea area were used for the experiments in this paper. The experimental results show that the proposed method can automatically depict the shape of eddies and extract the eddy information. The shape depiction results are consistent with the actual shape of the eddies, and the extracted eddy information is consistent with the reference information extracted manually. As a result, the validity of the method is verified.
Funding: The "973" Project of China under contract No. 2006CB701305; the "863" Project of China under contract No. 2009AA12Z148; the National Natural Science Foundation of China under contract No. 40971224.
Abstract: Due to the need for rapid and sustainable development in China's coastal zones, high-resolution information theory using data mining technology has become an urgent research focus. However, traditional pixel-based image analysis methods cannot meet the needs of this development trend. The paper presents an information extraction approach for high-resolution remote sensing images in terms of image segmentation based on an object-oriented algorithm. One aim of the authors' research is to establish a "pixel-primitive-object" identification system. Through extraction and combination of micro-scale coastal zone features, objects such as tidal flats, water lines, sea walls, and mariculture ponds are classified or recognized. Firstly, the authors extract various internal features of relatively homogeneous primitive objects using an image segmentation algorithm based on both spectral and shape information. Secondly, the features of those primitives are analyzed to ascertain an optimal object by adopting certain feature rules. The results indicate that the model is practical to implement and that the extraction accuracy of coastal information is significantly improved compared with traditional approaches. Therefore, this study provides a potential way to serve the monitoring, management, development and utilization of highly dynamic coastal zones.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60676053) and the Applied Material in Xi'an Innovation Funds, China (Grant No. XA-AM-200603).
Abstract: To explore how to extract more transport information from current fluctuations, a theoretical extraction scheme is presented for a single barrier structure based on exclusion models, which include the counter-flows model and the tunnel model. The first four cumulants of these two exclusion models are computed for a single barrier structure, and their characteristics are obtained. A scheme using the first three cumulants is devised to check whether a transport process follows the counter-flows model, the tunnel model, or neither of them. Time series generated by Monte Carlo techniques are used to validate the extraction procedure, and the result is reasonable.
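As a rough illustration of the quantities this abstract refers to, the first four cumulants of a sampled series can be estimated from its central moments. The sketch below is a generic estimator under simple assumptions (biased sample moments, a short made-up series); it is not the authors' extraction scheme, and the function name `first_four_cumulants` and the sample data are hypothetical.

```python
def first_four_cumulants(samples):
    """Estimate the first four cumulants from biased sample central moments."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n

    def central_moment(order):
        # Average of (x - mean)^order over all samples.
        return sum((x - mean) ** order for x in samples) / n

    mu2 = central_moment(2)
    mu3 = central_moment(3)
    mu4 = central_moment(4)
    # k1 = mean, k2 = variance, k3 = mu3, k4 = mu4 - 3 * mu2^2
    return mean, mu2, mu3, mu4 - 3.0 * mu2 ** 2


# Hypothetical counts of transferred charges per time window.
counts = [0, 1, 1, 2, 1, 0, 1, 2, 1, 1]
k1, k2, k3, k4 = first_four_cumulants(counts)
```

In a Monte Carlo check of the kind the abstract mentions, the same estimator would be applied to a simulated time series and the resulting cumulants compared with each model's predictions.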
Funding: National Natural Science Foundation of China (61903352); China Postdoctoral Science Foundation (2020M671721); Zhejiang Province Natural Science Foundation of China (LQ19F030007); Natural Science Foundation of Jiangsu Province (BK20180594); Project of Department of Education of Zhejiang Province (Y202044960); Project of Zhejiang Tongji Vocational College of Science and Technology (TRC1904); Foundation of Key Laboratory of Advanced Process Control for Light Industry (Jiangnan University), Ministry of Education, P.R. China (APCLI1803).
Abstract: Due to higher demands on product diversity, flexibly switching production between different products on a single piece of equipment has become a popular solution, resulting in multiple operation modes within a single process. To handle such multi-mode processes, a novel double-layer structure is proposed, and the original data are decomposed into common and specific characteristics according to the relationship between variables in each mode. In addition, both low-order and high-order information are considered in each layer. The common and specific information within each mode can be captured and separated into several subspaces according to the order of the information. The performance of the proposed method is validated through a numerical example and the Tennessee Eastman (TE) benchmark. Compared with previous methods, the proposed method achieves better monitoring results.
Abstract: Key information extraction can reduce dimensional effects while evaluating the correct preferences of users during semantic data analysis. Currently, classifiers are used to maximize the performance of web-page recommendation in terms of precision and satisfaction. A recent method disambiguates contextual sentiment robustly using conceptual prediction; however, the conceptual prediction method is not able to yield the optimal solution. Context-dependent terms are primarily evaluated by constructing a linear space of context features, presuming that if terms occur together in certain consumer-related reviews, they are semantically reliant. Moreover, the more frequently they coexist, the greater the semantic dependency. However, the influence of terms that coexist with each other can be part of the frequency of the terms of their semantic dependence, as they are non-integrative and their individual meaning cannot be derived. In this work, we treat the strength of a term and the influence of a term as a combinatorial optimization problem, called Combinatorial Optimized Linear Space Knapsack for Information Retrieval (COLSK-IR). COLSK-IR is formulated as a knapsack problem in which the total weight is the "term influence" and the total value is the "term frequency" for semantic data analysis. The method by which term influence and term frequency are considered together to identify the optimal solutions is called combinatorial optimization. Thus, we use the knapsack formulation to pose an integer programming problem and perform multiple experiments using the linear space through combinatorial optimization to identify the possible optimum solutions. Our experimental results show that COLSK-IR provides better results than previous methods in detecting strongly dependent snippets with minimum ambiguity that are related to inter-sentential context during semantic data analysis.
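The knapsack framing in this abstract can be made concrete with a textbook 0/1 knapsack solved by dynamic programming, taking each term's influence as its weight and its frequency as its value. This is only a sketch of the general technique, not the COLSK-IR algorithm itself; the function name `knapsack`, the integer weights, the capacity, and the example terms are all assumptions made for illustration.

```python
def knapsack(items, capacity):
    """Solve 0/1 knapsack for items given as (name, integer weight, value)."""
    n = len(items)
    # table[i][c] = best total value using the first i items within weight c
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, weight, value) in enumerate(items, start=1):
        for c in range(capacity + 1):
            table[i][c] = table[i - 1][c]
            if weight <= c:
                table[i][c] = max(table[i][c], table[i - 1][c - weight] + value)
    # Walk back through the table to recover which items were chosen.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if table[i][c] != table[i - 1][c]:
            name, weight, _ = items[i - 1]
            chosen.append(name)
            c -= weight
    return table[n][capacity], chosen[::-1]


# Hypothetical terms: (term, influence used as weight, frequency used as value).
terms = [("battery", 3, 9), ("screen", 4, 10), ("price", 2, 5), ("delivery", 5, 11)]
best_value, selected = knapsack(terms, capacity=7)
```

With these made-up numbers the selection keeps the terms whose combined frequency is highest while their combined influence stays within the capacity.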
Abstract: Information extraction techniques for the Web are a current research hotspot. Many information extraction techniques based on different principles have appeared, each with different capabilities. We classify the existing information extraction techniques by their underlying principle and analyze the methods and principles of semantic information adding, schema defining, rule expression, semantic item locating and object locating in these approaches. Based on this survey and analysis, several open problems are discussed.
Funding: Projects (61374140, 61673173) supported by the National Natural Science Foundation of China; Projects (222201717006, 222201714031) supported by the Fundamental Research Funds for the Central Universities, China.
Abstract: A two-step information extraction method is presented to capture specific index-related information more accurately. In the first step, the overall process variables are separated into two sets based on the Pearson correlation coefficient: one contains process variables strongly related to the specific index, and the other contains process variables weakly related to it. When principal component analysis (PCA) is performed on the two sets, the directions of the latent variables change. In other words, the correlation between the specific index and the latent variables in the strongly correlated set may become weaker, while the correlation between the specific index and the latent variables in the weakly correlated set may be enhanced. In the second step, each of the two sets is further divided, from the perspective of latent variables and again using the Pearson correlation coefficient, into a subset strongly related to the specific index and a subset weakly related to it. The two subsets strongly related to the specific index form a new subspace related to the specific index. Then, a hybrid monitoring strategy, based on a specific index predicted using partial least squares (PLS) and a T² statistics-based method, is proposed for specific index-related process monitoring using comprehensive information. The predicted specific index reflects real-time information about the specific index, and the T² statistics are used to monitor specific index-related information. Finally, the proposed method is applied to the Tennessee Eastman (TE) process. The results indicate the effectiveness of the proposed method.
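The first step described in this abstract, splitting process variables by their Pearson correlation with the specific index, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold of 0.5, the function names `pearson` and `split_by_correlation`, and the sample data are hypothetical, and the PCA and second-step split are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither sequence is constant (otherwise the denominator is zero).
    return cov / sqrt(var_x * var_y)


def split_by_correlation(variables, index, threshold=0.5):
    """Split variable names into (strong, weak) by |correlation| with the index."""
    strong, weak = [], []
    for name, values in variables.items():
        target = strong if abs(pearson(values, index)) >= threshold else weak
        target.append(name)
    return strong, weak


# Hypothetical process data: three variables and one quality index.
index = [1.0, 2.0, 3.0, 4.0, 5.0]
variables = {
    "temperature": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks the index closely
    "pressure": [5.0, 4.0, 3.0, 2.0, 1.0],     # perfectly anti-correlated
    "flow": [1.0, -1.0, 0.0, -1.0, 1.0],       # uncorrelated with the index
}
strong, weak = split_by_correlation(variables, index)
```

The absolute value is used so that strongly negative correlations also count as strongly related; whether the original method treats the sign this way is an assumption.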
Abstract: Traditional pattern representation in information extraction lacks the ability to represent domain-specific concepts and is therefore inflexible. To overcome these restrictions, an enhanced pattern representation is designed which includes ontological concepts, neighboring-tree structures and soft constraints. An information extraction inference engine based on hypothesis generation and conflict resolution is implemented. The proposed technique is successfully applied to an information extraction system for the Chinese-language query front-end of a job-recruitment search engine.
Funding: The paper is supported by the Research Foundation for Outstanding Young Teachers, China University of Geosciences (Wuhan) (Nos. CUGQNL0628, CUGQNL0640), the National High-Tech Research and Development Program (863 Program) (No. 2001AA135170), and the Postdoctoral Foundation of the Shandong Zhaojin Group Co. (No. 20050262120).
Abstract: Satellite remote sensing data are usually used to analyze the spatial distribution pattern of geological structures and generally serve as a significant means for the identification of alteration zones. Based on Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques were presented for alteration information extraction: data preparation, vegetation indices and band ratios, and expert classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by the Wuhan Zondy Cyber Technology Co., Ltd, China. In the study area application of extracting alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found after satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful approaches and are applicable to a wide range of gold-mineralized alteration information extraction.
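The "vegetation indices and band ratios" step can be illustrated with the normalized difference vegetation index (NDVI) and a simple two-band ratio computed per pixel. The abstract does not say which indices or ratios were used, so the choice of NDVI (ETM+ band 4 as near-infrared, band 3 as red) and of a band 5 to band 7 ratio is an assumption, as are the function names and the reflectance values below.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def ratio(numerator, denominator):
    """Band ratio for one pixel, returning 0.0 where the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical reflectances per pixel: (band 3 red, band 4 NIR, band 5, band 7).
pixels = [
    (0.10, 0.50, 0.30, 0.20),  # vegetated pixel: high NDVI
    (0.20, 0.22, 0.40, 0.16),  # sparsely vegetated pixel: low NDVI
]
results = [(ndvi(nir, red), ratio(b5, b7)) for red, nir, b5, b7 in pixels]
```

In a workflow like the one described, a vegetation index of this kind would typically be used to mask vegetated pixels before band ratios are interpreted; that ordering is a common practice rather than something stated in the abstract.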