This article proposes a new general, highly efficient algorithm for extracting domain terminologies. This domain-independent algorithm, with multiple layers of filters, is a hybrid of statistics-oriented and rule-oriented methods. Utilizing the features of domain terminologies and the characteristics that are unique to Chinese, the algorithm extracts domain terminologies by first generating multi-word unit (MWU) candidates and then filtering the candidates through multiple strategies. Our test results show that this algorithm is feasible and effective.
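The generate-then-filter pipeline described above can be sketched in a few lines. The function below is a minimal illustration, not the authors' implementation: it uses an n-gram frequency threshold as the statistical filter and boundary stopwords as the rule filter, and all names and thresholds are assumptions.

```python
from collections import Counter

def extract_mwu_candidates(tokens, max_n=3, min_freq=2, stopwords=frozenset()):
    """Generate multi-word unit (MWU) candidates as frequent n-grams,
    then filter them with simple statistical and rule-based strategies."""
    counts = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    candidates = []
    for gram, freq in counts.items():
        if freq < min_freq:          # statistical filter: frequency threshold
            continue
        if gram[0] in stopwords or gram[-1] in stopwords:
            continue                 # rule filter: no stopword at a boundary
        candidates.append((" ".join(gram), freq))
    return sorted(candidates, key=lambda x: -x[1])

tokens = "domain terminology extraction uses domain terminology features".split()
print(extract_mwu_candidates(tokens))  # → [('domain terminology', 2)]
```

A real terminology extractor would stack further filters (e.g. association measures, part-of-speech patterns) in the same cascade.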
Relative radiometric normalization (RRN) minimizes radiometric differences among images caused by inconsistencies in acquisition conditions rather than by changes in the surface. The scale-invariant feature transform (SIFT) can automatically extract control points (CPs) and is commonly used for remote sensing images. However, its results are often inaccurate and sometimes contain incorrect matches, because it generates a small number of false CP pairs with a high false-alarm rate. This paper presents a modified method that improves the performance of SIFT CP matching by applying the sum of absolute differences (SAD) in a new manner to the new generation of optical satellites, the near-equatorial orbit satellites, and to multi-sensor images. The proposed method yields a significantly higher rate of correct matches. The data in this study were obtained from the RazakSAT satellite, a new near-equatorial satellite system. The proposed method involves six steps: 1) data reduction; 2) applying SIFT to automatically extract CPs; 3) refining CP matching using the SAD algorithm with an empirical threshold; 4) calculating the true CPs' intensity values over all image bands; 5) fitting a linear regression model between the intensity values of CPs located in the reference and sensed image bands; and 6) conducting relative radiometric normalization using the regression transformation functions. Different thresholds (50 and 70) were experimentally tested in this study. Following the proposed method, the false extracted SIFT CP pairs were reduced from 775, 1125, 883, 804, 883, and 681 false pairs to 342, 424, 547, 706, 547, and 469 correctly matched pairs, respectively.
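Step 3, the SAD-based refinement of SIFT matches, can be illustrated as follows. This is a minimal sketch under assumed conventions (a square window, integer pixel coordinates, a threshold in gray-level units), not the paper's exact implementation.

```python
import numpy as np

def refine_matches_sad(ref_img, sensed_img, pairs, win=3, threshold=50):
    """Keep only CP pairs whose window-wise sum of absolute differences
    (SAD) falls below an empirical threshold."""
    h = win // 2
    kept = []
    for (r0, c0), (r1, c1) in pairs:
        a = ref_img[r0 - h:r0 + h + 1, c0 - h:c0 + h + 1].astype(np.int32)
        b = sensed_img[r1 - h:r1 + h + 1, c1 - h:c1 + h + 1].astype(np.int32)
        # skip windows clipped by the image border
        if a.shape == (win, win) and b.shape == (win, win) \
                and np.abs(a - b).sum() <= threshold:
            kept.append(((r0, c0), (r1, c1)))
    return kept

ref = np.arange(100, dtype=np.uint8).reshape(10, 10)
print(refine_matches_sad(ref, ref.copy(), [((5, 5), (5, 5)), ((5, 5), (1, 1))]))
# → [((5, 5), (5, 5))]  (the shifted pair's SAD of 396 exceeds the threshold)
```

The surviving pairs would then feed the per-band linear regression of steps 4-6.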
Taking TM images, SPOT photos, and DEM data as the basic information, this paper not only put forward a manually controlled, computer-automatic extraction method, but also completed the task of extracting the main types of ground objects in the Three Gorges Reservoir area with relatively high accuracy, after finishing such preprocessing tasks as correcting the topographic spectrum and synthesizing the data. Using the specialized image analysis software eCognition as the platform, the research achieved classification by choosing samples, picking out the best wave bands, and producing the identifying functions. At the same time, the extraction process partly dispelled the influence of phenomena such as the same object with different spectra, different objects with the same spectrum, and border transitions. The research explored the technological route and method of using automatic extraction from remote sensing images to obtain land cover information for regions whose ground objects have complicated spectra.
In the last two decades, significant research has been conducted in the field of automated extraction of rock mass discontinuity characteristics from three-dimensional (3D) models. This provides several methodologies for acquiring discontinuity measurements from 3D models, such as point clouds generated using laser scanning or photogrammetry. However, even with numerous automated and semi-automated methods presented in the literature, there is no single method that can automatically characterize discontinuities accurately in a minimum of time. In this paper, we critically review all the existing methods proposed in the literature for the extraction of discontinuity characteristics such as joint sets and orientations, persistence, joint spacing, roughness, and block size using point clouds, digital elevation maps, or meshes. As a result of this review, we identify the strengths and drawbacks of each method used for extracting those characteristics. We found that the approaches based on voxels and region growing are superior in extracting joint planes from 3D point clouds. Normal tensor voting with a trace growth algorithm is a robust method for measuring joint trace length from 3D meshes. Spacing is estimated by calculating the perpendicular distance between joint planes. Several independent roughness indices have been presented to quantify roughness from 3D surface models, but there is a need to incorporate these indices into automated methodologies. There is a lack of efficient algorithms for direct computation of block size from 3D rock mass surface models.
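The spacing estimate mentioned above reduces to a point-to-plane distance. A minimal sketch, assuming the two joint planes are parallel and share a unit normal:

```python
import numpy as np

def plane_spacing(n, p1, p2):
    """Perpendicular spacing between two parallel joint planes, each given
    by the shared normal n and one point lying on the plane."""
    n = np.asarray(n, float)
    n = n / np.linalg.norm(n)
    return abs(np.dot(n, np.asarray(p2, float) - np.asarray(p1, float)))

# Two horizontal joint planes 2 m apart (normal along z):
print(plane_spacing([0, 0, 1], [0, 0, 1.0], [5, 3, 3.0]))  # → 2.0
```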
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and pretrained language models to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizers, and we initialize our model with the pretrained language model BERT, released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods, including Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory networks (BiLSTM), and BiLSTM-CRF, as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, a 9.64% absolute improvement. Research limitations: We only consider the automatic keyphrase extraction task rather than keyphrase generation, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB-format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
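The character-level IOB formulation described above can be illustrated with a toy labeler. This sketch is not the authors' preprocessing code: it greedily marks non-overlapping keyphrase occurrences with B-KP/I-KP tags, one label per Chinese character, with O elsewhere.

```python
def char_iob_tags(text, keyphrases):
    """Character-level IOB labels: B-KP on the first character of a
    keyphrase occurrence, I-KP on the rest, O elsewhere (greedy,
    longest-first, non-overlapping)."""
    tags = ["O"] * len(text)
    for kp in sorted(keyphrases, key=len, reverse=True):
        start = 0
        while True:
            i = text.find(kp, start)
            if i < 0:
                break
            if all(t == "O" for t in tags[i:i + len(kp)]):
                tags[i] = "B-KP"
                for j in range(i + 1, i + len(kp)):
                    tags[j] = "I-KP"
            start = i + 1
    return list(zip(text, tags))

print(char_iob_tags("糖尿病患者", ["糖尿病"]))
# → [('糖', 'B-KP'), ('尿', 'I-KP'), ('病', 'I-KP'), ('患', 'O'), ('者', 'O')]
```

A character-level model such as BERT then predicts one of these three labels per character, with no word segmentation step.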
Craters are salient terrain features on planetary surfaces and provide useful information about the relative dating of the geological units of planets. In addition, they are ideal landmarks for spacecraft navigation. Due to low contrast and uneven illumination, automatic extraction of craters remains a challenging task. This paper presents a saliency detection method for crater edges and a feature matching algorithm based on edge information. The craters are extracted through saliency edge detection, edge extraction and selection, feature matching of the same crater edges, and robust ellipse fitting. In the edge matching algorithm, a crater feature model is proposed by analyzing the relationship between highlight-region edges and shadow-region edges. Then, crater edges are paired through the effective matching algorithm. Experiments on real planetary images show that the proposed approach is robust to different lighting and topographies, and the detection rate is greater than 90%.
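The final ellipse-fitting step can be sketched with a plain algebraic least-squares conic fit; the paper uses a robust variant, so the non-robust version below is only illustrative.

```python
import numpy as np

def fit_ellipse_conic(x, y):
    """Least-squares conic fit a x^2 + b xy + c y^2 + d x + e y + f = 0,
    taken as the right singular vector of the design matrix with the
    smallest singular value (a plain, non-robust fit)."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    return np.linalg.svd(D)[2][-1]  # unit-norm coefficient vector

# Edge points on a circle of radius 2 (a special ellipse):
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
x, y = 2 * np.cos(t), 2 * np.sin(t)
coef = fit_ellipse_conic(x, y)
```

Every fitted point satisfies the recovered conic to numerical precision; a robust fitter would additionally down-weight outlier edge points before solving.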
Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The information automatically extracted can potentially be updated with a frequency higher than once per year, and be safe from manipulations or misinterpretations. Moreover, this approach allows flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions to allow new insights for "profiling" the analyzed universities. Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information to compute our indicators has been extracted from the universities' websites by using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semistructured form to allow information to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the Web has been combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clusterization of 79 Italian universities based on structural and digital indicators. Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features using clustering techniques working with the above indicators. Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad. Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems. Originality/value: This work applies for the first time to university websites some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
The ability of the scale-invariant feature transform (SIFT) to automatically extract control points (CPs) is well known for remote sensing images; however, its results are often inaccurate and sometimes contain incorrect matches, because it generates a small number of false CP pairs whose matching has a high false-alarm rate. This paper presents a method containing a modification to improve the performance of SIFT CP matching by applying the sum of absolute differences (SAD) in a new manner to the new generation of optical satellites, the near-equatorial orbit satellite (NEqO), and to multi-sensor images. The proposed method improves CP matching with a significantly higher rate of correct matches. The data in this study were obtained from the RazakSAT satellite, covering the Kuala Lumpur-Pekan area. The proposed method consists of three parts: (1) applying SIFT to extract CPs automatically, (2) refining CP matching using the SAD algorithm with an empirical threshold, and (3) evaluating the refined CPs by comparing the result of the original SIFT with that of the proposed method. The results indicate accurate and precise performance, demonstrating the effectiveness and robustness of the proposed approach.
Several techniques have been developed for determining linear features, and lineament extraction from satellite data has been one of the most widely used applications in geology. In the present study, lineaments were extracted from a digital satellite scene (Landsat 5 TM data) of the Zahret Median region, situated in the northwest of Tunisia. The image was enhanced and used for automatic extraction. Several directions of features were mapped; the major lineaments are NE-SW and NW-SE oriented. The results obtained were validated by comparison with geophysical results as well as with previous mapping studies of the study area.
Digitizing road maps manually is an expensive and time-consuming task, and several methods that aim at fully or semi-automated systems have been proposed. In this work we introduce a method, based on the Radon transform and optimal algorithms, which automatically extracts roads from images of rural areas acquired by digital cameras and airborne laser scanners. The proposed method detects linear segments iteratively and, starting from these, generates the centerlines of the roads. The method is based on an objective function that depends on three parameters related to the correlation between the cross-sections, the spectral similarity, and the directions of the segments. Different tests were performed using aerial photos, Ikonos images, and laser scanner data of an area located in the state of Parana (Brazil), and their results are presented and discussed. The quality of the detected road centerlines was computed using several indexes: completeness, correctness, and RMS. The values obtained reveal the good performance of the proposed methodology.
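The idea of using projections to find linear segments can be illustrated with a toy Radon-style search. In this sketch, scipy's `ndimage.rotate` stands in for a proper Radon transform, and the angle set and synthetic "road" image are illustrative assumptions, not the paper's data or code.

```python
import numpy as np
from scipy import ndimage

def dominant_line_angle(img, angles):
    """Toy Radon-style search: rotate the image and take column sums;
    the angle whose projection has the sharpest peak reveals the
    dominant linear feature (a road-centerline candidate)."""
    best_angle, best_peak = None, -np.inf
    for a in angles:
        rot = ndimage.rotate(img, a, reshape=False, order=1)
        peak = rot.sum(axis=0).max()
        if peak > best_peak:
            best_angle, best_peak = a, peak
    return best_angle

img = np.zeros((21, 21))
img[:, 10] = 1.0  # a vertical "road" one pixel wide
print(dominant_line_angle(img, [0, 30, 60, 90]))  # → 0
```

A full road extractor would then walk along the detected direction, scoring candidate segments with the cross-section correlation and spectral-similarity terms of the objective function.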
The near-equatorial orbit (NEqO) satellite represents a new generation of optical satellite images characterized by nonlinear distortion when captured. Conventional modeling techniques are insufficient to overcome the geometric distortion in these satellite images. This study proposes a new methodology for overcoming the geometric distortion of NEqO images. The data used are obtained from RazakSAT and SPOT-5 satellite images in Malaysia. The method starts by applying the RI-SIFT algorithm to extract control points (CPs) automatically. These CPs are used to solve for the transformation parameters of the geometric correction model by applying spline transformations. The result is verified through statistical comparison: 1) geometric correction of the RazakSAT image is performed against the SPOT satellite image using a first-order polynomial transformation; 2) the root mean square error (RMSE) is then calculated; 3) the RMSE obtained in the first step is compared with that of the proposed method. The RMSE value of the geometric correction using the proposed method was 7.08 × 10⁻⁹ m. The proposed method provides promising results.
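The RMSE comparison in steps 2-3 is a standard computation over check-point coordinates. A minimal sketch with hypothetical CP coordinates:

```python
import numpy as np

def rmse(predicted, reference):
    """Root mean square error between transformed CP coordinates and
    their reference locations, used to compare correction models."""
    d = np.asarray(predicted, float) - np.asarray(reference, float)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))

# Two check points with residual vectors (0, 3) and (4, 0):
print(rmse([[0, 3], [4, 0]], [[0, 0], [0, 0]]))  # → ≈3.536, i.e. sqrt((9+16)/2)
```

The model with the smaller RMSE over held-out CPs is the better geometric correction.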
Semantic lexical chains have been regarded as important in textual cohesion, although traditionally the classification of these chains has been limited to repetition, synonymy, hyponymy, and collocates. Studies of the automatic extraction of lexical chains have found that contextual synonyms cannot be recognized or extracted automatically. This study took a data-based approach to extract contextually co-occurring lexical chains through thematic lexical items. It found that these contextually co-occurring lexical chains can include both semantic lexical chains and contextual synonyms. It also found that, when extracting collocates of the co-occurring lexical items, these collocates form secondary lexical chains, which contribute to textual cohesion. The vertical lexical chains made of contextually co-occurring lexical items and the horizontal chains made of collocational lexical items work together in making the text a coherent whole.
The distribution characteristics of impact craters can provide a large amount of information on impact history and the lunar evolution process. In this research, based on the digital elevation model (DEM) data originating from the Chang'E-1 CCD stereo camera, three automatic extraction methods for impact craters are implemented in two research areas: direct extraction from flooded DEM data (the Flooded method), object-oriented extraction from DEM data using the ENVI ZOOM function (the Object-Oriented method), and a novel object-oriented extraction from flooded DEM data (the Flooded Object-Oriented method). Accuracy assessment, extraction-degree computation, cumulative frequency analysis, and shape and age analysis of the extracted craters together yield the following results. (1) The Flooded Object-Oriented method yields better accuracy than the other two methods in the two research areas; the extraction result of the Flooded method offers accuracy similar to that of the Object-Oriented method. (2) The cumulative frequency curves for the extracted craters and the confirmed craters share a similar change trajectory. (3) The number of impact craters extracted by the three methods in the Imbrian period is the largest, and these craters are of various types; craters older than the Imbrian period are difficult to extract because they may have been destroyed.
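The "flooded DEM" idea, detecting craters as closed depressions that hold water, can be sketched with a priority-flood filling pass. This toy implementation and the 4×4 DEM are illustrative assumptions, not the paper's code: a flood level propagates inward from the border, and craters appear wherever the filled surface sits above the original DEM.

```python
import heapq
import numpy as np

def fill_depressions(dem):
    """Priority-flood depression filling: process cells outward-in by
    ascending flood level; interior cells lower than the level reaching
    them are raised to it."""
    rows, cols = dem.shape
    filled = np.full(dem.shape, np.inf)
    visited = np.zeros(dem.shape, bool)
    heap = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):  # seed the border
                filled[r, c] = dem[r, c]
                visited[r, c] = True
                heapq.heappush(heap, (dem[r, c], r, c))
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                visited[nr, nc] = True
                filled[nr, nc] = max(dem[nr, nc], level)
                heapq.heappush(heap, (filled[nr, nc], nr, nc))
    return filled

dem = np.array([[5, 5, 5, 5],
                [5, 1, 2, 5],
                [5, 2, 1, 5],
                [5, 5, 5, 5]], float)
crater_mask = fill_depressions(dem) - dem > 0  # the 2x2 pit is flagged
```

An object-oriented stage would then group the flagged cells into candidate craters and filter them by shape.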
Terrestrial LiDAR data can be used to extract accurate structural parameters of corn plants and canopies, such as leaf area, leaf distribution, and 3D models. The first step of these applications is to extract corn leaf points from unorganized LiDAR point clouds. This paper focuses on an automated extraction algorithm for identifying the points returned from corn leaves in massive, unorganized LiDAR point clouds. To exploit the distinct geometry of corn leaves and stalks, the Difference of Normals (DoN) method was proposed to extract corn leaf points. First, the normals of the corn leaf surface were estimated at multiple scales for all points. Second, the directional ambiguity of the normals was eliminated to obtain the same normal direction for the same leaf distribution. Finally, the DoN was computed, and the DoN results at the optimal scale were used to extract leaf points. The quantitative accuracy assessment showed that the overall accuracy was 94.10%, the commission error was 5.89%, and the omission error was 18.65%. The results indicate that the proposed method is effective and that corn leaf points can be extracted automatically from massive, unorganized terrestrial LiDAR point clouds using the proposed DoN method.
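The DoN pipeline above (multi-scale normals, ambiguity removal, difference) can be sketched as follows. The PCA normal estimation, the two radii, and the synthetic flat patch are all illustrative assumptions, not the paper's parameters: on a smooth surface the normals agree across scales, so the DoN magnitude is near zero.

```python
import numpy as np

def estimate_normal(points, center, radius):
    """PCA normal of the neighborhood of `center` within `radius`:
    the eigenvector of the covariance with the smallest eigenvalue."""
    nb = points[np.linalg.norm(points - center, axis=1) <= radius]
    cov = np.cov((nb - nb.mean(axis=0)).T)
    _, v = np.linalg.eigh(cov)       # eigenvalues in ascending order
    n = v[:, 0]
    return n if n[2] >= 0 else -n    # fix directional ambiguity (point "up")

def don_magnitude(points, center, r_small, r_large):
    """Difference-of-Normals feature: small on smooth surfaces, large
    where the local geometry changes between the two scales."""
    n1 = estimate_normal(points, center, r_small)
    n2 = estimate_normal(points, center, r_large)
    return float(np.linalg.norm((n1 - n2) / 2.0))

# Synthetic flat patch: the normal agrees across scales, so DoN ≈ 0.
g = np.linspace(-1, 1, 11)
xx, yy = np.meshgrid(g, g)
flat = np.column_stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)])
print(don_magnitude(flat, np.array([0.0, 0.0, 0.0]), 0.3, 1.0))  # ≈ 0.0
```

Thresholding this magnitude at the optimal scale pair separates leaf points (smooth, consistent normals) from stalk edges and junctions.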
A novel method is proposed to automatically extract foreground objects from Martian surface images. The characteristics of Mars images are distinct, e.g. uneven illumination, low contrast between foreground and background, much noise in the background, and foreground objects with irregular shapes. In the context of these characteristics, an image is divided into foreground objects and background information. Homomorphic filtering is first applied to rectify brightness. Then, wavelet transformation enhances contrast and denoises the image. Third, edge detection and active contours are combined to extract contours regardless of the objects' shapes. Experimental results show that the method can extract foreground objects from Mars images automatically and accurately, and has many potential applications.
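The brightness-rectification step can be illustrated with a textbook homomorphic filter; the Gaussian high-emphasis transfer function and its gains below are assumptions, not the paper's exact filter. The log transform separates illumination (low frequencies) from reflectance (high frequencies), which are then attenuated and boosted respectively.

```python
import numpy as np

def homomorphic_filter(img, cutoff=5.0, low_gain=0.5, high_gain=1.5):
    """Homomorphic filtering: log -> FFT -> Gaussian high-emphasis
    filter -> inverse FFT -> exp. Suppresses slow illumination changes
    while keeping reflectance detail."""
    rows, cols = img.shape
    log_img = np.log1p(img.astype(float))
    u = np.fft.fftfreq(rows)[:, None] * rows
    v = np.fft.fftfreq(cols)[None, :] * cols
    d2 = u ** 2 + v ** 2
    H = (high_gain - low_gain) * (1 - np.exp(-d2 / (2 * cutoff ** 2))) + low_gain
    out = np.fft.ifft2(np.fft.fft2(log_img) * H).real
    return np.expm1(out)
```

On a perfectly uniform image only the DC term survives, so the output is the input attenuated by `low_gain` in the log domain; real images keep their detail while the illumination gradient flattens.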
Forest data acquisition, which is of crucial importance for modeling global biogeochemical cycles and climate, contributes to building the ecological Digital Earth (DE). Due to the complex calculations and large volumes of data associated with high-resolution images of large areas, accurate and effective extraction of individual tree crowns remains challenging. In this study, two GeoEye-1 panchromatic images of Beihai and Ningbo in China, with areas of 5 and 25 km², respectively, were used as experimental data to establish a novel method for the automatic extraction of individual tree crowns based on a self-adaptive mutual information (SMI) algorithm and tile computing technology (SMI-TCT). To evaluate the performance of the algorithm, four commonly used algorithms were also applied to extract the individual tree crowns. The overall accuracy of the proposed method in the two experimental areas was superior to that of the four other algorithms, with maximum extraction accuracies of 85.7% and 63.8%. Moreover, the results also indicated that the novel method is suitable for individual tree crown extraction over sizeable areas because of its multithread parallel computing technology.
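The mutual information score at the core of an SMI-style matcher can be sketched from a joint histogram of two image patches; the bin count is an assumed parameter, and this is not the authors' self-adaptive variant.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information I(A;B) between two equally sized images,
    estimated from their joint intensity histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()                    # joint probabilities
    px, py = p.sum(axis=1), p.sum(axis=0)    # marginals
    nz = p > 0                               # avoid log(0)
    return float((p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])).sum())

a = np.arange(256, dtype=float)
print(mutual_information(a, a))  # self-match: ln(16) ≈ 2.77 with 16 bins
```

A crown extractor built on this score would maximize it between a crown template and candidate image windows, adapting the template per tile.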
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60496326).
Funding: Under the auspices of the Construction Committee of the Three Gorges Reservoir Project (No. SX[2002]00401) and the Chinese Academy of Sciences (No. KZCX2-SW-319-01).
Funding: Funded by the U.S. National Institute for Occupational Safety and Health (NIOSH) under Contract No. 75D30119C06044.
Funding: This work is supported by the project "Research on Methods and Technologies of Scientific Researcher Entity Linking and Subject Indexing" (Grant No. G190091) from the National Science Library, Chinese Academy of Sciences, and the project "Design and Research on a Next Generation of Open Knowledge Services System and Key Technologies" (2019XM55).
Abstract: Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and pretrained language models to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We treat AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers, and initialize our model with the pretrained language model BERT, released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods, including Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory networks (BiLSTM), and BiLSTM-CRF, as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, a 9.64% absolute improvement. Research limitations: We only consider the automatic keyphrase extraction task rather than keyphrase generation, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB-format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
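The character-level IOB formulation described in this abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' released code: the tag names (`B-KP`, `I-KP`, `O`) and the helper name `char_iob_labels` are assumptions for demonstration.

```python
def char_iob_labels(text, keyphrases):
    """Label each character of a Chinese text with IOB tags for keyphrase spans.

    The first character of a keyphrase gets B-KP, the rest I-KP, and all
    remaining characters get O. Longer keyphrases are matched first so a
    shorter phrase cannot overwrite a span already claimed by a longer one.
    """
    labels = ["O"] * len(text)
    for kp in sorted(keyphrases, key=len, reverse=True):
        start = 0
        while True:
            i = text.find(kp, start)
            if i == -1:
                break
            if all(l == "O" for l in labels[i:i + len(kp)]):
                labels[i] = "B-KP"
                for j in range(i + 1, i + len(kp)):
                    labels[j] = "I-KP"
            start = i + 1
    return list(zip(text, labels))

# Toy medical-style sentence with two keyphrases.
pairs = char_iob_labels("糖尿病患者的血糖控制", ["糖尿病", "血糖"])
```

Because labeling happens per character rather than per word, no Chinese tokenizer is involved, which is exactly what avoids the segmentation errors mentioned above.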
Fund: Supported by the National Natural Science Foundation of China (61210012).
Abstract: Craters are salient terrain features on planetary surfaces and provide useful information about the relative dating of the geological units of planets. In addition, they are ideal landmarks for spacecraft navigation. Due to low contrast and uneven illumination, the automatic extraction of craters remains a challenging task. This paper presents a saliency detection method for crater edges and a feature matching algorithm based on edge information. The craters are extracted through saliency edge detection, edge extraction and selection, feature matching of the same crater's edges, and robust ellipse fitting. In the edge matching algorithm, a crater feature model is proposed by analyzing the relationship between highlight-region edges and shadow-region edges. Crater edges are then paired through the matching algorithm. Experiments on real planetary images show that the proposed approach is robust to different illumination conditions and topographies, and that the detection rate is greater than 90%.
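The highlight/shadow relationship used by the crater feature model can be illustrated with a toy pairing rule: under directional sunlight, a crater's shadow edge lies roughly along the sun direction from its highlight edge. This is a minimal sketch of that idea only; the function names, centroid representation, and angular tolerance are assumptions, not the paper's actual model.

```python
import math

def angle_between(v, w):
    """Angle in degrees between two 2-D vectors."""
    dot = v[0] * w[0] + v[1] * w[1]
    norm = math.hypot(*v) * math.hypot(*w)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def pair_crater_edges(highlights, shadows, sun_dir, tol_deg=30.0):
    """Pair highlight-edge and shadow-edge centroids whose offset vector
    roughly follows the illumination direction (within tol_deg)."""
    pairs = []
    for h in highlights:
        for s in shadows:
            offset = (s[0] - h[0], s[1] - h[1])
            if angle_between(offset, sun_dir) <= tol_deg:
                pairs.append((h, s))
    return pairs
```

A real implementation would also check edge curvature and size consistency before the ellipse-fitting step; this sketch keeps only the geometric pairing constraint.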
Fund: This work was developed with the support of the H2020 RISIS 2 Project (No. 824091) and of the “Sapienza” Research Awards No. RM1161550376E40E of 2016 and RM11916B8853C925 of 2019. This article is a largely extended version of Bianchi et al. (2019), presented at the ISSI 2019 Conference held in Rome, 2–5 September 2019.
Abstract: Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The automatically extracted information can be updated more frequently than once per year and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for “profiling” the analyzed universities. Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information needed to compute our indicators has been extracted from the universities' websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semistructured form to allow information to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web has been combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clustering of 79 Italian universities based on structural and digital indicators. Findings: The main findings of this study concern the evaluation of universities' potential in digitalization, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features, using clustering techniques working with the above indicators. Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad. Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems. Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
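A content-mining indicator of the kind described can be computed over the semistructured scraped records with very little code. The sketch below is purely illustrative: the record fields (`url`, `text`) and the indicator definition (share of pages mentioning a key content term) are assumptions, not the paper's actual indicators.

```python
def content_indicator(pages, keywords):
    """Share of scraped pages mentioning at least one key content term.

    `pages` mimics semistructured records pulled from a NoSQL store,
    e.g. {"url": ..., "text": ...}; field names are illustrative.
    """
    hits = sum(any(k in page["text"].lower() for k in keywords)
               for page in pages)
    return hits / len(pages)

# Toy records standing in for scraped university pages.
pages = [
    {"url": "/ricerca", "text": "Research projects and PhD programmes"},
    {"url": "/teaching", "text": "Course catalogue and timetables"},
    {"url": "/third-mission", "text": "Technology transfer and spin-offs"},
]
share = content_indicator(pages, ["research", "phd"])
```

Indicators of this form, computed per university, can feed directly into the clustering step described above.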
Abstract: The scale-invariant feature transform (SIFT) is well known for automatic control point (CP) extraction from remote sensing images; however, its results are often inaccurate and sometimes contain incorrect matches, because the small number of false CP pairs it generates have a high false-alarm matching rate. This paper presents a modified method to improve the performance of SIFT CP matching by applying the sum of absolute differences (SAD) in a different manner for the new generation of optical satellites called near-equatorial orbit satellites (NEqO) and for multi-sensor images. The proposed method improves CP matching with a significantly higher rate of correct matches. The data in this study were obtained from the RazakSAT satellite, covering the Kuala Lumpur-Pekan area. The proposed method consists of three parts: (1) applying SIFT to extract CPs automatically, (2) refining CP matching with the SAD algorithm and an empirical threshold, and (3) evaluating the refined CPs by comparing the results of the original SIFT with those of the proposed method. The results indicate accurate and precise performance, showing the effectiveness and robustness of the proposed approach.
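The SAD refinement step (part 2) is simple enough to sketch. This is a minimal illustration under stated assumptions: patches are plain 2-D lists of intensities already centered on each CP, and the function names and the patch-lookup scheme are hypothetical, not the paper's implementation.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equal-sized patches."""
    return sum(abs(a - b)
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def refine_matches(cp_pairs, ref_patches, sen_patches, threshold):
    """Keep only CP pairs whose surrounding patches agree within the
    empirical SAD threshold; the rest are treated as false matches."""
    return [(i, j) for i, j in cp_pairs
            if sad(ref_patches[i], sen_patches[j]) <= threshold]

# One plausible match and one grossly mismatched pair.
ref = {0: [[10, 10], [10, 10]], 1: [[10, 10], [10, 10]]}
sen = {0: [[12, 9], [10, 11]], 1: [[90, 90], [90, 90]]}
kept = refine_matches([(0, 0), (1, 1)], ref, sen, threshold=50)
```

The threshold is empirical, as the abstract notes; multi-sensor pairs with different radiometry generally need a looser value than same-sensor pairs.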
Abstract: Several techniques have been developed for determining linear features. Lineament extraction from satellite data is one of the most widely used applications in geology. In the present study, lineaments were extracted from a digital satellite scene (Landsat 5 TM data) of the Zahret Median region in north-west Tunisia. The image was enhanced and used for automatic extraction. Several feature directions were mapped; the major lineaments are oriented NE-SW and NW-SE. The results were validated by comparison with geophysical results as well as with previous mapping studies carried out in the study area.
Abstract: Digitizing road maps manually is an expensive and time-consuming task. Several methods that aim to develop fully or semi-automated systems have been proposed. In this work we introduce a method, based on the Radon transform and optimization algorithms, which automatically extracts roads from images of rural areas acquired by digital cameras and airborne laser scanners. The proposed method detects linear segments iteratively and, starting from these, generates the centerlines of the roads. The method is based on an objective function that depends on three parameters related to the correlation between cross-sections, spectral similarity, and segment directions. Different tests were performed using aerial photos, Ikonos images, and laser scanner data of an area located in the state of Paraná (Brazil), and their results are presented and discussed. The quality of the detected road centerlines was computed using several indexes: completeness, correctness, and RMS. The values obtained reveal the good performance of the proposed methodology.
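The three quality indexes mentioned at the end have standard definitions in road-extraction evaluation; a minimal sketch, assuming centerline lengths and per-point offsets from the reference are already measured (the function signature is illustrative, not the paper's code):

```python
import math

def quality_indexes(matched_len, extracted_len, reference_len, offsets):
    """Completeness, correctness, and RMS for extracted road centerlines.

    matched_len   : length of extracted centerline matched to the reference
    extracted_len : total length of the extracted centerlines
    reference_len : total length of the reference centerlines
    offsets       : perpendicular distances of matched points to the reference
    """
    completeness = matched_len / reference_len   # how much of the truth was found
    correctness = matched_len / extracted_len    # how much of the output is real road
    rms = math.sqrt(sum(d * d for d in offsets) / len(offsets))
    return completeness, correctness, rms

comp, corr, rms = quality_indexes(90.0, 100.0, 120.0, [3.0, 4.0])
```

Completeness and correctness trade off against each other, which is why both are reported alongside the geometric RMS.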
Abstract: The near-equatorial orbit (NEqO) satellite represents a new generation of optical satellite imagery characterized by nonlinear distortion at capture. Conventional modeling techniques are insufficient to overcome the geometric distortion in these satellite images. This study proposes a new methodology for overcoming the geometric distortion of NEqO images. The data used are obtained from RazakSAT and SPOT-5 satellite images in Malaysia. The method starts by applying the RI-SIFT algorithm to extract control points (CPs) automatically. These CPs are used to solve for the transformation parameters of the geometric correction model by applying spline transformations. The result is verified through statistical comparison: 1) geometric correction of the RazakSAT image is performed against the SPOT satellite image using a first-order polynomial transformation; 2) the root mean square error (RMSE) is calculated; 3) the RMSE from the first step is compared with that of the proposed method. The RMSE of the geometric correction using the proposed method was 7.08 × 10^−9 m. The proposed method provides promising results.
Abstract: Semantic lexical chains have been regarded as important in textual cohesion, although traditionally the classification of these chains has been limited to repetition, synonymy, hyponymy, and collocates. Work on the automatic extraction of lexical chains has found that contextual synonyms cannot be recognized or extracted automatically. This study used a data-driven technique to extract contextually co-occurring lexical chains through thematic lexical items. It found that these contextually co-occurring lexical chains can include both semantic lexical chains and contextual synonyms. It also found that, when extracting collocates of the co-occurring lexical items, these collocates form secondary lexical chains, which contribute to textual cohesion. The vertical lexical chains made of contextually co-occurring lexical items and the horizontal chains made of collocational lexical items work together in making the text a coherent whole.
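The collocate-extraction step behind the "horizontal chains" can be sketched as a simple windowed co-occurrence count around a thematic (node) word. This is a generic corpus-linguistics sketch, not the study's actual tooling; the window size and function name are assumptions.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words co-occurring with a node (thematic) word within a
    +/- `window` token span; high-frequency co-occurrers are candidate
    members of a secondary lexical chain."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for t in tokens[lo:hi] if t != node)
    return counts

c = collocates("the cat sat on the mat the cat ran".split(), "cat")
```

In practice an association measure (e.g. mutual information) would be applied on top of the raw counts to separate genuine collocates from frequent function words.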
Fund: Supported by the National Natural Science Foundation of China (Grant Nos. 40871177 and 41171332) and the Knowledge Innovation Project of the Institute of Geographic and Natural Resources Research, Chinese Academy of Sciences (Grant No. 201001005).
Abstract: The distribution characteristics of impact craters provide a large amount of information on impact history and the lunar evolution process. In this research, based on digital elevation model (DEM) data from the Chang'E-1 CCD stereo camera, three automatic extraction methods for impact craters are implemented in two research areas: direct extraction from flooded DEM data (the Flooded method), object-oriented extraction from DEM data using the ENVI ZOOM function (the Object-Oriented method), and a novel object-oriented extraction from flooded DEM data (the Flooded Object-Oriented method). Accuracy assessment, extraction-degree computation, cumulative frequency analysis, and shape and age analysis of the extracted craters together yield the following results. (1) The Flooded Object-Oriented method achieves better accuracy than the other two methods in both research areas; the Flooded method offers accuracy similar to that of the Object-Oriented method. (2) The cumulative frequency curves of the extracted craters and the confirmed craters share a similar trajectory. (3) The impact craters extracted by the three methods are most numerous, and of the most varied types, in the Imbrian period; craters older than the Imbrian are difficult to extract because they may have been destroyed.
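The "flooding" idea behind the Flooded method can be illustrated on a toy DEM grid: cells below a chosen water level that connect to each other form one candidate depression (crater) region. This is a highly simplified stand-in for the first step only; the grid, flooding level, and 4-connectivity are assumptions for demonstration.

```python
from collections import deque

def flooded_regions(dem, level):
    """Label 4-connected groups of cells whose elevation is below `level`.

    Each depression that fills at the given level becomes one candidate
    crater region (a toy stand-in for the Flooded method's first step).
    """
    rows, cols = len(dem), len(dem[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or dem[r][c] >= level:
                continue
            queue, cells = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one depression
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and dem[ny][nx] < level):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(cells)
    return regions
```

The object-oriented variants would then filter these regions by shape attributes (circularity, size) rather than accepting every depression.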
Fund: This research was supported by the National Natural Science Foundation of China through the projects “Growth process monitoring of corn by combining time series spectral remote sensing images and terrestrial laser scanning data” (41671433), “Dynamic calibration of exterior orientations for vehicle laser scanner based on structure features” (41371434), and “Estimating the leaf area index of corn in the whole growth period using terrestrial LiDAR data” (41371327).
Abstract: Terrestrial LiDAR data can be used to extract accurate structural parameters of corn plants and canopies, such as leaf area, leaf distribution, and 3D models. The first step of these applications is to extract corn leaf points from unorganized LiDAR point clouds. This paper focuses on an automated algorithm for identifying points returned from corn leaves in massive, unorganized LiDAR point clouds. To exploit the distinct geometry of corn leaves and stalks, the Difference of Normals (DoN) method was proposed to extract corn leaf points. First, the surface normals of all points were estimated at multiple scales. Second, the directional ambiguity of the normals was eliminated to obtain consistent normal directions for the same leaf. Finally, the DoN was computed, and the DoN results at the optimal scale were used to extract leaf points. The quantitative accuracy assessment showed an overall accuracy of 94.10%, a commission error of 5.89%, and an omission error of 18.65%. The results indicate that the proposed method is effective and that corn leaf points can be extracted automatically from massive, unorganized terrestrial LiDAR point clouds using the proposed DoN method.
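The DoN operator itself is compact. A minimal sketch, assuming per-point unit normals have already been estimated at a small and a large support radius (the normal-estimation and scale-selection steps, which the paper performs on real point clouds, are omitted; function names and the threshold are illustrative):

```python
import math

def don_magnitude(n_small, n_large):
    """Difference of Normals: ||n_s - n_l|| / 2 for two unit normals.

    Near 0 means the surface looks the same at both scales (e.g. a flat
    leaf blade); larger values flag scale-dependent geometry such as a
    leaf/stalk junction."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(n_small, n_large))) / 2

def select_points(normals_small, normals_large, threshold):
    """Indices whose DoN magnitude exceeds a threshold; in practice the
    threshold would be tuned empirically on the optimal scale pair."""
    return [i for i, (ns, nl) in enumerate(zip(normals_small, normals_large))
            if don_magnitude(ns, nl) > threshold]
```

Dividing by 2 bounds the magnitude to [0, 1] for unit normals, which makes thresholds comparable across scale pairs.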
Fund: Supported by the National 973 Program of China (No. 2007CB310804) and the National Natural Science Foundation of China (No. 61173061).
Abstract: A novel method is proposed to automatically extract foreground objects from Martian surface images. The characteristics of Mars images are distinct: uneven illumination, low contrast between foreground and background, much noise in the background, and foreground objects with irregular shapes. In the context of these characteristics, an image is divided into foreground objects and background information. Homomorphic filtering is first applied to rectify brightness. Then, wavelet transformation enhances contrast and denoises the image. Third, edge detection and active contours are combined to extract contours regardless of object shape. Experimental results show that the method can extract foreground objects from Mars images automatically and accurately, and has many potential applications.
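The brightness-rectification idea behind homomorphic filtering can be shown on a toy 1-D signal: take the log (turning multiplicative illumination into an additive component), remove the slowly varying part, and exponentiate back. The real pipeline works in 2-D, typically with a frequency-domain filter; the moving-average low-pass here is a deliberate simplification, and the function name is illustrative.

```python
import math

def homomorphic_1d(signal, window):
    """Toy 1-D homomorphic filter: suppress slowly varying illumination.

    log -> subtract a moving-average (low-frequency) component -> exp.
    The output hovers around 1.0 wherever the input is pure illumination,
    leaving only the faster-varying reflectance detail."""
    logs = [math.log(v) for v in signal]
    half = window // 2
    out = []
    for i in range(len(logs)):
        lo, hi = max(0, i - half), min(len(logs), i + half + 1)
        low_freq = sum(logs[lo:hi]) / (hi - lo)
        out.append(math.exp(logs[i] - low_freq))
    return out
```

On a perfectly flat signal every sample equals its local mean, so the filter returns all ones; a brightness gradient would likewise be flattened while edges survive.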
Fund: This study was jointly supported by the National Science and Technology Major Project (Grant No. 30-Y20A01-9003-12/13), the State Key Fundamental Science Funds (Grant No. 2010CB951503), the National Key Basic Research Program (Grant No. 2010CB434801), the National Key Technology R&D Program of China (Grant No. 2012BAH32B03), and the National Natural Science Foundation of China (Grant No. 41101439).
Abstract: Forest data acquisition, which is of crucial importance for modeling global biogeochemical cycles and climate, contributes to building the ecological Digital Earth (DE). Due to the complex calculations and large data volumes associated with high-resolution images of large areas, the accurate and effective extraction of individual tree crowns remains challenging. In this study, two GeoEye-1 panchromatic images of Beihai and Ningbo in China, with areas of 5 and 25 km2, respectively, were used as experimental data to establish a novel method for the automatic extraction of individual tree crowns based on a self-adaptive mutual information (SMI) algorithm and tile computing technology (SMI-TCT). To evaluate the performance of the algorithm, four commonly used algorithms were also applied to extract the individual tree crowns. The overall accuracy of the proposed method in the two experimental areas was superior to that of the four other algorithms, with maximum extraction accuracies of 85.7% and 63.8%. Moreover, the results indicate that the novel method is suitable for individual tree crown extraction over sizeable areas because of its multithreaded parallel computing technology.
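The tile-computing side of an approach like SMI-TCT can be sketched generically: cut the image extent into windows and run a per-tile extraction worker in parallel. This is a generic sketch of tile-parallel processing, not the paper's implementation; the window layout, thread pool, and worker are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def split_tiles(width, height, tile):
    """Cut an image extent into tile-sized windows (x, y, w, h);
    edge tiles are clipped to the image boundary."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def map_tiles(windows, worker, max_workers=4):
    """Run a per-tile extraction worker over all windows in parallel
    and collect the results in window order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, windows))

wins = split_tiles(100, 50, 40)
# A trivial worker: report each tile's pixel count.
areas = map_tiles(wins, lambda w: w[2] * w[3])
```

Real crown extraction would also need a small overlap between tiles so crowns straddling a tile border are not cut in half; that bookkeeping is omitted here.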