In this letter, a new method is proposed for the unsupervised classification of terrain types and man-made objects using POLarimetric Synthetic Aperture Radar (POLSAR) data. The technique combines the polarimetric information of SAR images with an unsupervised classification method based on fuzzy set theory. Image quantization and image enhancement are used to preprocess the POLSAR data; the polarimetric information and the Fuzzy C-Means (FCM) clustering algorithm are then used to classify the preprocessed images. The advantages of this algorithm are automated classification, high classification accuracy, fast convergence and high stability. The effectiveness of the algorithm is demonstrated by experiments using SIR-C/X-SAR (Spaceborne Imaging Radar-C/X-band Synthetic Aperture Radar) data.
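As a rough illustration of the FCM step only, the following plain-Python sketch implements the standard Fuzzy C-Means updates (fuzzifier m = 2 assumed) on generic feature vectors. The function names and the toy points are hypothetical; the letter's preprocessing (quantization, enhancement) and its choice of polarimetric features are not reproduced.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means on a list of equal-length feature tuples.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [max(sum((points[i][d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                     for ctr in centers]
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
                          for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def defuzzify(u):
    """Hard label for each point: the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(points, c=2)
print(defuzzify(u))
```

Defuzzifying by maximum membership is only one way to turn the fuzzy partition into a class map.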
In this paper, the IHSL transform and the Fuzzy C-Means (FCM) segmentation algorithm are combined to perform unsupervised classification of fully polarimetric Synthetic Aperture Radar (SAR) data. We apply the IHSL colour transform to the H/α/SPAN space to obtain a new space (an RGB colour space) whose parameters have uniform distinguishability and which retains all of the polarimetric information in H/α/SPAN. The FCM algorithm is then applied in this RGB space to complete the classification. The main advantages of this method are that the parameters in the colour space have similar interclass distinguishability, so it can achieve high performance in pixel-based segmentation, and, because the parameters can be treated in the same way, the segmentation procedure is simplified. The experiments show that it provides an improved classification result compared with the method that uses the H/α/SPAN space directly during segmentation.
Symmetry is a common feature in the real world. It may be used to improve a classification by using the point symmetry-based distance as a clustering measure. However, the point symmetry-based distance is time-consuming to calculate. Although an efficient parallel point symmetry-based K-means algorithm (ParSym) has been proposed to overcome this limitation, ParSym may get stuck in sub-optimal solutions because of the K-means technique it uses. In this study, we propose a novel parallel point symmetry-based genetic clustering (ParSymG) algorithm for unsupervised classification. A genetic algorithm is introduced to overcome the sub-optimization problem caused by inappropriate selection of initial centroids in ParSym. The Message Passing Interface (MPI) is used to implement a distributed master–slave paradigm. To make the algorithm more time-efficient, a three-phase speedup strategy is adopted for population initialization, image partitioning, and kd-tree-based nearest-neighbor searching. The advantages of ParSymG over the existing ParSym and parallel K-means (PKM) algorithms are demonstrated through case studies using three different types of remotely sensed images. The speedup and time-gain results demonstrate the excellent scalability of the ParSymG algorithm.
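The point-symmetry-based distance is commonly defined (in the GAPS line of work) as the Euclidean distance from a point to a candidate centre, scaled by the mean distance from the point's reflection through that centre to its nearest neighbours in the data set. A minimal serial sketch under that assumed definition might look like this: knear = 2, brute-force search instead of a kd-tree, no MPI, and without the threshold rule some variants add. The function names are hypothetical.

```python
import math


def point_symmetry_distance(x, c, data, knear=2):
    """Point-symmetry-based distance d_ps(x, c).

    x is reflected through the candidate centre c (x* = 2c - x); the mean
    distance from x* to its `knear` nearest points in `data` measures how
    badly the symmetry is violated, and it scales the Euclidean distance
    between x and c.
    """
    reflected = tuple(2 * ci - xi for xi, ci in zip(x, c))
    # Brute-force nearest-neighbour search; ParSymG uses a kd-tree instead.
    nearest = sorted(math.dist(reflected, p) for p in data)[:knear]
    symmetry = sum(nearest) / len(nearest)
    return symmetry * math.dist(x, c)


def assign(data, centres):
    """Assign every point to the centre with the smallest d_ps."""
    return [min(range(len(centres)),
                key=lambda j: point_symmetry_distance(x, centres[j], data))
            for x in data]


data = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(point_symmetry_distance((1, 0), (0, 0), data))
print(assign(data, [(0, 0), (10, 10)]))
```

In a genetic clustering scheme such as ParSymG, a distance of this kind would be used when evaluating each chromosome's candidate centres; the GA machinery itself is not shown.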
This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification, improving the robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
Recent comparative studies describe the changes in mobility patterns caused by the COVID-19 pandemic. Most current studies use travel volume per day as the critical indicator and identify the impacted period by the dates of governmental lockdown or stay-at-home orders, which, however, may not accurately reflect the actual impacted dates. The objective of this study is to provide an alternative way to identify normal and pandemic-influenced daily traffic patterns. Instead of using only daily traffic volumes or assuming that the impacted travel pattern began with the stay-at-home order, the methodology treats within-day time-dependent travel speed as a time series, and then applies the dynamic time warping algorithm and hierarchical clustering (an unsupervised classification method) to classify days into groups without assuming a start date for any group. Using state-wide travel speed data in Alabama, this study measures the dissimilarities among within-day travel speed time series. Using the resulting dissimilarity (distance) matrix, several agglomerative hierarchical clustering (AHC) methods (average, complete, Ward's) are tested for the unsupervised classification. The Ward's AHC results show that the within-day travel speed pattern in Alabama shifted more than two weeks before the State stay-at-home order was issued. The results further show that a new travel speed pattern appears at the end of the stay-at-home order, different from both the normal pre-pandemic pattern and the initial pandemic-influenced pattern, which leads to the conclusion that a 'new normal' within-day travel pattern has emerged.
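The dissimilarity measure can be sketched with the classic dynamic time warping recurrence. This assumes an unconstrained DTW with absolute-difference cost; the study's exact cost function, any warping window, and how the daily speed profiles are sampled are not specified in the abstract, and the helper names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Uses |a_i - b_j| as the local cost and the standard three-way
    recurrence without a warping-window constraint.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def distance_matrix(series):
    """Pairwise DTW matrix, the input to agglomerative clustering."""
    k = len(series)
    return [[dtw_distance(series[p], series[q]) for q in range(k)] for p in range(k)]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The resulting matrix could then be converted to condensed form with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` with `method="average"`, `"complete"` or `"ward"`. Ward's criterion formally assumes Euclidean distances, so applying it to DTW distances is a heuristic choice.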
Aiming to solve the misclassification problems of the unsupervised polarimetric Wishart classification algorithm based on Freeman decomposition, an unsupervised Polarimetric Synthetic Aperture Radar (SAR) Interferometry (PolInSAR) classification algorithm based on optimal coherence set parameters is proposed. The algorithm uses the result of Freeman decomposition to divide the image into three basic categories: surface scattering, volume scattering, and double-bounce scattering. The PolInSAR optimal coherence set parameters are then used to divide each of the three basic categories finely into 9 categories, so that the whole image is divided into 27 categories. Because both the Freeman decomposition result and the optimal coherence set parameters indicate specific scattering characteristics, these 27 categories are merged into 16 categories based on their physical meaning. Finally, Wishart clustering is employed to obtain the final classification result. To preserve the purity of scattering characteristics, pixels are restricted to being clustered only with pixels of similar scattering characteristics. The final classification results effectively resolve the misclassification problem: not only are buildings effectively distinguished from vegetation in urban areas, but roads are also well distinguished from grass. E-SAR PolInSAR data from the German Aerospace Center (DLR) are used to verify the effectiveness of the algorithm.
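For reference, Wishart clustering is usually built on the Wishart distance between a pixel's averaged coherency matrix ⟨T⟩ and a class centre Σ_m, the mean coherency matrix of the N_m pixels currently in class ω_m. Each pixel is reassigned to the class with the smallest distance, and the centres are recomputed until the assignment stabilizes. The standard form is sketched below; ⟨T⟩ is 3 × 3 for single-pass polarimetry, and a 6 × 6 matrix is typically used for PolInSAR. How this paper constrains assignments to preserve scattering purity is not captured by the formula.

```latex
d\big(\langle \mathbf{T} \rangle, \boldsymbol{\Sigma}_m\big)
  = \ln \lvert \boldsymbol{\Sigma}_m \rvert
  + \operatorname{Tr}\!\big(\boldsymbol{\Sigma}_m^{-1} \langle \mathbf{T} \rangle\big),
\qquad
\boldsymbol{\Sigma}_m = \frac{1}{N_m} \sum_{\langle \mathbf{T} \rangle \in \omega_m} \langle \mathbf{T} \rangle
```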
The study examines the changes in land cover/use resources over the period under investigation. An unsupervised vegetation classification is performed that yields five distinct classes, and the changes are assessed in five broad land cover classes: high/moist forests, forest regrowth, mixed savanna, bare land/grass and water. TM and ETM+ images from different time periods (1986 to 2001) are used to determine land cover/use changes. The unsupervised classification achieves fair accuracy and shows that vegetation has been depleted over the years. The changes are mostly human-induced and, to a lesser extent, environmental. Human activities, mainly encroachment, alter the landscape through population growth, agriculture, settlements, etc., while the environmental changes are due to some perceived climatic changes. This vegetation classification highlights the importance of acquiring and publishing information about the country's partial vegetation cover and vegetation change, including vegetation maps and other basic factors influencing vegetation, leading to an understanding of its evolution over a period.
A critical problem in the southern part of Nigeria is the rapid alteration of the landscape as a result of logging, agricultural practices, human migration and expansion, and oil exploration, exploitation and production activities. These processes have had both positive and negative effects on the economic and socio-political development of the country in general. The negative impacts have not only degraded the ecosystem but also posed hazards to human health and polluted surface and ground water resources. This has created the need for a rapid, cost-effective and efficient land use/land cover (LULC) classification technique to monitor the biophysical dynamics in the region. Because of the complex land cover patterns in the study area and the occasionally indistinguishable relationship between land cover and spectral signals, this paper introduces a combined use of unsupervised and supervised image classification for detecting LULC classes. Given the continuing conflict over the impact of oil activities in the area, this work provides a procedure for detecting LULC change, which is an important factor to consider in the design of an environmental decision-making framework. Results from applying this technique to Landsat TM and ETM+ images of 1987 and 2002 are discussed. The results reveal the pros and cons of the two methods and the effects of their overall accuracy on post-classification change detection.
In the petroleum industry, sensor data and information are valuable: they can help detect, predict and understand processes during oil production. Offshore wells require more attention, because workovers, maintenance and interventions are more costly than for onshore wells. Coupling data-driven methods with well-monitoring applications, two unsupervised classification methods, one statistical and one based on machine learning, are proposed to detect anomalies in well data. The novelty lies in applying a control chart with a three-standard-deviation window to the Permanent Downhole Gauge pressure sensor (P-PDG), and a Fuzzy C-means algorithm to classify data from pressure and temperature sensors in an offshore field. The main goal in structuring a classified data set is to use it to train machine learning models that monitor and manage petroleum production. Modeling applications for early fault detection systems in offshore production, based on real-time data from production sensors, require classified data sets. Labeling two target classes, “normal” and “fault”, is therefore a key step in training the machine learning models. This paper applies the two methodologies to classify a real-time data set and create a training data set divided into “normal” and “fault” classes.
Thus, it is possible to visualize the abnormal events pointed out by each methodology and compare how sensitive each method is. In addition, a random forest application is proposed to test the performance of the data sets classified by both methods. The results show that the control chart method has higher sensitivity than fuzzy c-means; however, the differences between them are insignificant. The random forest achieved sensitivity and specificity values of 99.91% and 100% for the data set classified by the control chart method, and 94.01% and 99.98% for the data set classified by the fuzzy c-means algorithm.
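A minimal sketch of 3σ control-chart labeling and of the sensitivity/specificity scores used to compare the classified data sets follows. It assumes fixed control limits computed from a baseline period of normal operation and treats “fault” as the positive class; the paper's exact windowing and class convention are not given in the abstract, and the function names and readings are hypothetical.

```python
import statistics


def control_chart_labels(values, baseline, k=3.0):
    """Label a reading 'fault' if it falls outside mean +/- k*sigma of the baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    lower, upper = mu - k * sigma, mu + k * sigma
    return ["normal" if lower <= v <= upper else "fault" for v in values]


def sensitivity_specificity(true_labels, predicted, positive="fault"):
    """Return (TP / (TP + FN), TN / (TN + FP)) for the chosen positive class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


baseline = [100, 101, 99, 100, 102, 98]  # hypothetical P-PDG pressure readings
print(control_chart_labels([100, 103, 110, 90], baseline))
```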
The automatic classification of power lines from airborne light detection and ranging (LiDAR) data is a crucial task for power supply management. Power line classification methods can be either supervised or unsupervised. Supervised methods may achieve high accuracy over small areas, but collecting training data over areas of different conditions and complexity is time-consuming. Therefore, unsupervised methods that work automatically over different areas without sophisticated parameter tuning are in great demand. In this paper, we present a hierarchical unsupervised LiDAR-based power line classification method that first screens the power line candidate points (including power line corridor direction detection based on a layered Hough transform, connectivity analysis, and the Douglas–Peucker simplification algorithm), then extracts contextual linear and angular features for each candidate laser point, and finally sets feature threshold values to identify the power line points. We tested the method over both forest and urban areas and found that the precision, recall and quality rates were up to 96.7%, 88.8% and 78.3%, respectively, for the test datasets, higher than those of a previously developed supervised classification method. Overall, our approach achieves relatively high accuracy and is relatively fast.
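The Douglas–Peucker step mentioned in this abstract can be sketched as follows for 2-D polylines (for example, candidate points projected onto the horizontal plane, which is an assumption here). This is the textbook recursive algorithm using perpendicular distance to the chord, not the authors' code.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from p to the infinite line through a and b (2-D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / length


def douglas_peucker(points, epsilon):
    """Keep only the vertices needed to stay within epsilon of the polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior vertex farthest from the chord a-b.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]
    # Split at the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


print(douglas_peucker([(0, 0), (1, 0), (2, 2), (3, 0), (4, 0)], 1.0))
```

A nearly straight run of candidate points collapses to its two endpoints, which is one plausible reason the algorithm suits long, straight power line spans.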
For decades, Africa has undergone many crises affecting the economy, climate, food, politics and society, with irreparable consequences for the environment. Protecting the environment, one of the cornerstones of sustainable development, is only possible if it is based on a reliable and rigorous diagnosis and inventory. This study suggests a method to characterize natural resources, in particular agricultural ones, by showing their landscape context. In this perspective, and in the absence of any pre-existing mapping, as is often the case in Africa, this work provides a simple and reproducible approach that uses only free Landsat Thematic Mapper (TM) images, with the sole constraint of cross-checking several images at different times of the plant cycle.
Whether a species is rare and requires protection or is overabundant and needs control, an accurate estimate of population size is essential for developing conservation plans and management goals. Current wildlife surveys are logistically difficult, frequently biased, and time-consuming. Therefore, additional techniques are needed to improve survey methods for censusing wildlife species. We examined three methods to enumerate animals in remotely sensed aerial imagery: manual photo interpretation, an unsupervised classification, and a multi-image, multi-step technique. We compared the performance of the three techniques based on the probability of correctly detecting animals, the probability of over-counting animals (false positives), and the probability of under-counting animals (false negatives). Manual photo interpretation had a high probability of detecting an animal (81% ± 24%), the lowest probability of over-counting an animal (8% ± 16%), and a relatively low probability of under-counting an animal (19% ± 24%). An unsupervised ISODATA classification with subtraction of a background image had the highest probability of detecting an animal (82% ± 10%) and a high probability of over-counting an animal (69% ± 27%), but a low probability of under-counting an animal (18% ± 18%). The multi-image, multi-step procedure incorporated more information, but had the lowest probability of detecting an animal (50% ± 26%), the highest probability of over-counting an animal (72% ± 26%), and the highest probability of under-counting an animal (50% ± 26%).
Manual interpreters better discriminated between animal and non-animal features and made fewer over-counting errors (i.e., false positives) than either the unsupervised classification or the multi-image, multi-step technique, indicating that the benefits of automation need to be weighed against potential losses in accuracy. Identification and counting of animals in remotely sensed imagery could provide wildlife managers with a tool to improve population estimates and aid in enumerating animals across large natural systems.
To sustain the management of natural resources, land use and land cover (LULC) should be spatially mapped and temporally monitored using GIS. For large areas, conventional methods are laborious; alternatively, remote sensing can be used for LULC mapping and monitoring. The normalized difference vegetation index (NDVI) is the most widely used vegetation index for crop identification and phenology. For agricultural areas, crop statistics are estimated yearly at regional level following administrative units. However, these statistics do not convey the spatial extent of the crops within administrative units, and such information is crucial for crop monitoring. The main objective of this research was to fill this gap, using statistical methods and GIS, by adding spatial information to crop statistics through the analysis of temporal NDVI profiles. The study area covers 1300 km². The data consist of 147 decadal Spot Vegetation NDVI images. Crop statistics were compiled on a seasonal basis and aggregated to different administrative levels. Images were processed using an unsupervised classification method, with a series of classification runs corresponding to different numbers of clusters. Using stepwise multiple linear regression, cropped areas from agricultural statistics were related to the areas of each NDVI-profile cluster. The estimated regression coefficients were used to generate maps showing cropped fractions by map unit. The optimal number of clusters was 18; similar profiles were merged, leading to eight clusters.
The results show, for example, that in autumn rice was grown on 50% of the area of map units represented by NDVI-profile group 4 and 75% of the area of group 7, while in spring it was grown on 2, 69 and 25% of the areas of NDVI-profile groups 2, 61 and 7, respectively. The regression coefficients were used to generate crop maps. This research illustrates the benefit of integrating statistical methods, GIS, remote sensing and crop statistics to delineate NDVI-profile clusters with their corresponding agricultural land cover map units and to link these statistics to geographical locations. These map units can be used as a reference for future monitoring of natural resources, in particular crop growth and development, and for forecasting crop production and/or yield and stresses such as drought.
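The link between crop statistics and NDVI-profile clusters can be written as a regression of reported crop area on cluster areas. The sketch below assumes no intercept, so that each coefficient reads as the cropped fraction of a cluster; the paper used stepwise selection, and its exact model terms are not given here. A_{c,u} is the reported area of crop c in administrative unit u, S_{k,u} is the area of NDVI-profile cluster k in that unit, and β_{c,k} is the fraction of cluster k under crop c. The worked line uses the reported autumn-rice fractions (0.50 for group 4, 0.75 for group 7) with hypothetical cluster areas of 100 km² and 40 km², assuming no other group contributes.

```latex
A_{c,u} = \sum_{k=1}^{K} \beta_{c,k}\, S_{k,u} + \varepsilon_{c,u}, \qquad K = 8

\hat{A}_{\text{rice, autumn}, u} = 0.50 \times 100\ \text{km}^2 + 0.75 \times 40\ \text{km}^2 = 80\ \text{km}^2
```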
Biology is a challenging and complicated mess. Understanding this complexity is the realm of the biological sciences: trying to make sense of massive, messy data by discovering patterns and revealing the underlying general rules. Among the most powerful mathematical tools for organizing and structuring complex, heterogeneous and noisy data are those provided by multivariate statistical analysis (MSA). These eigenvector/eigenvalue data-compression approaches were first introduced to electron microscopy (EM) in 1980 to help sort out different views of macromolecules in a micrograph. After 35 years of continuous use and development, new MSA applications are still being proposed regularly. The speed of computing has increased dramatically in the decades since their first use in electron microscopy. However, we have also seen a possibly even more rapid increase in the size and complexity of the EM data sets to be studied. MSA computations had thus become a serious bottleneck limiting their general use. The parallelization of our programs, speeding up the process by orders of magnitude, has opened whole new avenues of research. The speed of the automatic classification in the compressed eigenvector space had also become a bottleneck that needed to be removed. In this paper we explain the basic principles of multivariate statistical eigenvector-eigenvalue data compression; we provide practical tips and application examples for those working in structural biology, and we provide more experienced researchers in this and other fields with the formulas associated with these powerful MSA approaches.
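To make the eigenvector/eigenvalue idea concrete, here is a plain-Python sketch that centres a small data matrix, forms its covariance matrix, extracts the leading eigenvector by power iteration, and then compresses each row to one coordinate. It is ordinary principal component analysis under a Euclidean metric, a deliberate simplification: EM packages often use other metrics (for example, correspondence analysis) and keep many components. The function names are hypothetical.

```python
import math


def covariance(rows):
    """Centre the rows and return (covariance matrix, centred rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred


def leading_eigenpair(matrix, iters=500, tol=1e-12):
    """Power iteration for the dominant eigenvalue/eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0] * d  # assumes the start vector is not orthogonal to the answer
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v gives the eigenvalue estimate.
        new_lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
        if abs(new_lam - lam) < tol:
            return new_lam, v
        lam = new_lam
    return lam, v


def compress(rows):
    """Project each centred row onto the first eigenvector (1-D compression)."""
    cov, centred = covariance(rows)
    lam, vec = leading_eigenpair(cov)
    return lam, vec, [sum(c[j] * vec[j] for j in range(len(vec))) for c in centred]


lam, vec, scores = compress([(1, 2), (2, 4), (3, 6), (4, 8)])
print(lam, vec, scores)
```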
Rice paddy mapping with optical remote sensing is challenging in Bangladesh due to the heterogeneous cropping pattern, fragmented field sizes and cloud cover during the growing period. High-resolution Synthetic Aperture Radar (SAR) sensors are a potential alternative for mapping rice area in Bangladesh. The L-band SAR sensor onboard the Advanced Land Observing Satellite (ALOS) acquires multi-polarization, multi-temporal images, which are a very useful tool for rice area mapping. In this study, we used ALOS-2 ScanSAR dual-polarized (HH + HV) time series data over the study area. For image processing we used orthorectified, slope-corrected backscatter (sigma-naught) images and a 3 × 3 median filter window. Unsupervised classification with the k-means++ algorithm is used for the initial clustering (20 categories) of the images over the study area. GPS locations of rice paddy fields with known cropping patterns over the study area are used to assign the k-means clusters to the different rice-growing seasons. The result is compared with a moderate resolution imaging spectroradiometer (MODIS)-based rice area map and with national statistical agricultural yearbook statistics. The results show that, relative to the MODIS-based rice map, rice fields can be mapped with a conditional Kappa value of 0.68 and user's and producer's accuracies of 86% and 90%, respectively.
The large commission error came primarily from confusion between wet-season Aus rice and other crops, and between the Aus-Amon and Boro-Aus-Amon cropping patterns, because of their similar backscatter amplitudes and temporal similarities in the rice growing season. The relatively high rice mapping accuracy in this study indicates that ALOS/PALSAR-2 data could provide useful information for rice cropping management in subtropical regions such as Bangladesh.
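The accuracy figures quoted in this abstract (user's and producer's accuracy, Kappa, conditional Kappa) all derive from a confusion matrix. The small sketch below assumes rows are the mapped classes and columns the reference classes, and uses the commonly cited per-class (user's) conditional Kappa formula. The matrix in the usage line is invented for illustration and is not the study's data.

```python
def accuracy_metrics(matrix):
    """Accuracy measures from a square confusion matrix.

    Rows = classified (map) classes, columns = reference classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    users = [matrix[i][i] / row_tot[i] for i in range(k)]      # 1 - commission error
    producers = [matrix[j][j] / col_tot[j] for j in range(k)]  # 1 - omission error
    po = sum(matrix[i][i] for i in range(k)) / n               # observed agreement
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (po - pe) / (1 - pe)
    # Conditional (user's) Kappa per mapped class.
    cond_kappa = [(n * matrix[i][i] - row_tot[i] * col_tot[i]) /
                  (n * row_tot[i] - row_tot[i] * col_tot[i]) for i in range(k)]
    return {"users": users, "producers": producers,
            "kappa": kappa, "conditional_kappa": cond_kappa}


metrics = accuracy_metrics([[45, 5], [10, 40]])  # hypothetical rice / non-rice counts
print(metrics)
```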
This paper is an empirical study of unsupervised sentiment classification of Chinese reviews. The focus is on ways to improve the performance of unsupervised sentiment classification given the limited existing sentiment resources in Chinese. On the one hand, all available Chinese sentiment lexicons, individually and combined, are evaluated under our proposed framework. On the other hand, domain-dependent sentiment noise words are identified using unlabeled data and removed to improve classification performance. To the best of our knowledge, this is the first such attempt. Experiments were conducted on three open datasets in two domains, and the results show that the proposed algorithm for removing sentiment noise words can improve classification performance significantly.
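A lexicon-based classifier of this general kind can be sketched as below. Lexicons are merged into one word-to-polarity table, domain noise words are dropped, and the sign of the summed polarity decides the label. The merge rule (later lexicons override earlier ones), the toy lexicon entries and the noise word are all assumptions for illustration, and how the paper identifies noise words from unlabeled data is not shown.

```python
def merge_lexicons(*lexicons):
    """Combine word -> polarity dicts; on conflicts the later lexicon wins."""
    merged = {}
    for lexicon in lexicons:
        merged.update(lexicon)
    return merged


def classify(tokens, lexicon, noise_words=frozenset()):
    """Sum polarities of known words, skipping domain-specific noise words."""
    score = sum(lexicon.get(t, 0) for t in tokens if t not in noise_words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy example (hypothetical entries): "小" (small) carries a negative polarity
# in the lexicon but is treated as a noise word in this invented domain.
lexicon = merge_lexicons({"干净": 1, "差": -1}, {"小": -1})
review = ["房间", "小", "但", "干净"]  # segmented "the room is small but clean"
print(classify(review, lexicon))
print(classify(review, lexicon, noise_words={"小"}))
```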
Algal blooms are a frequent subject of scientific discussion and the focus of many recent studies, mainly because of their adverse effects on society. Given the lack of ground truth data and the need to develop tools for bloom detection and monitoring, this research proposes a novel method to automate detection. The proposal uses concepts from multi-temporal image series processing, spectral indices and classification with a One-class Support Vector Machine (OC-SVM). Imagery from the multispectral sensors on Landsat-8 and MODIS was acquired through the Google Earth Engine API (GEE API). To evaluate the method, two bloom detection case studies (Lake Erie (USA) and Lake Taihu (China)) were performed, and comparisons were made with methods based on spectral index thresholds. In addition, to compare the performance of the OC-SVM classifier with other machine learning methods, the proposal was adapted for use with a Random Forest (RF) classifier, and its results were added to the analysis. In situ measurements show that the proposed method delivers highly accurate results compared with spectral index thresholding approaches. A drawback, however, is its higher computational cost. The application of the new method to a real-world bloom case is demonstrated.
In this paper, a new line-symmetry-based distance is first proposed, and its properties are described in detail. A kd-tree-based nearest neighbor search is used to reduce the complexity of computing the proposed distance. An evolutionary clustering technique is then developed that uses the new line-symmetry-based distance to assign points to clusters. Adaptive mutation and crossover probabilities are used to accelerate the proposed clustering technique. The proposed GA with line-symmetry-distance-based (GALSD) clustering technique can detect any type of cluster, irrespective of its geometrical shape and overlapping nature, as long as it possesses line symmetry. GALSD is compared with the well-known K-means clustering algorithm and a recently developed genetic point-symmetry-distance-based clustering technique (GAPS) on three artificial and two real-life data sets. The efficacy of the proposed line-symmetry-based distance is then shown in recognizing human faces in an image.
Funding (POLSAR terrain classification letter using Fuzzy C-Means): Supported by the University Doctorate Special Research Fund (No. 20030614001) and the Youth Scholarship Leader Fund of the University of Electronic Science and Technology of China.
基金Thiswork was supported by the National Natural Science Foundation of China[grant number 41471313],[grant num-ber 41101356],[grant number 41671391]the Fundamental Research Funds for the Central Universities[grant num-ber 2016XZZX004-02]+1 种基金the Science and Technology Project of Zhejiang Province[grant number 2015C33021],[grant number 2013C33051]Major Program of China High Resolution Earth Observation System[grant number 07-Y30B10-9001].
Abstract: Symmetry is a common feature in the real world. It may be used to improve classification by using the point symmetry-based distance as a clustering measure. However, calculating the point symmetry-based distance is time-consuming. An efficient parallel point symmetry-based K-means algorithm (ParSym) has been proposed to overcome this limitation, but ParSym may get stuck in sub-optimal solutions because of the K-means technique it uses. In this study, we propose a novel parallel point symmetry-based genetic clustering (ParSymG) algorithm for unsupervised classification. A genetic algorithm was introduced to overcome the sub-optimization problem caused by inappropriate selection of initial centroids in ParSym. A message passing interface (MPI) was used to implement the distributed master–slave paradigm. To make the algorithm more time-efficient, a three-phase speedup strategy was adopted for population initialization, image partition, and kd-tree structure-based nearest neighbor searching. The advantages of ParSymG over the existing ParSym and parallel K-means (PKM) algorithms were demonstrated through case studies using three different types of remotely sensed images. The speedup and time-gain results demonstrated the excellent scalability of the ParSymG algorithm.
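The sketch below implements the point symmetry-based distance in the commonly used form d_ps(x, c) = ((d1 + d2) / 2) · ||x − c||. Here d1 and d2 are the distances from the reflected point 2c − x to its two nearest data points. SciPy's `cKDTree` stands in for the kd-tree nearest-neighbour search. The MPI parallelization and genetic operators of ParSymG are omitted, and the function name is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def point_symmetry_distance(x, c, tree):
    """Point symmetry-based distance of sample x to cluster centre c.

    tree: a cKDTree built over all data points.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    reflected = 2.0 * c - x                       # mirror image of x through c
    d_knn, _ = tree.query(reflected, k=2)         # two nearest data points to the mirror image
    return d_knn.mean() * np.linalg.norm(x - c)   # ((d1 + d2) / 2) * Euclidean distance


# Example: 40 points on a unit circle are symmetric about the origin, not about (0.5, 0).
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.c_[np.cos(theta), np.sin(theta)]
tree = cKDTree(ring)
d_centre = point_symmetry_distance(ring[0], [0.0, 0.0], tree)
d_offset = point_symmetry_distance(ring[0], [0.5, 0.0], tree)
```

The kd-tree is what makes the two-nearest-neighbour lookup cheap. Without it, each distance evaluation would scan the whole data set.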
Abstract: This paper presents a fuzzy logic approach to efficient unsupervised character classification that improves the robustness, correctness, and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
Funding: Supported by a New Faculty Award from UAH's Office of the Vice President for Research and Economic Development.
Abstract: Comparative studies describing how mobility patterns changed due to the COVID-19 pandemic have recently begun to emerge. Most current studies use travel volume per day as the critical indicator. They also identify the impacted period by the dates of governmental lockdown or stay-at-home orders, which may not accurately reflect the actual impacted dates. The objective of this study is to provide an alternative perspective for identifying normal and pandemic-influenced daily traffic patterns. The methodology does not rely only on traffic volumes per day, nor does it assume that the impacted travel pattern began with the stay-at-home order. Instead, it treats within-day time-dependent travel speed as a time series. It then applies the dynamic time warping algorithm and hierarchical clustering, an unsupervised classification method, to classify days into groups without assuming a start date for any group. Using state-wide travel speed data in Alabama, this study measures dissimilarities among within-day travel speed time series. Using the resulting dissimilarity (distance) matrix, several agglomerative hierarchical clustering (AHC) methods (average, complete, Ward's) are tested to find a suitable unsupervised classification. The Ward's AHC classification results show that the within-day travel speed pattern in Alabama shifted more than two weeks before the State stay-at-home order was issued. The results further show that a new travel speed pattern appeared at the end of the stay-at-home order. This pattern differs from both the normal pre-pandemic pattern and the initial pandemic-influenced pattern, which leads to the conclusion that a 'new normal' within-day travel pattern has emerged.
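The pipeline described above can be sketched in three steps, assuming one within-day speed profile per day (for example, hourly mean speeds). First, compute a classic dynamic time warping (DTW) distance between profiles. Second, build the pairwise DTW matrix. Third, apply Ward's agglomerative clustering via SciPy. SciPy's Ward linkage formally assumes Euclidean distances, so applying it to DTW dissimilarities here is a pragmatic choice. The synthetic profiles and helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def cluster_days(profiles, n_groups):
    """Ward's agglomerative clustering of daily speed profiles on DTW dissimilarities."""
    k = len(profiles)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(profiles[i], profiles[j])
    Z = linkage(squareform(dist), method="ward")   # linkage expects condensed form
    return fcluster(Z, t=n_groups, criterion="maxclust")


# Synthetic example: five days with a morning-peak speed dip, five without.
rng = np.random.default_rng(0)
hours = np.arange(24)
normal_day = 60.0 - 15.0 * np.exp(-((hours - 8) ** 2) / 4.0)
flat_day = np.full(24, 62.0)
days = np.vstack([normal_day + rng.normal(0, 0.5, 24) for _ in range(5)]
                 + [flat_day + rng.normal(0, 0.5, 24) for _ in range(5)])
labels = cluster_days(days, n_groups=2)
```

Because no start date is assumed, each day's group comes only from the shape of its speed profile. Consecutive dates in the same group then indicate when a pattern began.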
Abstract: An unsupervised Polarimetric Synthetic Aperture Radar (SAR) Interferometry (PolInSAR) classification algorithm based on optimal coherence set parameters is studied and proposed. Its aim is to solve the misclassification problems of the unsupervised polarimetric Wishart classification algorithm based on Freeman decomposition. The algorithm uses the result of Freeman decomposition to divide the image into three basic categories: surface scattering, volume scattering, and double-bounce scattering. The PolInSAR optimal coherence set parameters are then used to divide each of the three basic categories finely into 9 categories, so that the whole image is divided into 27 categories. Both the Freeman decomposition result and the optimal coherence set parameters indicate specific scattering characteristics, so the 27 categories are merged into 16 categories based on their physical meaning. Finally, Wishart clustering is employed to obtain the final classification result. To preserve the purity of scattering characteristics, pixels are only allowed to be classified together with pixels of similar scattering characteristics. The final classification results effectively resolve the misclassification problem. Buildings are effectively distinguished from vegetation in urban areas, and roads are also well distinguished from grass. E-SAR PolInSAR data from the German Aerospace Center (DLR) are used to verify the effectiveness of the algorithm.
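The final Wishart clustering step can be illustrated with the widely used Wishart distance d(T, V) = ln|V| + Tr(V⁻¹T). Here T is a pixel's coherency (or covariance) matrix and V is a class centre. This sketch shows only iterative reassignment from an initial labelling. In the algorithm above, that labelling would come from the merged Freeman/optimal-coherence categories. The category-purity constraint is not modelled, and the function names are hypothetical.

```python
import numpy as np


def wishart_distance(T, V):
    """Wishart distance d(T, V) = ln|V| + Tr(V^-1 T) for Hermitian positive-definite matrices."""
    _, logdet = np.linalg.slogdet(V)                   # log|det V|, real for Hermitian PD V
    return logdet + np.real(np.trace(np.linalg.solve(V, T)))


def wishart_iterate(Ts, labels, n_iter=10):
    """Reassign each pixel matrix in Ts (N x 3 x 3) to its nearest class centre.

    labels: initial integer class per pixel. Class centres are mean matrices;
    classes that become empty simply drop out.
    """
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        present = np.unique(labels)
        centres = [Ts[labels == k].mean(axis=0) for k in present]
        d = np.array([[wishart_distance(T, V) for V in centres] for T in Ts])
        new_labels = present[d.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Toy example: two diagonal "scattering" types, with two pixels of each mislabelled.
rng = np.random.default_rng(0)
A = np.diag([1.0, 0.2, 1.0]).astype(complex)
B = np.diag([0.2, 1.0, 0.2]).astype(complex)
Ts = np.array([A * (1 + 0.05 * rng.random()) for _ in range(10)]
              + [B * (1 + 0.05 * rng.random()) for _ in range(10)])
init = np.array([0] * 8 + [1] * 2 + [1] * 8 + [0] * 2)
labels = wishart_iterate(Ts, init)
```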
Abstract: The study examines changes in land cover/use resources over the period under investigation. An unsupervised vegetation classification is performed that yields five distinct classes. These changes are assessed in five broad land cover classes: high/moist forests, forest regrowth, mixed savanna, bare land/grass, and water. Landsat TM and ETM+ images from different dates (1986 to 2001) are used to determine land cover/use changes. The accuracy obtained after the unsupervised classification is fairly good, and the results show that vegetation has been depleted over the years. The changes are mostly human-induced and, to a lesser extent, environmental. Human pressure, mainly encroachment, alters the landscape through population growth, agriculture, settlements, and similar activities. The environmental changes are due to some perceived climatic changes. This vegetation classification highlights the importance of acquiring and publishing information about the country's partial vegetation cover and vegetation change. Such information includes vegetation maps and other basic factors influencing vegetation, and it leads to an understanding of how vegetation evolves over time.
Abstract: A critical problem in southern Nigeria is the rapid alteration of the landscape caused by logging, agricultural practices, human migration and expansion, and oil exploration, exploitation, and production activities. These processes have had both positive and negative effects on the economic and socio-political development of the country in general. The negative impacts have led not only to degradation of the ecosystem but also to hazards to human health and the pollution of surface and ground water resources. This has created the need for a rapid, cost-effective, and efficient land use/land cover (LULC) classification technique to monitor the biophysical dynamics of the region. The land cover patterns in the study area are complex, and the relationship between land cover and spectral signals is occasionally indistinguishable. For these reasons, this paper introduces a combined use of unsupervised and supervised image classification for detecting LULC classes. Given the continuing conflict over the impact of oil activities in the area, this work provides a procedure for detecting LULC change, which is an important factor to consider in the design of an environmental decision-making framework. Results from applying this technique to Landsat TM and ETM+ images from 1987 and 2002 are discussed. The results reveal the pros and cons of the two methods and the effects of their overall accuracy on post-classification change detection.
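Post-classification change detection and overall accuracy both reduce to cross-tabulating two label maps. The sketch below assumes integer class codes 0 to n_classes − 1 on both maps. Applied to a reference map and a classified map, the table is a confusion matrix, and overall accuracy is trace / total. Applied to the 1987 and 2002 classifications, it is a change matrix whose off-diagonal cells count changed pixels. Function names, class names, and toy maps are hypothetical.

```python
import numpy as np


def cross_tab(map_a, map_b, n_classes):
    """Count pixels with class i in map_a and class j in map_b (codes 0..n_classes-1)."""
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    table = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(table, (a, b), 1)        # unbuffered add handles repeated (i, j) pairs
    return table


def overall_accuracy(confusion):
    """Share of pixels on the diagonal (correctly classified, or unchanged)."""
    return np.trace(confusion) / confusion.sum()


# Toy 2 x 3 class maps for two dates (0 = forest, 1 = farmland, 2 = settlement).
lc_1987 = np.array([[0, 0, 1], [1, 2, 2]])
lc_2002 = np.array([[0, 1, 1], [1, 2, 0]])
change = cross_tab(lc_1987, lc_2002, n_classes=3)
```

Errors in either classification propagate directly into the change matrix. This is why the overall accuracy of each date matters for post-classification change detection.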
Abstract: In the petroleum industry, sensor data and information are valuable because they can help detect, predict, and understand processes during oil production. Offshore wells require more attention, since workovers, maintenance, and interventions cost more than for onshore wells. Coupling data-driven methods with well-monitoring applications, this paper proposes two unsupervised classification methods, one statistical and one machine learning-based, to detect anomalies in well data. The novelty lies in two applications in an offshore field. The first is a control chart with a 3-standard-deviation window for the Permanent Downhole Gauge Pressure sensor (P-PDG). The second is a Fuzzy C-means algorithm that classifies data from pressure and temperature sensors. The main goal in structuring a classified data set is to use it to train machine learning models to monitor and manage petroleum production. Early fault detection systems for offshore production, based on real-time data from production sensors, require classified data sets for modeling. Labeling two target classes, “normal” and “fault”, is therefore a key step in training the machine learning models. This paper applies two methodologies to classify a real-time data set and create a training data set divided into “normal” and “fault” classes. This makes it possible to visualize the abnormal events identified by each methodology and to compare how sensitive each method is.
In addition, a random forest application is proposed to test the performance of the classified data sets from both methods. The results show that the control chart method has higher sensitivity than fuzzy c-means, although the differences between them are insignificant. The random forest achieved sensitivity and specificity values of 99.91% and 100% for the data set classified by the control chart method, and 94.01% and 99.98% for the data set classified by the fuzzy c-means algorithm.
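The two ingredients named above are 3-sigma control-chart labelling of a pressure signal and the sensitivity/specificity used to score a classifier. The sketch below shows both. It assumes the control limits come from a baseline segment considered normal, since the abstract does not specify how the window is chosen. It treats “fault” as the positive class, and the function names are hypothetical.

```python
import numpy as np


def control_chart_labels(signal, baseline, k=3.0):
    """Label each sample "fault" if it falls outside mean +/- k * std of the baseline."""
    mu = np.mean(baseline)
    sigma = np.std(baseline)
    deviation = np.abs(np.asarray(signal, dtype=float) - mu)
    return np.where(deviation > k * sigma, "fault", "normal")


def sensitivity_specificity(y_true, y_pred, positive="fault"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical P-PDG readings; the baseline segment is assumed to be fault-free.
baseline = np.array([100.0, 101.0, 99.0, 100.0, 100.0])
labels = control_chart_labels([100.5, 104.0, 99.2], baseline)
```

The labels produced this way can serve as targets for a supervised model such as a random forest. The second function then scores that model against the labels.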
Funding: The National Natural Science Foundation of China (grant numbers 41601426 and 41771462); the Hunan Provincial Natural Science Foundation (grant number 2018JJ3155); the Open Foundation of Key Laboratory of Digital Mapping and Land Information Application of National Administration of Surveying, Mapping and Geoinformation, Wuhan University (grant number GCWD201806); and the China Scholarship Council (grant number 201708430040).
Abstract: The automatic classification of power lines from airborne light detection and ranging (LiDAR) data is a crucial task for power supply management. Methods for power line classification can be either supervised or unsupervised. Supervised methods might achieve high accuracy for small areas, but collecting training data over areas of different conditions and complexity is time-consuming. Therefore, unsupervised methods that can work automatically over different areas without sophisticated parameter tuning are in great demand. In this paper, we present a hierarchical unsupervised LiDAR-based power line classification method with three stages. First, it screens the power line candidate points, using power line corridor direction detection based on a layered Hough transform, connectivity analysis, and the Douglas–Peucker simplification algorithm. Second, it extracts contextual linear and angular features for each candidate laser point. Finally, it sets feature threshold values to identify the power line points. We tested the method over both forest and urban areas. For the test datasets, the precision, recall, and quality rates were up to 96.7%, 88.8%, and 78.3%, respectively, higher than those of a previously developed supervised classification method. Overall, our approach has the advantages of achieving relatively high accuracy and being relatively fast.
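The Douglas–Peucker simplification used while screening power line candidates can be sketched as the classic recursive algorithm. If the vertex farthest from the chord lies more than eps away, keep it and recurse on both halves; otherwise keep only the endpoints. The sketch works for 2-D or 3-D point sequences. How the method orders LiDAR points into polylines is not described here and is assumed, and the function names are hypothetical.

```python
import numpy as np


def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b (2-D or 3-D)."""
    if np.allclose(a, b):
        return np.linalg.norm(p - a)
    ab = b - a
    ap = p - a
    # Remove the component of ap along ab; what remains is perpendicular to the chord.
    proj = np.dot(ap, ab) / np.dot(ab, ab) * ab
    return np.linalg.norm(ap - proj)


def douglas_peucker(points, eps):
    """Simplify a polyline, keeping vertices that deviate from the chord by more than eps."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    a, b = points[0], points[-1]
    dists = np.array([_point_line_distance(p, a, b) for p in points[1:-1]])
    idx = int(np.argmax(dists)) + 1               # index of the farthest interior vertex
    if dists[idx - 1] > eps:
        left = douglas_peucker(points[: idx + 1], eps)
        right = douglas_peucker(points[idx:], eps)
        return np.vstack([left[:-1], right])      # the split vertex appears only once
    return np.vstack([a, b])


# An L-shaped trace with small jitter collapses to its three corner points.
pts = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (2.0, 1.0), (2.05, 2.0), (2.0, 3.0)]
simplified = douglas_peucker(pts, eps=0.1)
```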
Abstract: For decades, Africa has undergone many crises affecting its economy, climate, food, politics, and society, with irreparable consequences for the environment. Protecting the environment is one of the cornerstones of sustainable development, and it is only possible if it is based on a reliable and rigorous diagnosis and inventory. This study suggests a method to characterize natural resources, in particular agricultural ones, by showing their landscape context. Pre-existing mapping is often absent in Africa. This work therefore provides a simple and reproducible approach that uses only free Landsat Thematic Mapper (TM) images. Its only constraint is the need to cross-check several images acquired at different times of the plant cycle.
Abstract: Whether a species is rare and requires protection or is overabundant and needs control, an accurate estimate of population size is essential for developing conservation plans and management goals. Current wildlife surveys are logistically difficult, frequently biased, and time-consuming, so additional techniques are needed to improve survey methods for censusing wildlife species. We examined three methods to enumerate animals in remotely sensed aerial imagery: manual photo interpretation, an unsupervised classification, and a multi-image, multi-step technique. We compared the performance of the three techniques using three measures: the probability of correctly detecting animals, the probability of under-counting animals (false negatives), and the probability of over-counting animals (false positives). Manual photo interpretation had a high probability of detecting an animal (81% ± 24%), the lowest probability of over-counting an animal (8% ± 16%), and a relatively low probability of under-counting an animal (19% ± 24%). An unsupervised ISODATA classification with subtraction of a background image had the highest probability of detecting an animal (82% ± 10%), a high probability of over-counting an animal (69% ± 27%), but a low probability of under-counting an animal (18% ± 18%). The multi-image, multi-step procedure incorporated more information but had the lowest probability of detecting an animal (50% ± 26%), the highest probability of over-counting an animal (72% ± 26%), and the highest probability of under-counting an animal (50% ± 26%). Manual interpreters discriminated better between animal and non-animal features. They also made fewer over-counting errors (i.e., false positives) than either the unsupervised classification or the multi-image, multi-step technique. This indicates that the benefits of automation need to be weighed against potential losses in accuracy.
Identification and counting of animals in remotely sensed imagery could provide wildlife managers with a tool to improve population estimates and aid in enumerating animals across large natural systems.
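The background-subtraction idea can be sketched as follows, with one substitution. Instead of an ISODATA classification, this example applies a plain threshold to the difference image and counts connected components with `scipy.ndimage.label`. The threshold and minimum blob size are hypothetical parameters that would need tuning on real imagery.

```python
import numpy as np
from scipy import ndimage


def count_bright_objects(image, background, threshold, min_pixels=1):
    """Count connected blobs brighter than the background by more than threshold."""
    diff = np.asarray(image, dtype=float) - np.asarray(background, dtype=float)
    mask = diff > threshold                        # candidate animal pixels
    labeled, _ = ndimage.label(mask)               # 4-connected components (default)
    sizes = np.bincount(labeled.ravel())[1:]       # pixels per component; label 0 = background
    return int(np.sum(sizes >= min_pixels))


# Toy 10 x 10 scene: two animal-sized blobs and a single-pixel speck.
background = np.zeros((10, 10))
image = background.copy()
image[1:3, 1:3] = 5.0
image[6:8, 5:8] = 5.0
image[0, 9] = 5.0
```

Specks such as the single bright pixel illustrate how automated counts over-count (false positives) unless a size filter or manual check removes them.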
Abstract: To sustain the management of natural resources, land use and land cover (LULC) should be spatially mapped and temporally monitored using GIS. Conventional methods are laborious for large areas, so remote sensing can be used instead for LULC mapping and monitoring. The normalized difference vegetation index (NDVI) is the most widely used vegetation index for crop identification and phenology. For agricultural areas, crop statistics are estimated yearly at the regional level, following administrative units. However, these statistics do not describe the spatial extent of the crops within administrative units, and such information is crucial for crop monitoring. The main objective of this research was to fill this gap by adding spatial information to crop statistics, using statistical methods, GIS, and the analysis of temporal NDVI profiles. The study area covers 1300 km². The data consist of 147 decadal (10-day) SPOT Vegetation NDVI images. Crop statistics were compiled on a seasonal basis and aggregated to different administrative levels. The images were processed with an unsupervised classification method in a series of classification runs, each with a different number of clusters. Using stepwise multiple linear regression, cropped areas from agricultural statistics were related to the areas of each NDVI-profile cluster. The estimated regression coefficients were used to generate maps showing cropped fractions by map unit. The optimal number of clusters was 18; merging similar profiles led to eight clusters. For example, in autumn rice was grown on 50% of the area of map units represented by NDVI-profile group 4 and on 75% of the area of group 7. In spring, it was grown on 2, 69, and 25% of the areas of NDVI-profile groups 2, 61, and 7, respectively. Regression coefficients were used to generate a crop map.
This research illustrates the benefit of integrating statistical methods, GIS, remote sensing and crop statistics to delineate NDVI profile clusters with their corresponding agricultural land cover map units and to link these statistics to geographical locations. These map units can be used as a reference for future monitoring of natural resources, in particular crop growth and development and for forecasting crop production and/or yield and stresses like drought.
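The link between crop statistics and NDVI-profile clusters can be sketched as a no-intercept least-squares fit. For each administrative unit, the reported area of a crop is modelled as the sum over clusters of cluster area times the fraction of that cluster under the crop. This is plain ordinary least squares with NumPy, not the stepwise multiple linear regression used in the study, and the toy areas are hypothetical. NDVI is included using its standard definition, (NIR − Red) / (NIR + Red).

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def crop_fractions(cluster_areas, crop_areas):
    """No-intercept least squares: crop_areas ~ cluster_areas @ fractions.

    cluster_areas: (n_units, n_clusters) area of each NDVI-profile cluster per unit.
    crop_areas:    (n_units,) reported area of one crop per administrative unit.
    Each coefficient is read as the fraction of that cluster's area under the crop.
    """
    fractions, *_ = np.linalg.lstsq(cluster_areas, crop_areas, rcond=None)
    return fractions


# Toy data: four administrative units, two NDVI-profile clusters.
areas = np.array([[100.0, 50.0], [20.0, 80.0], [60.0, 60.0], [10.0, 0.0]])
rice = areas @ np.array([0.5, 0.25])     # synthetic statistics with known fractions
fractions = crop_fractions(areas, rice)
```

Multiplying each cluster's map units by its fitted fraction spreads the administrative statistics back onto the map, which is the step that adds spatial information.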
Abstract: Biology is a challenging and complicated mess. Understanding this complexity is the realm of the biological sciences: trying to make sense of massive, messy data by discovering patterns and revealing their underlying general rules. Multivariate statistical analysis (MSA) approaches provide some of the most powerful mathematical tools for organizing and structuring complex, heterogeneous, and noisy data. These eigenvector/eigenvalue data-compression approaches were first introduced to electron microscopy (EM) in 1980 to help sort out different views of macromolecules in a micrograph. After 35 years of continuous use and development, new MSA applications are still being proposed regularly. The speed of computing has increased dramatically in the decades since their first use in electron microscopy. However, we have also seen a possibly even more rapid increase in the size and complexity of the EM data sets to be studied. MSA computations had thus become a very serious bottleneck limiting their general use. The parallelization of our programs has sped up the process by orders of magnitude and opened whole new avenues of research. The speed of the automatic classification in the compressed eigenvector space had also become a bottleneck that needed to be removed. In this paper, we explain the basic principles of multivariate statistical eigenvector-eigenvalue data compression. We provide practical tips and application examples for those working in structural biology. We also provide the more experienced researcher in this and other fields with the formulas associated with these powerful MSA approaches.
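The eigenvector/eigenvalue data compression described above can be illustrated with plain principal component analysis. Centre the images, diagonalise their covariance, and project onto the leading eigenvectors. This is a simplified stand-in. EM packages often use other metrics (for example, correspondence analysis) and, for large images, diagonalise the smaller image-by-image matrix instead; neither variant is shown here.

```python
import numpy as np


def eigen_compress(images, n_components):
    """Project flattened images (one per row) onto the leading covariance eigenvectors.

    Returns (coords, basis, eigenvalues, mean).
    """
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre each pixel across images
    cov = Xc.T @ Xc / (len(X) - 1)                  # pixel-by-pixel covariance
    evals, evecs = np.linalg.eigh(cov)              # eigh: ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]  # largest eigenvalues first
    basis = evecs[:, order]                         # columns are eigenvectors ("eigenimages")
    coords = Xc @ basis                             # compressed coordinates per image
    return coords, basis, evals[order], mean


# Toy data: 200 "images" of 3 pixels that vary almost entirely along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
images = np.c_[t, 2 * t, -t] + rng.normal(scale=0.01, size=(200, 3))
coords, basis, evals, mean = eigen_compress(images, n_components=1)
```

Automatic classification then operates on `coords`, which has far fewer columns than the original pixel count.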
Abstract: Rice paddy mapping with optical remote sensing is challenging in Bangladesh because of the heterogeneous cropping pattern, fragmented field sizes, and cloud cover during the growing period. High-resolution Synthetic Aperture Radar (SAR) sensors are a potential alternative for mapping rice area in Bangladesh. The L-band SAR sensor onboard the Advanced Land Observing Satellite (ALOS) acquires multi-polarization and multi-temporal images, which are a very useful tool for rice area mapping. In this study, we used ALOS-2 ScanSAR dual-polarized (HH + HV) time series data over the study area. For image processing, we used orthorectified, slope-corrected backscatter (sigma-naught) images and a 3 × 3 median filter. Unsupervised classification with the k-means++ algorithm was used for the initial clustering (20 categories) of images over the study area. GPS locations of rice paddy fields, together with their cropping patterns, were used to assign the k-means clusters to the different rice-growing seasons. The result was compared with a moderate resolution imaging spectroradiometer (MODIS)-based rice area map and with national agricultural statistical yearbook statistics. Compared with the MODIS-based rice map, the rice fields can be mapped with a conditional Kappa value of 0.68 and with user's and producer's accuracies of 86% and 90%, respectively.
The large commission error came primarily from two confusions. One was between wet-season Aus rice and other crops, and the other was between the Aus-Amon and Boro-Aus-Amon cropping patterns. Both arise because these classes have similar backscatter amplitudes and similar temporal behaviour in the rice-growing season. The relatively high rice mapping accuracy in this study indicates that ALOS/PALSAR-2 data could provide useful information for rice cropping management in subtropical regions such as Bangladesh.
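The k-means++ initial clustering step can be sketched with NumPy. The first seed is drawn at random, and each further seed is drawn with probability proportional to its squared distance from the nearest chosen seed. Ordinary Lloyd iterations follow. Each pixel is assumed to be a feature vector of sigma-naught values (dB) across the HH/HV time series, and the 3 × 3 median filtering is assumed to have been applied already. The function name and toy data are hypothetical.

```python
import numpy as np


def kmeans_pp(X, k, n_iter=100, seed=0):
    """k-means with k-means++ seeding followed by Lloyd iterations.

    X: (n_pixels, n_features), e.g. sigma-naught (dB) of HH and HV on each date.
    Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]                # first seed: uniform at random
    for _ in range(1, k):
        # Squared distance from each pixel to its nearest already-chosen seed.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        total = d2.sum()
        probs = d2 / total if total > 0 else None      # fall back to uniform sampling
        centers.append(X[rng.choice(len(X), p=probs)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Update step: mean of each cluster (keep the old centre if a cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy "pixels": two backscatter regimes over four acquisitions.
rng = np.random.default_rng(0)
pixels = np.vstack([rng.normal(-20.0, 0.5, size=(30, 4)),
                    rng.normal(-10.0, 0.5, size=(30, 4))])
labels, centers = kmeans_pp(pixels, k=2)
```

In the study's workflow, the resulting clusters (20 rather than 2) would then be assigned to rice seasons using field GPS locations.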
Funding: Supported by the National Natural Science Foundation of China (Nos. 60405011, 60575057, and 60875073).
Abstract: This paper is an empirical study of unsupervised sentiment classification of Chinese reviews. The focus is on ways to improve the performance of unsupervised sentiment classification using the limited existing sentiment resources in Chinese. On the one hand, all available Chinese sentiment lexicons, individually and in combination, are evaluated under our proposed framework. On the other hand, domain-dependent sentiment noise words are identified and removed using unlabeled data to improve classification performance. To the best of our knowledge, this is the first such attempt. Experiments were conducted on three open datasets in two domains. The results show that the proposed algorithm for sentiment noise word removal can improve classification performance significantly.
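The toy sketch below illustrates lexicon-based unsupervised polarity scoring with noise-word removal. It assumes a lexicon mapping words to +1/−1 and a list of domain-dependent noise words that has already been identified; how the paper identifies them from unlabeled data is not reproduced. In the hypothetical example, 高 ('high') is positive in a generic lexicon but misleading in a review about price. Treating it as a noise word therefore changes the verdict.

```python
def lexicon_polarity(tokens, lexicon, noise_words=frozenset()):
    """Classify a tokenised review by summing lexicon scores, skipping noise words.

    lexicon maps a word to +1 (positive) or -1 (negative); unknown words score 0.
    """
    score = sum(lexicon.get(t, 0) for t in tokens if t not in noise_words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical lexicon and a segmented review: "价格 高 服务 差" (price high, service poor).
lexicon = {"好": 1, "差": -1, "高": 1}
review = ["价格", "高", "服务", "差"]
print(lexicon_polarity(review, lexicon))                       # neutral
print(lexicon_polarity(review, lexicon, noise_words={"高"}))   # negative
```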
Funding: We thank the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (grant 2018/01033-3) for financial support of this research.
Abstract: Algal blooms are a frequent subject of scientific discussion and the focus of many recent studies, mainly because of their adverse effects on society. Ground truth data are lacking, and tools for bloom detection and monitoring are needed. This research therefore proposes a novel method to automate detection. The proposal uses concepts derived from multi-temporal image series processing, spectral indices, and classification with a One-class Support Vector Machine (OC-SVM). Imagery from the multi-spectral sensors on Landsat-8 and MODIS was acquired through the Google Earth Engine API (GEE API). To evaluate the method, two bloom detection case studies (Lake Erie, USA, and Lake Taihu, China) were performed. Comparisons were made with methods based on spectral index thresholds. To compare the OC-SVM classifier with other machine learning methods, the proposal was also adapted to use a Random Forest (RF) classifier, and its results were added to the analysis. In situ measurements show that the proposed method delivers highly accurate results compared with spectral index thresholding approaches. However, a drawback of the proposal is its higher computational cost. The application of the new method to a real-world bloom case is demonstrated.
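The one-class classification step can be sketched with scikit-learn's `OneClassSVM`. The model is trained only on samples of the target class (here, bloom pixels). It then labels new samples +1 if they resemble that class and −1 if they are outliers. The per-pixel features (for example, spectral index values), `nu`, and the RBF kernel settings are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def fit_bloom_detector(bloom_features, nu=0.05, gamma="scale"):
    """Fit a one-class SVM on feature vectors of known bloom pixels only."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    model.fit(np.asarray(bloom_features, dtype=float))
    return model


# Toy features: two hypothetical spectral-index values per bloom pixel.
rng = np.random.default_rng(0)
bloom = rng.normal(loc=[0.3, 0.1], scale=0.02, size=(200, 2))
detector = fit_bloom_detector(bloom)
# +1 = resembles the training (bloom) class, -1 = outlier (non-bloom).
pred = detector.predict(np.array([[0.3, 0.1], [0.9, 0.9]]))
```

Training on positives only is what suits the approach to the lack of ground truth: no labelled non-bloom samples are required.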
Abstract: In this paper, a new line-symmetry-based distance is first proposed, and its properties are described in detail. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the proposed distance. An evolutionary clustering technique is then developed that uses the new line-symmetry-based distance measure to assign points to clusters. Adaptive mutation and crossover probabilities are used to accelerate the proposed clustering technique. The proposed GA with line-symmetry-distance-based (GALSD) clustering technique can detect clusters of any type, irrespective of their geometrical shape and overlapping nature, as long as they possess the characteristic of line symmetry. GALSD is compared with the well-known K-means clustering algorithm and with a newly developed genetic point-symmetry-distance-based clustering technique (GAPS) on three artificial and two real-life data sets. The efficacy of the proposed line-symmetry-based distance is then demonstrated in recognizing human faces in a given image.
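One possible line-symmetry-based distance can be sketched by analogy with the point-symmetry distance. Reflect a point across the cluster's principal axis, defined as the line through the cluster mean along the leading eigenvector of its covariance. Then measure how close the mirror image lies to actual cluster members. The exact definition in GALSD may differ. This construction, the use of SciPy's `cKDTree` for the nearest-neighbour search, and the function name are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def line_symmetry_distance(x, cluster_points):
    """Distance of x to a cluster, measured by mirror symmetry about its principal axis.

    Assumed construction: reflect x across the line through the cluster mean along
    the leading covariance eigenvector, average the distances from the reflection
    to its two nearest cluster members, and scale by ||x - mean||.
    """
    P = np.asarray(cluster_points, dtype=float)
    x = np.asarray(x, dtype=float)
    c = P.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(P, rowvar=False))
    v = evecs[:, np.argmax(evals)]                # unit direction of the symmetry line
    r = x - c
    reflected = c + 2.0 * np.dot(r, v) * v - r    # mirror image of x across the line
    d_knn, _ = cKDTree(P).query(reflected, k=2)   # two nearest cluster members
    return d_knn.mean() * np.linalg.norm(r)


# A grid elongated along x and mirror-symmetric about the x-axis.
grid = np.array([(i, y) for i in range(-5, 6) for y in (-1, 0, 1)], dtype=float)
d = line_symmetry_distance((2.0, 1.0), grid)
```

Here the mirror image of (2, 1) is (2, −1), an actual grid point, so the distance stays small.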