Efficient iterative unsupervised machine learning involving probabilistic clustering analysis with the expectation-maximization (EM) clustering algorithm is applied to categorize reservoir facies by exploiting latent and observable well-log variables from a clastic reservoir in the Majnoon oilfield, southern Iraq. The observable well-log variables consist of conventional open-hole well-log data and the computer-processed interpretation of gamma rays, bulk density, neutron porosity, compressional sonic, deep resistivity, shale volume, total porosity, and water saturation, from three wells located in the Nahr Umr reservoir. The latent variables include shale volume and water saturation. The EM algorithm efficiently characterizes electrofacies through iterative machine learning to identify the local maximum likelihood estimates (MLE) of the observable and latent variables in the studied dataset. The optimized EM model successfully predicts the core-derived facies classification in two of the studied wells. The EM model clusters the data into three distinctive reservoir electrofacies (F1, F2, and F3). F1 represents a gas-bearing electrofacies with low shale volume (Vsh) and water saturation (Sw) and high porosity and permeability values, identifying it as an attractive reservoir target. The results of the EM model are validated using nuclear magnetic resonance (NMR) data from the third studied well, for which no cores were recovered. The NMR results confirm the effectiveness and accuracy of the EM model in predicting electrofacies. The utilization of the EM algorithm for electrofacies classification/cluster analysis is innovative. Specifically, the clusters it establishes are less rigidly constrained than those derived from the more commonly used K-means clustering method. The EM methodology developed generates dependable electrofacies estimates in the studied reservoir intervals where core samples are not available. Therefore, once calibrated with core data in some wells, the model is suitable for application to other wells that lack core data.
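As a rough illustration of why EM clusters are "less rigidly constrained" than K-means clusters, the sketch below fits a one-dimensional Gaussian mixture by EM and returns soft membership probabilities instead of hard labels. It is a minimal sketch, not the authors' model: the function name `em_gmm_1d`, the starting variance, and the sample shale-volume values are all hypothetical.

```python
import math


def em_gmm_1d(x, init_means, init_var=0.01, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, responsibilities).
    All starting values are illustrative assumptions.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for xi in x:
            dens = [
                w * math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(x)
            means[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - means[j]) ** 2 for r, xi in zip(resp, x)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate component
    return weights, means, variances, resp


# Hypothetical shale-volume fractions forming three loose groups.
vsh = [0.05, 0.08, 0.10, 0.12, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]
weights, means, variances, resp = em_gmm_1d(vsh, init_means=[0.1, 0.5, 0.9])
hard_labels = [row.index(max(row)) for row in resp]
```

Each row of `resp` sums to one, so a sample lying between two groups can belong partly to both; taking the largest entry per row recovers a K-means-style hard label when one is needed.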
A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been previously monitored in some way for this type of information to have been recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Despite the current ease of obtaining location information (GPS, drone images, etc.), the timing of the event remains challenging to ascertain for a considerable portion of landslide data. Rainfall can be monitored in multiple ways, for instance by examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h) or by calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators' experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized and rapidly scalable methodology. In this work, we propose a Landslide Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using cluster analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one operated by Cemaden, Brazil's National Center for Monitoring and Early Warning of Natural Disasters.
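The abstract above does not define rainfall half-life, so the sketch below assumes one common reading: each past rainfall reading loses half of its weight every `half_life_h` hours, giving a decay-weighted accumulation. The function name, the hourly sampling, and the threshold value are hypothetical.

```python
def effective_rainfall(hourly_rain_mm, half_life_h):
    """Decay-weighted rainfall accumulation (assumed exponential half-life model).

    hourly_rain_mm: hourly readings, oldest first, newest last.
    half_life_h: hours after which a reading counts for half its amount.
    """
    decay = 0.5 ** (1.0 / half_life_h)  # per-hour retention factor
    total = 0.0
    for rain in hourly_rain_mm:
        # Age everything accumulated so far by one hour, then add the new reading.
        total = total * decay + rain
    return total


# 10 mm fell two hours before the latest reading; with a 2 h half-life, half remains.
remaining = effective_rainfall([10.0, 0.0, 0.0], half_life_h=2.0)

# Hypothetical alert rule: compare the accumulation with an operational threshold.
THRESHOLD_MM = 4.0  # illustrative value only
alert = remaining >= THRESHOLD_MM
```

A single recursive accumulator replaces the separate 1 h, 6 h, 24 h and 72 h windows, which is one reason a half-life formulation is attractive for large station networks.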
The classification of the springtime water mass has an important influence on the hydrography, regional climate change and fishery in the Taiwan Strait. Based on CTD profiling data from 58 stations collected in the western and southwestern Taiwan Strait during the spring cruise of 2019, we analyze the spatial distributions of temperature (T) and salinity (S) in the investigation area. Then, by using the fuzzy cluster method combined with the T-S similarity number, we classify the investigation area into 5 water masses: the Minzhe Coastal Water (MZCW), the Taiwan Strait Mixed Water (TSMW), the South China Sea Surface Water (SCSSW), the South China Sea Subsurface Water (SCSUW) and the Kuroshio Branch Water (KBW). The MZCW appears in the near-surface layer along the western coast of the Taiwan Strait, showing low-salinity (<32.0) tongues near the Minjiang River Estuary and the Xiamen Bay mouth. The TSMW covers most of the upper layer of the investigation area. The SCSSW is mainly distributed in the upper layer of the southwestern Taiwan Strait, beneath which is the SCSUW. The KBW is a high-temperature (core value of 26.36 °C) and high-salinity (core value of 34.62) water mass located southeast of the Taiwan Bank and partially in the central Taiwan Strait.
Background: To better solve cluster analysis problems, we propose a new method based on the chaotic particle swarm optimization (CPSO) algorithm. Methods: To enhance clustering performance, we propose a novel method based on CPSO. We first evaluate the clustering performance of this model using the variance ratio criterion (VRC) as the evaluation metric. The effectiveness of the CPSO algorithm is compared with that of the traditional particle swarm optimization (PSO) algorithm. The CPSO aims to improve the VRC value while avoiding local optimal solutions. The simulated dataset is set at three levels of overlapping: non-overlapping, partial overlapping, and severe overlapping. Finally, we compare CPSO with two other methods. Results: In the comparative results, our proposed CPSO method performs best. Under non-overlapping, partial overlapping, and severe overlapping conditions, our method achieves the best VRC values of 1683.2, 620.5, and 275.6, respectively. The mean VRC values in these three cases are 1683.2, 617.8, and 222.6. Conclusion: The CPSO performed better than the other methods on cluster analysis problems. CPSO is effective for cluster analysis.
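The variance ratio criterion used as the metric above is commonly defined as the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k). The sketch below computes it for a given labeling under that standard definition; the function name and the toy points are hypothetical, and it does not implement the CPSO search itself.

```python
def variance_ratio_criterion(points, labels):
    """VRC = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    Requires at least two clusters and a non-zero within-cluster sum of squares.
    """
    n = len(points)
    dim = len(points[0])
    cluster_ids = sorted(set(labels))
    k = len(cluster_ids)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for cid in cluster_ids:
        members = [p for p, lab in zip(points, labels) if lab == cid]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Spread of the cluster centroids around the overall mean.
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim)
        )
        # Spread of the members around their own centroid.
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return (between / (k - 1)) / (within / (n - k))


toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_split = variance_ratio_criterion(toy_points, [0, 0, 1, 1])  # compact clusters
poor_split = variance_ratio_criterion(toy_points, [0, 1, 0, 1])  # stretched clusters
```

A higher value indicates compact, well-separated clusters, which is why an optimizer such as PSO or CPSO can use it directly as the objective to maximize.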
The paper deals with cluster analysis and the comparison of clustering methods. Cluster analysis belongs to the multivariate statistical methods. It is defined as a general logical technique or procedure that allows objects to be grouped into clusters on the basis of similarity or dissimilarity. Cluster analysis involves computational procedures whose purpose is to reduce a data set to several relatively homogeneous groups (clusters), under the condition that similarity within clusters is maximal and, simultaneously, similarity between clusters is minimal. The similarity of objects is studied by the degree of similarity (correlation coefficient and association coefficient) or the degree of dissimilarity, that is, the degree of distance (distance coefficient). Methods of cluster analysis are classified, on the basis of how the clustering proceeds, as hierarchical or non-hierarchical.
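To make the hierarchical family mentioned above concrete, the sketch below performs agglomerative clustering with Euclidean distance as the dissimilarity measure and single linkage as the merge rule. Both choices are assumptions made for illustration, since the abstract does not name a specific linkage, and the function name and sample points are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Cluster-to-cluster distance is the smallest Euclidean distance between
    any pair of members (single linkage). Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


sample = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three_groups = single_linkage(sample, 3)
two_groups = single_linkage(sample, 2)
```

Stopping the merges at different cluster counts yields nested partitions, which is the defining property of hierarchical methods; a non-hierarchical method such as K-means instead optimizes a single partition for a fixed number of clusters.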
The scientific and rational positioning of monitoring locations for surface displacement on slopes is a prerequisite for early warning and forecasting. However, there is no specific provision on how to effectively determine the number and location of monitoring points according to the actual deformation characteristics of the slope, and the layout of monitoring points still has some defects. To this end, based on the displacement data series and spatial location information of surface displacement monitoring points, and by combining displacement series correlation with spatial distance influence factors, a spatial deformation correlation calculation model of the slope based on clustering analysis was proposed to calculate the correlation between different monitoring points, on the basis of which the deformation area of the slope was divided. The redundant monitoring points in each partition were eliminated based on the partition outcome, and the overall optimal arrangement of slope monitoring points was then achieved. This method scientifically addresses the issues of slope deformation zoning and data gathering overlap. It not only eliminates human subjectivity from slope deformation zoning but also increases the efficiency and accuracy of slope monitoring. To verify the effectiveness of the method, a sand-mudstone interbedded counter-tilt excavation slope in Chongqing, China, was used as the research object. Twenty-four monitoring points deployed on this slope were monitored for surface displacement for 13 months. The spatial location of the monitoring points was discussed. The results show that the proposed method of slope deformation zoning and the optimized placement of monitoring points are feasible.
[Objective] The aim was to study the variation of leaf characters among different provenances of Polygonum multiflorum Thunb, and to carry out cluster analysis on P. multiflorum from different provenances, so as to provide a basis for the classification, identification, breeding and improved variety selection of P. multiflorum. [Method] Leaf shape characters of 31 germplasm accessions from the major distribution regions of the whole country were determined, and the genetic variation of P. multiflorum leaves from different producing areas was analyzed. [Result] The leaf characters of single plants within the same experimental provenance of P. multiflorum were relatively stable, and the variation was mainly found in single leaf area, 1/2 leaf width, leaf width and other indicators; the variation of each leaf character among different provenances was obvious, and was mainly found in single leaf weight, leaf area, 1/2 leaf width, leaf length and other indicators. The correlation analysis of the leaf characters of P. multiflorum suggested that single leaf area and single leaf weight showed extremely significant positive correlations with leaf length, 1/2 leaf width, leaf width, leaf thickness and leaf stem length, while single leaf area and single leaf weight showed significant negative correlations with WWR (leaf width/1/2 leaf width) and LWR (leaf length/1/2 leaf length); in addition, several macroscopic leaf characters such as leaf length, 1/2 leaf width, leaf width and leaf stem length showed extremely significant positive correlations. The principal component analysis suggested that the cumulative variance contribution rate of the first three principal components was up to 97.4%, which could well reflect the comprehensive performance of the leaf characters of different provenances of P. multiflorum. The cluster analysis showed that the 31 experimental P. multiflorum provenances should be divided into three classes: the first class was distributed in central and western Guizhou, northwestern Guangxi and western areas at higher altitude; the second class was distributed in Hunan, Hubei, Sichuan, Guangdong and most of Guangxi; the third class was distributed in Anhui, Jiangsu, Henan and Shandong. [Conclusion] Cluster analysis of leaf characters indicated that provenances that were geographically closer tended to cluster together. The study provides a certain basis for the classification of P. multiflorum.
Because traffic flow information is difficult to obtain for lanes at intersections without detectors in most metropolises of the world, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. First, cluster analysis is used to cluster the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections with detectors. Then, based on the results of the cluster analysis, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested with lane traffic volume data from the road network of Nanjing city. The method resolves the problem of predicting the traffic volume of lanes at non-detector isolated signal-controlled intersections and can be widely used in urban traffic flow guidance and urban traffic control in cities where too few intersections are equipped with detectors.
In order to analyze the heterogeneity in vehicular traffic speed, a new method that integrates cluster analysis and probability distribution function fitting is presented. First, to identify the optimal number of clusters, the two-step cluster method is applied to actual speed data, which suggests that dividing the speed data into two clusters best reflects the intrinsic patterns of traffic flows. This information is then taken as guidance in probability distribution function fitting. The normal, skew-normal and skew-t distribution functions are used to fit the probability distribution of each cluster, which suggests that the skew-t distribution has the highest fitting accuracy, followed by the skew-normal distribution, with the normal distribution the worst. Model analysis results demonstrate that the proposed mixture model has better fitting and generalization capability than the conventional single model. In addition, the new method is more flexible in terms of data fitting and can provide a more accurate model of speed distribution.
[Objective] The research aimed to map the systematic geographic distribution of Rana dybowskii. [Method] Four morphologic indices (body length, body weight, forelimb length, hindlimb length) of eight geographical populations of R. dybowskii naturally distributed in Changbai Mountain and Xiaoxing'an Mountain were measured. The measurements were subjected to variance analysis and cluster analysis. [Result] Variance analysis showed that the genetic branching among the Dongfanghong male population (belonging to Wandashan), the Xiaoxing'an Mountain male population and the Changbai Mountain male population was significantly different (P<0.05); the genetic branching between the Hebei female population (belonging to Xiaoxing'an Mountain) and the Changbai Mountain female population was significantly different (P<0.05). Cluster analysis showed that male R. dybowskii can be divided into three groups: the first group included Quanyang, Tianbei, Chaoyang and Dakouqin, the second group included Tieli and Anshan, and the third group included Dongfanghong; female R. dybowskii can also be divided into three groups: the first group included Quanyang and Chaoyang, the second group included Tianbei and Dakouqin, and the third group included Hebei. [Conclusion] The paper deduced that the Sanjiang Plain was the geographical origin center of R. dybowskii, which radiated upstream along the Songhua River basin to Changbai Mountain and Xiaoxing'an Mountain, thereby forming the current distribution pattern of R. dybowskii.
In order to reveal the genetic differences and agronomic traits of Fagopyrum tataricum varieties (lines) intuitively, explore good resources and avoid blindness in parent selection during the breeding process, six primary agronomic traits of 45 F. tataricum varieties (lines) from eleven buckwheat breeding departments across the country were analyzed with principal component analysis and cluster analysis. The results of principal component analysis showed that the six agronomic traits could be simplified into three principal components, and the cumulative contribution rate reached 83%. The results of cluster analysis showed that the 45 F. tataricum varieties (lines) were classified into four groups: high stalk, medium yield and small grain type; medium stalk, high yield and large grain type; medium stalk, low yield and small grain type; and high stalk, medium yield and medium grain type. Among them, the comprehensive trait performance of the second type was better than that of the other types. Thus, the F. tataricum varieties (lines) classified into the second type could be considered good varieties (lines) or breeding materials. The genetic differences among F. tataricum varieties (lines) had no necessary correlation with origin and geographical distance. In addition to complementary traits and geographical distance, genetic distances (different populations) should be taken into consideration during parent selection in cross breeding.
In order to compare the characteristics of different varieties of sweet cherry and to formulate corresponding pruning schemes, hierarchical cluster analysis was conducted for the 14 sweet cherry varieties mainly planted in Shanxi Province. The results showed that the 14 varieties of sweet cherry could be divided into two types, Hongmanao and Rainier. Fruit setting rate, branching rate, medium fruit shoot proportion, spur proportion and yield per plant were significantly different between these two types of sweet cherry. The key points of pruning management for improving the yield of the Rainier type were to increase the fruit setting rate and spur proportion, and to properly control the long and medium fruit shoot proportion.
[Objective] This research aimed to study the FTIR spectra of corn germs and endosperms so as to provide a scientific way of identifying corn of different types. [Method] The corn germs and endosperms of three types were studied using Fourier transform infrared spectroscopy (FTIR) combined with cluster analysis. [Result] The overall characteristics of the original FTIR spectra were basically similar within the range of 700-1800 cm^-1. The FTIR spectra were mainly composed of the absorption peaks of polysaccharides, proteins and lipids. Within the wavenumber range of 700-1800 cm^-1, there were only tiny differences in the original FTIR spectra among the corn germs and endosperms of the three different types. The spectra were then processed using first and second derivatives. The second derivative spectra were used for hierarchical cluster analysis (HCA). The results showed that within the wavenumber range of 700-1800 cm^-1, the second derivative spectra of the 52 samples could be better clustered according to the three types and according to corn germ and corn endosperm. The correct clustering rate reached 96.1%. [Conclusion] FTIR technology combined with cluster analysis can be used to identify different types of corn germs and endosperms, and it is convenient and rapid.
In the aggregation of experts' evaluation information in group decision-making, the existing methods of determining experts' weights based on cluster analysis take into account the experts' preferences and the consistency of the experts' collating vectors, but they lack a measure of information similarity. As a result, an expert whose collating vector is similar to the group consensus but whose information uncertainty is great may nevertheless be clustered into a larger group and given a high weight. To address this, a new aggregation method based on entropy and cluster analysis in the group decision-making process is provided, in which the collating vectors are classified using an information similarity coefficient, and the experts' weights are determined according to the classification result, the entropy of the collating vectors and the consistency of the judgment matrices. Finally, a numerical example shows that the method is feasible and effective.
Clustered heavy rains (CHRs) defined using hierarchical cluster analysis based on daily observations of precipitation in China during 1960-2008 are investigated in this paper. The geographical pattern of CHRs in China shows three high-frequency centers: South China, the Yangtze River basin, and part of North China around the Bohai Sea. CHRs occur most frequently in South China, with a mean annual frequency of 6.8 (a total of 334 times during 1960-2008). June has the highest monthly frequency (2.2 times/month, with a total of 108 times during 1960-2008), partly in association with the Meiyu phenomenon in the Yangtze River basin. Within the past 50 years, the frequency of CHRs in China has increased significantly from 13.5 to 17.3 times per year, which is approximately 28%. In the 1990s, the frequency of CHRs often reached 19.1 times per year. The geographical extent of CHRs has expanded slightly by 0.5 stations, and their average daily rainfall intensity has increased by 3.7 mm d^-1. The contribution of CHRs to total rainfall amount and the frequency of daily precipitation have increased by 63.1% and 22.7%, respectively, partly due to a significant decrease in light rains. In the drying regions of North and Northeast China, the amounts of minimal CHRs have had no significant trend in recent years, probably because warming in these arid regions enhances atmospheric convectivity at individual stations.
The daily summer precipitation data of 97 meteorological stations on the Qinghai-Tibet Plateau from 1961 to 2004 were selected to analyze the temporal-spatial distribution through accumulated variance, correlation analysis, regression analysis, empirical orthogonal function, power spectrum function and the spatial analysis tools of GIS. The results showed that summer precipitation occupied a relatively high proportion in the areas of the Plateau with less annual precipitation, and the correlation between summer precipitation and annual precipitation was strong. Station altitude and summer precipitation tendency presented a stronger positive correlation below 2000 m, with a correlation value of up to 0.604 (α=0.01). The differences in tendency values between 1961-1983 and 1984-2004 at five altitude ranges (2000-2500 m, 2500-3000 m, 3500-4000 m, 4000-4500 m and above 4500 m) were above zero and accounted for 71.4% of the total. Using the empirical orthogonal function, summer precipitation could be roughly divided into three precipitation pattern fields: the Southeast Plateau Pattern Field, the Northeast Plateau Pattern Field and the Three Rivers' Headstream Regions Pattern Field. The former two had opposite signs from north to south, and the dividing line lay along 35°N. The potential cycles of the three pattern fields were 5.33 a, 21.33 a and 2.17 a respectively, tested at a confidence probability of 90%. Station altitude and summer precipitation potential cycle presented a strong negative correlation at the stations above 4500 m, with a correlation value of -0.626 (α=0.01). In the Three Rivers' Headstream Regions, the summer precipitation cycle decreased as altitude rose at the stations above 3500 m and increased as altitude rose at those below 3500 m. The empirical orthogonal function analysis of June, July and August precipitation showed that the June precipitation pattern field was similar to July's, in which the southern Plateau was positive and the northern Plateau negative, but the positive-value area in the July pattern field was obviously smaller than June's. The August pattern field was totally opposite to June's and July's: the positive area in the August pattern field jumped from the southern Plateau to the northern Plateau.
Diversity of 60 conventional japonica rice accessions with good eating quality from home and abroad was analyzed using SSR molecular markers, agronomic traits and taste characteristics. A total of 290 alleles were detected in the 60 accessions at 72 SSR loci, with high similarity coefficients varying between 0.600 and 0.924. The loci on chromosome 5 showed the greatest average allele number. Additionally, most of the SSR loci could detect 3 to 4 alleles. A UPGMA dendrogram based on the cluster analysis of the genetic similarity coefficients showed that the grouping trend of part of the rice accessions was geographically related, and most of the rice accessions from Jiangsu Province, China were clustered together. Furthermore, many domestic accessions of southern and northern origin in China were close to the foreign japonica rice varieties, as confirmed by their pedigree origin from the foreign high-quality sources. For taste characteristics, part of the accessions with excellent taste were clearly clustered into one category even though they came from different geographical regions, which indicates that the taste characteristics of some varieties were mainly genetically determined. In addition, the agronomic traits of japonica rice with good taste might be closely related to their geographical origins, but the relationship between superior taste characteristics and agronomic traits should be further clarified.
To meet China's CO2 intensity target of a 40%-45% reduction by 2020 relative to the 2005 level, a regional allocation method based on cluster analysis is developed. Thirty Chinese provinces are classified into six groups based on economy, emissions, and reduction potential indicators. Under the equity principle, the two most developed groups are assigned the highest reduction targets (55% and 65%, respectively). However, their reduction potential is limited. Under the efficiency principle, the two groups with the highest reduction potential take the highest targets (48% and 61%, respectively), but their economies are relatively backward. When equity and efficiency are equally weighted, the 5th group, with a prominent reduction potential, takes the highest target (54%), and the 2nd and 3rd groups, with large industry scales, take the second highest target (49%). However, under all three allocation schemes, the targets are not greater than 40% for the 4th and 6th groups, which have relatively low economic ability, emissions, and reduction potential. Due to the inconsistency between economic ability and reduction potential, corresponding market mechanisms and policy instruments should be established to ensure the equity and efficiency of regional target allocation.
The fingerprints of six crude oils from four oilfields and four oil platforms were analyzed by gas chromatography, and the corresponding normal paraffin hydrocarbon (including pristane and phytane) concentrations were obtained by the internal standard method. The normal paraffin hydrocarbon distribution patterns of the six crude oils were built and compared. Cluster analysis of the normal paraffin hydrocarbon concentrations was conducted for classification, and some ratios were used for oil comparison. The results indicated that there was a clear difference among crude oils from different oilfields and a small difference between crude oils from the same oil platform. The normal paraffin hydrocarbon distribution pattern and ratios, as well as the cluster analysis of the normal paraffin hydrocarbon concentrations, can differentiate crude oils with small differences better than the original gas chromatogram.
To quantitatively identify the maintenance demand of each highway segment in pavement maintenance scheme design, a mathematical model of uniform segment division was established, and an approach that applies cluster analysis theory to uniform segment division and the evaluation of pavement maintenance demand was proposed. An actual highway maintenance project carried out in Guangdong province was cited as an example to demonstrate the validity of the proposed method. It is shown that cluster analysis can eliminate human factors in classification without being constrained by the number of samples, while considering multiple pavement distress indexes and the continuity of samples. Thus, cluster analysis is an efficient analytical tool for uniform segment division and the evaluation of maintenance demand.
Abstract: Efficient iterative unsupervised machine learning involving probabilistic clustering analysis with the expectation-maximization (EM) clustering algorithm is applied to categorize reservoir facies by exploiting latent and observable well-log variables from a clastic reservoir in the Majnoon oilfield, southern Iraq. The observable well-log variables consist of conventional open-hole well-log data and the computer-processed interpretation of gamma rays, bulk density, neutron porosity, compressional sonic, deep resistivity, shale volume, total porosity, and water saturation, from three wells located in the Nahr Umr reservoir. The latent variables include shale volume and water saturation. The EM algorithm efficiently characterizes electrofacies through iterative machine learning to identify the local maximum likelihood estimates (MLE) of the observable and latent variables in the studied dataset. The optimized EM model successfully predicts the core-derived facies classification in two of the studied wells. The EM model clusters the data into three distinctive reservoir electrofacies (F1, F2, and F3). F1 represents a gas-bearing electrofacies with low shale volume (Vsh) and water saturation (Sw) and high porosity and permeability values, identifying it as an attractive reservoir target. The results of the EM model are validated using nuclear magnetic resonance (NMR) data from the third studied well, for which no cores were recovered. The NMR results confirm the effectiveness and accuracy of the EM model in predicting electrofacies. The utilization of the EM algorithm for electrofacies classification/cluster analysis is innovative. Specifically, the clusters it establishes are less rigidly constrained than those derived from the more commonly used K-means clustering method. The EM methodology generates dependable electrofacies estimates in the studied reservoir intervals where core samples are not available. Therefore, once calibrated with core data in some wells, the model is suitable for application to other wells that lack core data.
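The contrast drawn in this abstract between EM and K-means can be illustrated with a minimal sketch. It is not the authors' model: it assumes a one-dimensional Gaussian mixture, and the function names, the starting means, and the sample values in the usage note (read as hypothetical shale-volume fractions) are invented for illustration. The point is that EM returns a membership probability for every sample in every cluster, whereas K-means would assign each sample to exactly one cluster.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM (illustrative sketch only).

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i][j] is the probability that sample i belongs
    to component j.
    """
    n = len(data)
    k = len(init_means)
    means = list(init_means)
    # Assumption: start every component at the overall sample variance.
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, 1e-6)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: soft membership of each sample in each component.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances, resp
```

Calling `em_gmm_1d([0.10, 0.15, 0.20, 0.55, 0.60, 0.65], [0.1, 0.7])` separates the low and high values into two components, and each row of the returned responsibilities sums to one. Like any EM fit, the result is a local maximum of the likelihood and depends on the starting means.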
Abstract: A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been previously monitored in some way for this type of information to have been recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Although location information is now easy to obtain (GPS, drone images, etc.), the timing of the event remains challenging to ascertain for a considerable portion of landslide data. Concerning rainfall monitoring, there are multiple ways to consider it, for instance, examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h), as well as calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators' experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized and rapidly scalable methodology. In this work, we propose a Landslide Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using cluster analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one utilized by Cemaden, Brazil's National Center for Monitoring and Early Warning of Natural Disasters.
Funding: The National Natural Science Foundation of China under contract Nos. 42106005, 91958203, 41676131, and 41876155.
Abstract: The classification of the springtime water mass has an important influence on the hydrography, regional climate change, and fishery in the Taiwan Strait. Based on CTD profiling data from 58 stations collected in the western and southwestern Taiwan Strait during the spring cruise of 2019, we analyze the spatial distributions of temperature (T) and salinity (S) in the investigation area. Then, by using the fuzzy cluster method combined with the T-S similarity number, we classify the investigation area into 5 water masses: the Minzhe Coastal Water (MZCW), the Taiwan Strait Mixed Water (TSMW), the South China Sea Surface Water (SCSSW), the South China Sea Subsurface Water (SCSUW), and the Kuroshio Branch Water (KBW). The MZCW appears in the near-surface layer along the western coast of the Taiwan Strait, showing low-salinity (<32.0) tongues near the Minjiang River Estuary and the Xiamen Bay mouth. The TSMW covers most of the upper layer of the investigation area. The SCSSW is mainly distributed in the upper layer of the southwestern Taiwan Strait, beneath which is the SCSUW. The KBW is a high-temperature (core value of 26.36 ℃) and high-salinity (core value of 34.62) water mass located southeast of the Taiwan Bank and partially in the central Taiwan Strait.
Abstract: Background: To improve cluster analysis, we propose a new method based on the chaotic particle swarm optimization (CPSO) algorithm. Methods: We first evaluate the clustering performance of this model using the variance ratio criterion (VRC) as the evaluation metric. The effectiveness of the CPSO algorithm is compared with that of the traditional particle swarm optimization (PSO) algorithm. The CPSO aims to improve the VRC value while avoiding local optimal solutions. The simulated dataset is set at three levels of overlapping: non-overlapping, partial overlapping, and severe overlapping. Finally, we compare CPSO with two other methods. Results: In the comparative results, the proposed CPSO method performs best. Under non-overlapping, partial overlapping, and severe overlapping conditions, our method has the best VRC values of 1683.2, 620.5, and 275.6, respectively. The mean VRC values in these three cases are 1683.2, 617.8, and 222.6. Conclusion: CPSO performed better than the other methods on cluster analysis problems and is effective for cluster analysis.
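The variance ratio criterion used as the evaluation metric here is not defined in the abstract. The sketch below assumes its standard definition, also known as the Calinski-Harabasz index: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k). The function name is hypothetical, and the code only scores a given labelling. It does not implement CPSO or PSO.

```python
def variance_ratio_criterion(points, labels):
    """Variance ratio criterion (Calinski-Harabasz index) of a labelling.

    points: list of equal-length numeric tuples
    labels: one cluster label per point; needs 2 <= k < n clusters
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def centroid(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # dispersion of cluster centroids around the overall centroid
    within = 0.0   # dispersion of points around their own cluster centroid
    for pts in clusters.values():
        c = centroid(pts)
        between += len(pts) * sqdist(c, overall)
        within += sum(sqdist(p, c) for p in pts)

    return (between / (k - 1)) / (within / (n - k))
```

A higher value indicates tighter, better separated clusters, which is why an optimizer such as the one described would try to maximize it. Note that the value is undefined when every point coincides with its cluster centroid, because the within-cluster dispersion is then zero.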
Abstract: The paper deals with cluster analysis and the comparison of clustering methods. Cluster analysis belongs to the multivariate statistical methods. It is defined as a general logical technique or procedure that groups objects into clusters on the basis of their similarity or dissimilarity. Cluster analysis involves computational procedures whose purpose is to reduce a set of data to several relatively homogeneous groups (clusters), such that similarity is maximal within clusters and minimal between them. Similarity of objects is studied by the degree of similarity (correlation coefficient and association coefficient) or the degree of dissimilarity, that is, the degree of distance (distance coefficient). Methods of cluster analysis are classified, on the basis of how clusters are formed, as hierarchical or non-hierarchical.
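As an illustration of a hierarchical method driven by a distance coefficient, the following minimal sketch merges the two closest clusters repeatedly until a requested number remains. Single linkage and Euclidean distance are assumptions chosen for brevity, not choices taken from the paper, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as the dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, k):
    """Agglomerative clustering with single linkage.

    Starts with one cluster per point and merges the two closest
    clusters until k clusters remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

A non-hierarchical method such as K-means would instead fix the number of clusters in advance and iteratively reassign objects, without building the nested sequence of merges produced here.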
Funding: Funding from the National Natural Science Foundation of China (No. 41572308).
Abstract: The scientific and fair positioning of monitoring locations for surface displacement on slopes is a prerequisite for early warning and forecasting. However, there is no specific provision on how to effectively determine the number and location of monitoring points according to the actual deformation characteristics of the slope, and the layout of monitoring points still has some defects. To this end, based on displacement data series and spatial location information of surface displacement monitoring points, a spatial deformation correlation calculation model for slopes based on clustering analysis was proposed. The model combines displacement series correlation and spatial distance influence factors to calculate the correlation between different monitoring points, on the basis of which the deformation area of the slope was divided. The redundant monitoring points in each partition were eliminated based on the partition's outcome, and the overall optimal arrangement of slope monitoring points was then achieved. This method scientifically addresses the issues of slope deformation zoning and data gathering overlap. It not only eliminates human subjectivity from slope deformation zoning but also increases the efficiency and accuracy of slope monitoring. To verify the effectiveness of the method, a sand-mudstone interbedded counter-tilt excavation slope in Chongqing, China, was used as the research object. Twenty-four monitoring points deployed on this slope were monitored for surface displacement for 13 months. The spatial location of the monitoring points was discussed. The results show that the proposed method of slope deformation zoning and the optimized placement of monitoring points are feasible.
Funding: Supported by the High-tech Research Project of Jiangsu Province (BG2004314).
Abstract: [Objective] The aim was to study the variation of leaf characters among different provenances of Polygonum multiflorum Thunb., and to carry out cluster analysis on P. multiflorum from different provenances, to provide a basis for the classification, identification, breeding, and improved variety selection of P. multiflorum. [Method] Leaf shape characters of 31 germplasm accessions from the major distribution regions of the whole country were determined, and the genetic variation of P. multiflorum leaves from different producing areas was analyzed. [Result] The leaf characters of single plants within the same provenance of P. multiflorum were relatively stable, and the variation was mainly found in single leaf area, 1/2 leaf width, leaf width, and other indicators. The variation of each leaf character among different provenances was obvious, and the variation was mainly found in single leaf weight, leaf area, 1/2 leaf width, leaf length, and other indicators. The correlation analysis of the leaf characters suggested that single leaf area and single leaf weight showed extremely significant positive correlation with leaf length, 1/2 leaf width, leaf width, leaf thickness, and leaf stem length, while single leaf area and single leaf weight showed significant negative correlation with WWR (leaf width/1/2 leaf width) and LWR (leaf length/1/2 leaf length). In addition, several macroscopic leaf characters, such as leaf length, 1/2 leaf width, leaf width, and leaf stem length, showed extremely significant positive correlation with one another. The principal component analysis suggested that the cumulative variance contribution rate of the first three principal components was up to 97.4%, which could well reflect the comprehensive performance of the leaf characters of different provenances of P. multiflorum. The cluster analysis showed that the 31 experimental provenances of P. multiflorum should be divided into three classes: the first class was distributed in central and western Guizhou, northwestern Guangxi, and western areas at higher altitude; the second class was distributed in Hunan, Hubei, Sichuan, Guangdong, and most of Guangxi; the third class was distributed in Anhui, Jiangsu, Henan, and Shandong. [Conclusion] Cluster analysis of leaf characters indicated that provenances that were geographically closer tended to cluster together. The study provides a certain basis for the classification of P. multiflorum.
Funding: The National Natural Science Foundation of China (No. 50378016).
Abstract: Because it is difficult to obtain lane-level traffic flow information at non-detector intersections in most metropolises of the world, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict the traffic volume of lanes at non-detector isolated controlled intersections. First, cluster analysis is used to cluster the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections with detectors. Then, according to the results of the cluster analysis, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested with lane traffic volume data from the road network of Nanjing city. The method resolves the problem of predicting the traffic volume of lanes at non-detector isolated signal-controlled intersections and can be widely used in urban traffic flow guidance and urban traffic control in cities without enough intersections equipped with detectors.
Funding: The National Science Foundation by Changjiang Scholarship of Ministry of Education of China (No. BCS-0527508); the Joint Research Fund for Overseas Natural Science of China (No. 51250110075); the Natural Science Foundation of Jiangsu Province (No. BK200910046); the Postdoctoral Science Foundation of Jiangsu Province (No. 0901005C).
Abstract: In order to analyze the heterogeneity in vehicular traffic speed, a new method that integrates cluster analysis and probability distribution function fitting is presented. First, to identify the optimal number of clusters, the two-step cluster method is applied to actual speed data, which suggests that dividing the speed data into two clusters best reflects the intrinsic patterns of traffic flows. This information is then taken as guidance in probability distribution function fitting. The normal, skew-normal, and skew-t distribution functions are used to fit the probability distribution of each cluster, which suggests that the skew-t distribution has the highest fitting accuracy, followed by the skew-normal distribution, with the normal distribution the worst. Model analysis results demonstrate that the proposed mixture model has better fitting and generalization capability than the conventional single model. In addition, the new method is more flexible in terms of data fitting and can provide a more accurate model of speed distribution.
Abstract: [Objective] The research aimed to determine the geographic distribution map of the system of Rana dybowskii. [Method] Four morphologic indices (body length, body weight, forelimb length, hindlimb length) of eight geographical populations of R. dybowskii that naturally occur in Changbai Mountain and Xiaoxing'an Mountain were measured. The measurements were subjected to variance analysis and cluster analysis. [Result] Variance analysis showed that the genetic branching among the Dongfanghong male population (belonging to Wandashan), the Xiaoxing'an Mountain male population, and the Changbai Mountain male population was significantly different (P < 0.05); the genetic branching between the Hebei female population (belonging to Xiaoxing'an Mountain) and the Changbai Mountain female population was significantly different (P < 0.05). Cluster analysis showed that male R. dybowskii can be divided into three groups: the first group included Quanyang, Tianbei, Chaoyang, and Dakouqin; the second group included Tieli and Anshan; the third group included Dongfanghong. Female R. dybowskii can also be divided into three groups: the first group included Quanyang and Chaoyang; the second group included Tianbei and Dakouqin; the third group included Hebei. [Conclusion] The paper deduced that the Sanjiang Plain was the geographical origin center of R. dybowskii, which radiated to Changbai Mountain and Xiaoxing'an Mountain upstream along the Songhua River basin, thereby forming the current distribution pattern of R. dybowskii.
Funding: Supported by the National Oat and Buckwheat Industrial Technology System (CARS-08-A-1-3) and the Breeding Project of Shanxi Academy of Agricultural Sciences during the Thirteenth Five-Year Plan Period (16yzgc035).
Abstract: In order to reveal the genetic differences and agronomic traits of Fagopyrum tataricum varieties (lines) intuitively, explore good resources, and avoid blindness in parent selection during the breeding process, six primary agronomic traits of 45 F. tataricum varieties (lines) from eleven buckwheat breeding departments across the country were analyzed with principal component analysis and cluster analysis. The results of principal component analysis showed that the six agronomic traits could be simplified into three principal components, and the cumulative contribution rate reached 83%. The results of cluster analysis showed that the 45 F. tataricum varieties (lines) were classified into four groups: high stalk, medium yield and small grain type; medium stalk, high yield and large grain type; medium stalk, low yield and small grain type; and high stalk, medium yield and medium grain type. Among them, the comprehensive trait performance of the second type was better than that of the other types. Thus, the F. tataricum varieties (lines) classified into the second type could be considered good varieties (lines) or breeding materials. The genetic differences among F. tataricum varieties (lines) had no necessary correlation with origin and geographical distance. In addition to complementary traits and geographical distance, genetic distances (different populations) should be taken into consideration during parent selection in cross breeding.
Funding: Supported by the Spark Program of the Science and Technology Department of Shanxi Province (20130511021).
Abstract: In order to compare the characteristics of different varieties of sweet cherry and to formulate corresponding pruning schemes, hierarchical cluster analysis was conducted for the 14 sweet cherry varieties mainly planted in Shanxi Province. The results showed that the 14 varieties of sweet cherry could be divided into two types, Hongmanao and Rainier. Fruit setting rate, branching rate, medium fruit shoot proportion, spur proportion, and yield per plant were significantly different between these two types of sweet cherry. The key points of pruning management for improving the yield of the Rainier type were to increase the fruit setting rate and spur proportion, and to properly control the long and medium fruit shoot proportion.
Funding: Supported by the National Natural Science Foundation of China (30960179) and the Natural Science Foundation of Yunnan Province (2007A048M).
Abstract: [Objective] This research aimed to study the FTIR spectra of corn germs and endosperms so as to provide a scientific way of identifying corn of different types. [Method] The corn germs and endosperms of three types were studied by using Fourier transform infrared spectroscopy (FTIR) combined with cluster analysis. [Result] The overall characteristics of the original FTIR spectra were basically similar within the range of 700-1800 cm⁻¹. The FTIR spectra were mainly composed of the absorption peaks of polysaccharides, proteins, and lipids. Within the wavenumber range of 700-1800 cm⁻¹, there were only tiny differences in the original FTIR spectra among the corn germs and endosperms of the three types. The spectra were then processed by taking first and second derivatives. The second derivative spectra were used for hierarchical cluster analysis (HCA). The results showed that, within the wavenumber range of 700-1800 cm⁻¹, the second derivative spectra of the 52 samples could be better clustered according to the three types and according to corn germ and corn endosperm. The clustering accuracy reached 96.1%. [Conclusion] FTIR combined with cluster analysis can be used to identify different types of corn germs and endosperms, and it is convenient and rapid.
Abstract: Among aggregation methods for experts' evaluation information in group decision-making, the existing methods of determining experts' weights based on cluster analysis take into account the experts' preferences and the consistency of the experts' collating vectors, but they lack a measure of information similarity. As a result, an expert whose collating vector is similar to the group consensus but whose information uncertainty is great may be clustered into a larger group and given a high weight. To address this, a new aggregation method based on entropy and cluster analysis is provided for the group decision-making process, in which the collating vectors are classified with an information similarity coefficient, and the experts' weights are determined according to the classification result, the entropy of the collating vectors, and the consistency of the judgment matrix. Finally, a numerical example shows that the method is feasible and effective.
Funding: Supported by the National Basic Research Program of China (Grant No. 2009CB421401) and the Chinese Meteorological Administration Program (Grant No. GYHY200906009).
Abstract: Clustered heavy rains (CHRs), defined using hierarchical cluster analysis based on daily observations of precipitation in China during 1960-2008, are investigated in this paper. The geographical pattern of CHRs in China shows three high-frequency centers: South China, the Yangtze River basin, and part of North China around the Bohai Sea. CHRs occur most frequently in South China, with a mean annual frequency of 6.8 (a total of 334 times during 1960-2008). June has the highest monthly frequency (2.2 times/month, with a total of 108 times during 1960-2008), partly in association with the Meiyu phenomenon in the Yangtze River basin. Within the past 50 years, the frequency of CHRs in China has increased significantly from 13.5 to 17.3 times per year, an increase of approximately 28%. In the 1990s, the frequency of CHRs often reached 19.1 times per year. The geographical extent of CHRs has expanded slightly by 0.5 stations, and their average daily rainfall intensity has increased by 3.7 mm d⁻¹. The contribution of CHRs to total rainfall amount and to the frequency of daily precipitation has increased by 63.1% and 22.7%, respectively, partly due to a significant decrease in light rains. In drying regions of North and Northeast China, the amounts of minimal CHRs have had no significant trend in recent years, probably because warming in these arid regions enhances atmospheric convectivity at individual stations.
Funding: CAS Action-plan for West Development, KZCX2-XB2-06-03; National Natural Science Foundation of China, No. 30500064.
Abstract: The summer day-by-day precipitation data of 97 meteorological stations on the Qinghai-Tibet Plateau from 1961 to 2004 were selected to analyze the temporal-spatial distribution through accumulated variance, correlation analysis, regression analysis, empirical orthogonal function, power spectrum function, and spatial analysis tools of GIS. The results showed that summer precipitation occupied a relatively high proportion in the areas of the Plateau with less annual precipitation, and that the correlation between summer precipitation and annual precipitation was strong. Station altitude and summer precipitation tendency presented a stronger positive correlation below 2000 m, with a correlation value up to 0.604 (α = 0.01). The differences in tendency values between 1961-1983 and 1984-2004 were above zero at five altitude ranges (2000-2500 m, 2500-3000 m, 3500-4000 m, 4000-4500 m, and above 4500 m), which accounted for 71.4% of the total. Using empirical orthogonal function, summer precipitation could be roughly divided into three precipitation pattern fields: the Southeast Plateau Pattern Field, the Northeast Plateau Pattern Field, and the Three Rivers' Headstream Regions Pattern Field. The former two had values of opposite sign from north to south, with the dividing line along 35°N. The potential cycles of the three pattern fields were 5.33 a, 21.33 a, and 2.17 a, respectively, tested at a confidence probability of 90%. Station altitude and summer precipitation potential cycle presented a strong negative correlation for the stations above 4500 m, with a correlation value of -0.626 (α = 0.01). In the Three Rivers' Headstream Regions, the summer precipitation cycle decreased as altitude rose for the stations above 3500 m and increased as altitude rose for those below 3500 m. The empirical orthogonal function analysis of June, July, and August precipitation showed that the June precipitation pattern field was similar to July's, in which the southern Plateau was positive and the northern Plateau negative. However, the positive-value area in the July precipitation pattern field was obviously smaller than June's. The August pattern field was totally opposite to June's and July's: the positive area in the August pattern field jumped from the southern Plateau to the northern Plateau.
Funding: Supported by the National Science and Technology Support Program (Grant No. 2006BAD01A01-5); the Key Program of the Development of Variety of Genetically Modified Organisms (Grant No. 2008ZX08001-006); the Special Program for Rice Scientific Research, Ministry of Agriculture, China (Grant No. nyhyzx 07-001-006); the Key Support Program of Jiangsu Science and Technology (Grant No. BE2008354); and the Jiangsu Self-innovation Fund for Agricultural Science and Technology, China (Grant No. CX[08]603).
Abstract: The diversity of 60 conventional japonica rice accessions with good eating quality from home and abroad was analyzed using SSR molecular markers, agronomic traits, and taste characteristics. A total of 290 alleles were detected in the 60 accessions at 72 SSR loci, with high similarity coefficients varying between 0.600 and 0.924. The loci on chromosome 5 showed the greatest average allele number. Additionally, most of the SSR loci could detect 3 to 4 alleles. A UPGMA dendrogram based on the cluster analysis of the genetic similarity coefficients showed that the grouping trend of part of the rice accessions was geographically related, and most of the rice accessions from Jiangsu Province, China, were clustered together. Furthermore, many domestic accessions of southern and northern origin in China were close to the foreign japonica rice varieties, as confirmed by their pedigree origin from foreign high-quality sources. For taste characteristics, part of the accessions with excellent taste were clearly clustered into one category although they came from different geographical regions, which indicates that the taste characteristics of some varieties were mainly genetically determined. In addition, the agronomic traits of japonica rice with good taste might be closely related to their geographical origins, but the relationship between superior taste characteristics and agronomic traits should be further clarified.
Funding: Supported by the Natural Science Foundation (No. 71273153) and the National Key Technology Research and Development Program (No. 2009BAC62B01).
Abstract: To meet China's CO2 intensity target of a 40%-45% reduction by 2020 relative to the 2005 level, a regional allocation method based on cluster analysis is developed. Thirty Chinese provinces are classified into six groups based on economy, emissions, and reduction potential indicators. Under the equity principle, the two most developed groups are assigned the highest reduction targets (55% and 65%, respectively). However, their reduction potential is limited. Under the efficiency principle, the two groups with the highest reduction potential take the highest targets (48% and 61%, respectively), but their economy is relatively backward. When equity and efficiency are equally weighted, the 5th group, with a prominent reduction potential, takes the highest target (54%), and the 2nd and 3rd groups, with large industry scales, take the second highest target (49%). However, under all three allocation schemes, the targets are not greater than 40% for the 4th and 6th groups, which have relatively low economic ability, emissions, and reduction potential. Due to the inconsistency between economic ability and reduction potential, corresponding market mechanisms and policy instruments should be established to ensure the equity and efficiency of regional target allocation.
Funding: The National Natural Science Foundation of China under contract No. 49976027; the Important Topic of Scientific Research of the State Oceanic Administration, China, on the construction system of oil fingerprinting database and the key technology (from 2004 to 2005).
Abstract: Using gas chromatography, the fingerprints of six crude oils distributed in four oilfields and four oil platforms were analyzed, and the corresponding normal paraffin hydrocarbon (including pristane and phytane) concentrations were obtained by the internal standard method. The normal paraffin hydrocarbon distribution patterns of the six crude oils were built and compared. Cluster analysis of the normal paraffin hydrocarbon concentrations was conducted for classification, and some ratios were used for comparing the oils. The results indicated that there was a clear difference among crude oils from different oil fields and a small difference between crude oils from the same oil platform. The normal paraffin hydrocarbon distribution pattern and ratios, as well as the cluster analysis of the normal paraffin hydrocarbon concentrations, can differentiate crude oils with small differences better than the original gas chromatogram.
Funding: Sponsored by the Scientific and Technological Project on Road Maintenance Management Mode in Guangdong Province (Grant No. 200407132) and the Launching Fund Project for Dr. in Guangdong Province (Grant No. 05300135).
Abstract: To quantitatively identify the maintenance demand of each highway segment in pavement maintenance scheme design, a mathematical model of uniform segment division was established, and an approach applying cluster analysis theory to uniform segment division and the evaluation of pavement maintenance demand was proposed. An actual highway maintenance project carried out in Guangdong province was cited as an example to demonstrate the validity of the proposed method. The example shows that cluster analysis can eliminate human factors in classification without being constrained by the number of samples, while considering multiple pavement distress indexes and the continuity of samples. Cluster analysis is therefore an efficient analytical tool for uniform segment division and the evaluation of maintenance demand.