The goal of this study was to optimize the constitutive parameters of foundation soils using a k-means algorithm with clustering analysis. A database was collected from unconfined compression tests, Proctor tests and grain distribution tests of soils taken from three different types of foundation pits: raft foundations, partial raft foundations and strip foundations. The k-means algorithm with clustering analysis was applied to determine the most appropriate foundation type given the unconfined compression strengths and other parameters of the different soils.
Effective storage, processing and analysis of power device condition monitoring data face enormous challenges. A framework based on the Aliyun DTplus platform is proposed that supports both MapReduce and Graph computing for massive monitoring data analysis at the same time. First, power device condition monitoring data storage based on MaxCompute tables and parallel permutation entropy feature extraction based on MaxCompute MapReduce are designed and implemented on the DTplus platform. Then, a Graph-based k-means algorithm is implemented and used for clustering analysis of massive condition monitoring data. Finally, performance tests compare the execution time of the serial and parallel programs. Performance is analyzed in terms of CPU core consumption, memory utilization and parallel granularity. Experimental results show that the designed framework and parallel algorithms can efficiently process massive power device condition monitoring data.
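The permutation entropy feature mentioned in this abstract can be illustrated with a small serial sketch. It is not the authors' MaxCompute MapReduce implementation; the function name, the `order` and `delay` parameters and the normalization to [0, 1] are assumptions made for illustration.

```python
from collections import Counter
from math import factorial, log


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy (0 = fully regular, 1 = maximally irregular)."""
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series is too short for this order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window
        # (ties are broken by position because sorted() is stable).
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        patterns[pattern] += 1
    probs = [count / n_windows for count in patterns.values()]
    entropy = sum(-p * log(p) for p in probs)
    # Divide by the entropy of a uniform distribution over all order! patterns.
    return entropy / log(factorial(order))


rising = permutation_entropy([1, 2, 3, 4, 5, 6])                 # one pattern only
zigzag = permutation_entropy([0, 1, 0, 1, 0, 1, 0], order=2)     # two equally likely patterns
```

Because each signal segment is processed independently, a function of this kind could plausibly be applied per record in a map step, which is one reason the feature suits parallel extraction.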
A novel multivariate similarity clustering analysis (MSCA) approach was used to estimate a biogeographical division scheme for the global terrestrial fauna and was compared against other widely used clustering algorithms. The faunal dataset included almost all terrestrial and freshwater fauna: a total of 4,631 families, 141,814 genera, and 1,334,834 species. Our findings demonstrated that suitable results were obtained only with the MSCA method, which produced distinct hierarchies and reasonable structuring and conformed to biogeographical criteria. A total of seven kingdoms and 20 sub-kingdoms were identified. The clustering results for the higher and lower animals did not differ significantly, leading us to consider the result convincing as the first zoogeographical division scheme covering all terrestrial animals worldwide.
Selecting the optimal regional outburst prevention measures for an outburst-prone mine is a complicated systems problem: many factors are involved, they have different dimensions, the data differ widely, and the information is incomplete. The traditional way of selecting outburst prevention measures is qualitative, which may lead to high gas control costs, large quantities of drilling work, long construction times and even secondary disasters. To solve these problems, a multi-objective classification judging model for optimizing outburst prevention measures was established for the coal seam gas occurrence conditions in the No. 21 mining area of Jinzhushan Tuzhu Mine, using grey fixed weight clustering theory and a combination of qualitative and quantitative analysis. The three weight coefficients of the outburst prevention technology schemes are sorted to determine the advantages and disadvantages of each scheme under a comprehensive multi-objective evaluation. In this way, the regional outburst prevention technology scheme can be selected quantitatively under multi-factor conditions and incomplete information, which provides reasonable and effective technical measures for preventing coal and gas outburst disasters.
In this study, the world’s land (except Antarctica) is divided into 67 basic geographical units according to ecological types. Using our newly proposed MSCA (Multivariate Similarity Clustering Analysis) method, 7,591 species of modern terrestrial mammals belonging to 1,374 genera in 162 families, and 2,378 species of mammals known in the Wallace era before 1876, are quantitatively analyzed. Almost the same clustering results are obtained for both, with clear levels and reasonable clustering that conform to the principles of geography, statistics, ecology and biology. The analysis not only affirms and supports the reasonable kernel of Wallace’s scheme, but also suggests where it should be revised and improved. The differences, large or small, between the clustering results and the mammalian geographical zoning schemes of contemporary scholars are caused by different analysis methods; the results are highly consistent with those obtained by the same method for the world’s chordates, angiosperms and insects. This once again confirms the homogeneity of the global distribution pattern of major biological groups and the possibility of building a unified biogeographic zoning system for the world.
For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule and other methods were combined to build logistic regression, cluster analysis and hyper-parameter test models. SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the sub-classification under different chemical compositions, and the test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
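The elbow rule for choosing K can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the toy `samples`, the evenly spaced seeding, and the stopping rule (stop when one more cluster reduces the within-cluster sum of squares by less than `tol` of its K = 1 value) are all assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    n = len(points)
    # Assumption: deterministic seeding with evenly spaced samples from the list.
    centers = [points[i * n // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_k(inertias, tol=0.05):
    """inertias[i] belongs to K = i + 1; return the first K after which gains are small."""
    for i in range(len(inertias) - 1):
        if (inertias[i] - inertias[i + 1]) / inertias[0] < tol:
            return i + 1
    return len(inertias)


# Hypothetical two-feature samples forming three compact groups.
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 0), (10, 1), (11, 0), (11, 1),
           (5, 9), (5, 10), (6, 9), (6, 10)]
inertias = [kmeans_inertia(samples, k) for k in range(1, 6)]
best_k = elbow_k(inertias)
```

On this toy data the curve flattens after three clusters. Real composition data would normally be standardized first, and the threshold is a judgment call, which is why the abstract pairs the elbow rule with a separate rationality analysis.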
In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors and shared neighbors, most of them can handle only a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people’s first instinct when facing something complex is to divide it into multiple parts. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels at identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing arbitrary shapes.
In this paper, CiteSpace, a bibliometrics software package, was used to collect research papers published on the Web of Science that are relevant to biological models and effluent quality prediction in the activated sludge process for wastewater treatment. Trend maps, keyword knowledge maps and co-citation knowledge maps were used to visualize and identify the authors, institutions and regions involved. Furthermore, the topics and hotspots of water quality prediction in the activated sludge process were determined through literature co-citation-based cluster analysis and citation burst analysis. The results reflect the historical evolution of the field to a certain extent and provide direction and insight into the knowledge structure of water quality prediction and the activated sludge process for future research.
A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been monitored in some way in the past for this type of information to have been recorded. Another significant limitation is the need for information on the location and timing of incidents. Although location information is now easy to obtain (GPS, drone images, etc.), the timing of the event remains difficult to ascertain for a considerable portion of landslide data. Rainfall can be monitored in multiple ways, for instance by examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h) or by calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators’ experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized methodology that scales rapidly. In this work, we propose a Landslide Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using cluster analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one operated by Cemaden, Brazil’s National Center for Monitoring and Early Warning of Natural Disasters.
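The rainfall half-life idea can be illustrated with a short sketch in which past rainfall is discounted so that its contribution halves every `half_life_h` hours. The abstract does not give the system's actual formulation, so the recursive form, the hourly time step and the names below are assumptions.

```python
def effective_rainfall(hourly_rain_mm, half_life_h):
    """Half-life-weighted running accumulation, one value per hourly reading."""
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    # Fraction of the accumulated rainfall that is retained after one hour.
    decay = 0.5 ** (1.0 / half_life_h)
    accumulated = 0.0
    series = []
    for rain in hourly_rain_mm:
        accumulated = accumulated * decay + rain
        series.append(accumulated)
    return series


# Hypothetical gauge readings in mm per hour.
short_memory = effective_rainfall([8, 0, 4], half_life_h=1)    # [8.0, 4.0, 6.0]
long_memory = effective_rainfall([10, 0, 0], half_life_h=2)    # third value is about 5.0
```

A single half-life parameter replaces the hand-picked set of accumulation windows (1 h, 6 h, 24 h, 72 h), which is what makes the quantity convenient to calibrate automatically across a large gauge network.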
Over the past 30 years, Chinese enterprises have been a topic of wide public discussion and concern in terms of their economic and social status, ownership structure, business mechanisms and management level. Solving the employment problem is an important prerequisite for people to live and work in peace, and a foundation for building a harmonious society. Employment in private enterprises has always drawn great outside attention, and employment in private enterprises and in individual businesses has long held an important position in China’s employment field. With the establishment of the market economy system, individual and private enterprises have become important components of the socialist economy, making significant contributions to economic development and social progress. The rapid development of China’s economy reflects, on the one hand, the superiority of China’s socialist market economic system and, on the other hand, the role of the tertiary industry and private enterprises in promoting the national economy. Since the 1990s, China’s private enterprises have become a new point of economic growth at the local and even national level, and one of the important ways to provide employment and achieve social stability. This paper studies employment in private enterprises and individual businesses from a statistical perspective: it extracts relevant data from the China Statistical Yearbook, processes the data with statistical methods, draws conclusions and puts forward constructive suggestions.
With rapid developments in platform and sensor technology, such as digital cameras and video recording, crowd monitoring has attracted considerable attention in many disciplines, including psychology, sociology, engineering and computer vision. Monitoring a crowd is necessary to enhance safety and keep movement controllable so as to minimize risk, particularly at highly crowded events (e.g. sports). Unmanned aerial vehicles (UAVs) are among the platforms most extensively employed in crowd monitoring, because they can acquire fast, low-cost, high-resolution and real-time images over crowded areas. In addition, geo-referenced images can be provided by integrating on-board positioning sensors (e.g. GPS/IMU) with vision sensors (digital cameras and laser scanners). In this paper, a new testing procedure based on the features from accelerated segment test (FAST) algorithm is introduced to detect crowd features in UAV images taken from different camera orientations and positions. The proposed test starts by converting the circle of 16 pixels surrounding the center pixel into a vector and sorting it in ascending/descending order. The single pixel with rank 9 (for FAST-9) or rank 12 (for FAST-12) is then compared with the center pixel. Accuracy assessment in terms of completeness and correctness was used to evaluate the performance of the new testing procedure before and after filtering the crowd features. The results show that the proposed algorithms are able to extract crowd features from different UAV images. Overall, completeness ranged from 55% to 70%, whereas correctness ranged from 91% to 94%.
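The rank-based segment test described in this abstract can be sketched as below. This is an illustration built only from the abstract's wording: the intensity `threshold`, the function name and the handling of both bright and dark cases are assumptions. Note that a pure rank comparison only establishes that at least n of the 16 circle pixels are brighter (or darker) than the center; whether the paper also checks that they are contiguous, as the standard FAST detector does, is not stated.

```python
def rank_segment_test(circle, center, n=9, threshold=20):
    """Compare the n-th ranked circle pixel with the center pixel.

    circle: the 16 intensities on the ring around the candidate pixel.
    n: 9 for a FAST-9 style test, 12 for a FAST-12 style test.
    threshold: assumed intensity margin, not taken from the paper.
    """
    if len(circle) != 16:
        raise ValueError("expected 16 circle pixels")
    descending = sorted(circle, reverse=True)
    ascending = descending[::-1]
    # If the n-th brightest pixel clears the margin, at least n pixels do.
    enough_brighter = descending[n - 1] > center + threshold
    # If the n-th darkest pixel is below the margin, at least n pixels are.
    enough_darker = ascending[n - 1] < center - threshold
    return enough_brighter or enough_darker


# Hypothetical ring with nine bright pixels around a mid-gray center.
ring = [200] * 9 + [100] * 7
is_feature_9 = rank_segment_test(ring, center=100, n=9)     # True
is_feature_12 = rank_segment_test(ring, center=100, n=12)   # False
```

Sorting once and reading a single ranked value replaces the pixel-by-pixel arc search, which may be the simplification the abstract refers to as a new testing procedure.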
The current scheme of building climate zones in China generally assumes that the building climate zones of island cities are identical to those of adjacent land stations. Consequently, building design strategies for island buildings usually refer to those developed for inland cities. This approach has to some extent hindered the energy-saving design and green development of island buildings in China. This research takes a first step on this issue by defining the building climate zones of 36 marine islands across China’s marine area using the two-stage zoning methodology adopted by the current building climate zoning standard (GB50178-1993). The meteorological data used for the analysis were obtained from the National Climate Center of China and cover the 30-year period from 1985 to 2014. For comparison, 40 coastal stations adjacent to the investigated marine islands were also included in this study. Subsequently, a more objective technique, cluster analysis, was applied as an effective supplement to discover the climate characteristics of the different observations. The results of both methodologies consistently show that, among the 36 islands investigated, the majority of islands located in the northern and eastern marine areas belong to the same climate zones as their adjacent coastal cities. However, island cities in the southern marine area cannot be assigned to any current climate zone: cluster analysis shows that their climate features are distinct from those of all other sites investigated, and their energy use patterns also differ. Thus a new zone was defined to supplement the current building climate zoning scheme so that it covers the marine area of China.
Background: Brucellosis is a major public health issue in China, yet its temporal and spatial distribution has not been studied in depth. This study aims to better understand the epidemiology of brucellosis in the mainland of China by investigating the human, temporal and spatial distribution and clustering characteristics of the disease. Methods: Human brucellosis data from the mainland of China between 2012 and 2016 were obtained from the China Information System for Disease Control and Prevention. The spatial autocorrelation analysis of ArcGIS 10.6 and the spatial-temporal scanning analysis of SaTScan software were used to identify potential changes in the spatial and temporal distribution of human brucellosis in the mainland of China during the study period. Results: A total of 244,348 human brucellosis cases were reported during the study period of 2012-2016. The average incidence of human brucellosis was higher in the 40-65 age group. The temporal clustering analysis showed that the high incidence of brucellosis occurred between March and July. The spatial clustering analysis showed that the location of brucellosis clustering in the mainland of China remained relatively fixed, mainly concentrated in most parts of northern China. The results of the spatial-temporal clustering analysis showed that Heilongjiang represents a primary clustering area, and the Tibet, Shanxi and Hubei provinces represent three secondary clustering areas. Conclusions: Human brucellosis remains a widespread challenge, particularly in northern China. The clustering analysis highlights potential high-risk human groups, time frames and areas, which may require special plans and resources to monitor and control the disease.
An evaluation index is a prerequisite for the scientific evaluation of a public meteorological service. This paper aims to explore a technical method for determining and screening evaluation indicators. Based on public satisfaction survey data obtained in Wafangdian, China in 2010, this study investigates the suitability of the fuzzy clustering analysis method in establishing an evaluation index. Through quantitative analysis of multilayer fuzzy clustering of various evaluation indicators, correlation analysis indicates that if the results of clustering were identical for two evaluation indicators in the same sub-evaluation layer, then one indicator could be removed, or the two indicators merged. For evaluation indicators in different sub-evaluation layers, although clustering reveals attribute correlations, these indicators may not be substituted for one another. Analysis of the applicability of the fuzzy clustering method shows that it plays a certain role in the establishment and correction of an evaluation index.
With the rapid development of technology, processing the explosively growing volume of meteorological data on traditional standalone computers has become increasingly time-consuming and cannot meet the demands of scientific research and business. Therefore, this paper proposes an implementation of the parallel Clustering Large Applications based upon RANdomized Search (CLARANS) clustering algorithm on the Spark cloud computing platform to cluster China’s climate regions using meteorological data from 1988 to 2018. The aim is to address the challenge of applying clustering algorithms to large datasets. In this paper, the morphological similarity distance is adopted as the similarity measure instead of the Euclidean distance, which improves clustering accuracy. Furthermore, the issue of local optima caused by an improper selection of initial clustering centers is addressed by using the max-distance criterion. Compared with the k-means clustering algorithm already implemented on the Spark platform, the proposed algorithm is more robust and reduces the interference of outliers in the dataset with the clustering results; it also has higher parallel performance than the frequently used serial algorithms, thus improving the efficiency of big data analysis. The experiment compares the clustered centroid data with the annual average meteorological data of representative cities in the five typical meteorological regions of China, and the results show that the clustering results are in good agreement with the meteorological data obtained from the National Meteorological Science Data Center. This algorithm has a positive effect on the clustering analysis of massive meteorological data and deserves attention in scientific research activities.
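One common reading of a max-distance criterion for initial centers is farthest-first selection: each new center is the sample whose nearest already-chosen center is farthest away. The sketch below follows that reading. It is an assumption about the abstract, not the paper's Spark code, and it uses the squared Euclidean distance purely for simplicity, whereas the paper uses the morphological similarity distance.

```python
def sq_dist(p, q):
    """Squared Euclidean distance, a stand-in for the paper's distance measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def max_distance_centers(points, k, distance=sq_dist):
    """Pick k initial centers from the data by farthest-first selection."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: start from the first sample; other starting rules are possible.
    centers = [points[0]]
    while len(centers) < k:
        # The next center maximizes the distance to its closest chosen center.
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(farthest)
    return centers


# Hypothetical two-feature station records.
stations = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 0)]
initial = max_distance_centers(stations, 3)   # [(0, 0), (20, 0), (10, 10)]
```

Spreading the initial centers in this way makes it less likely that two of them start inside the same dense region, which is the local-optimum problem the abstract mentions. Because the chosen centers are always actual samples, the same routine can seed a medoid-based method such as CLARANS.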
A total of 10 indices of regional economic development in Guangxi are selected. Based on the relevant economic data, regional economic development in Guangxi is analyzed using the System Clustering Method and the Principal Component Analysis Method. The two methods yield similar assessments of the level of economic development. The overall economic strength of Guangxi is weak, and Nanning has relatively high factor scores owing to its advantages as the political, economic and cultural center. The comprehensive scores of the other regions are all lower than 1, which shows a large gap with the development of Nanning. In terms of overall development strategy, Guangxi should accelerate the construction of the Ring Northern Bay Economic Zone, create a strong logistics system of strategic significance to national development, and use its unique location advantage and the modern transportation system to establish a logistics and business center connecting the hinterland and the ASEAN market. To address the unbalanced regional economic development in Guangxi, the development of the service industry in Nanning should be sped up, a circular economy system should be built in industrial cities, and the industrialization of tourism cities should be accelerated, in order to realize balanced regional economic development in Guangxi, China.
In this paper, we report upon our recent work aimed at improving and adapting machine learning algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope (SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate supervised learning on a ten-category data set of images and compare the performance of the different models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders to perform unsupervised learning on a subset of images in a selected range of scales (from 1 μm to 2 μm). Finally, we compare different clustering methods to uncover intrinsic structures in the images.
The scientific and reasonable positioning of monitoring locations for surface displacement on slopes is a prerequisite for early warning and forecasting. However, there is no specific provision on how to effectively determine the number and location of monitoring points according to the actual deformation characteristics of the slope, and the layout of monitoring points still has some defects. To this end, a clustering-based spatial deformation correlation model for slopes was proposed. Using the displacement data series and spatial location information of surface displacement monitoring points, it combines displacement series correlation with spatial distance influence factors to calculate the correlation between different monitoring points, on the basis of which the deformation area of the slope is divided. The redundant monitoring points in each partition were eliminated based on the partition result, and the overall optimal arrangement of slope monitoring points was then achieved. This method scientifically addresses the issues of slope deformation zoning and overlapping data collection. It not only eliminates human subjectivity from slope deformation zoning but also increases the efficiency and accuracy of slope monitoring. To verify the effectiveness of the method, a counter-tilt excavation slope in interbedded sandstone and mudstone in Chongqing, China was used as the research object. Twenty-four monitoring points deployed on this slope were monitored for surface displacement for 13 months, and the spatial location of the monitoring points was discussed. The results show that the proposed method of slope deformation zoning and the optimized placement of monitoring points are feasible.
A survey of bubble clustering in air–water flow processes may provide significant insights into turbulent two-phase flow. These processes have been studied in plunging jets, dropshafts, and hydraulic jumps on a smooth bed. As a first attempt, this study examined the bubble clustering process in hydraulic jumps on a pebbled rough bed using experimental data for 1.70 < Fr_1 < 2.84 (with Fr_1 denoting the inflow Froude number). The basic properties of particle grouping and clustering, including the number of clusters, the dimensionless number of clusters per second, the percentage of clustered bubbles, and the number of bubbles per cluster, were analyzed based on two criteria. For both criteria, the maximum cluster count rate was greater on the rough bed than on the smooth bed, suggesting greater interactions between turbulence and bubbly flow on the rough bed. With one of the criteria, the results were consistent with the longitudinal distribution of the interfacial velocity. In addition, the clustering process was analyzed using a different approach: the interparticle arrival time of bubbles. The comparison showed that the bubbly flow structure had a greater density of bubbles per unit flux on the rough bed than on the smooth bed. Bed roughness was the dominant parameter close to the jump toe. Further downstream, Fr_1 predominated. Thus, the bubble density decreased more rapidly for the hydraulic jump with the lowest Fr_1.
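The cluster properties listed in this abstract can be computed from the sequence of gaps between successive bubbles once a grouping criterion is fixed. The abstract does not state its two criteria, so the sketch below assumes a single generic rule: two successive bubbles belong to the same cluster when the water gap between them is shorter than a chosen `threshold`. All names are hypothetical.

```python
def cluster_stats(gaps, threshold):
    """Summarize bubble clusters along one probe record.

    gaps[i] is the water time between bubble i and bubble i + 1.
    A cluster is a run of two or more bubbles separated by gaps < threshold.
    """
    n_bubbles = len(gaps) + 1
    sizes = []          # number of bubbles in each detected cluster
    run = 1             # bubbles in the current run
    for gap in gaps:
        if gap < threshold:
            run += 1
        else:
            if run >= 2:
                sizes.append(run)
            run = 1
    if run >= 2:        # close a cluster that reaches the end of the record
        sizes.append(run)
    clustered = sum(sizes)
    return {
        "n_clusters": len(sizes),
        "percent_clustered": 100.0 * clustered / n_bubbles,
        "bubbles_per_cluster": clustered / len(sizes) if sizes else 0.0,
    }


# Hypothetical gaps (ms) between seven successive bubbles.
stats = cluster_stats([0.1, 5, 0.2, 0.1, 6, 7], threshold=1)
# two clusters (sizes 2 and 3), 5 of 7 bubbles clustered, 2.5 bubbles per cluster
```

Dividing `n_clusters` by the record duration would give a cluster count rate; making it dimensionless requires a flow scale that the abstract does not specify.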
Various types of plasma events emerge in specific parameter ranges and exhibit similar characteristics in diagnostic signals, which can be used to identify these events. A semisupervised machine learning algorithm, the k-means clustering algorithm, is utilized to investigate and identify plasma events in the J-TEXT plasma. This method can cluster diverse plasma events with homogeneous features, and these events can then be identified given a few manually labeled examples based on physical understanding. A survey of clustered events reveals that the k-means algorithm can make plasma events (rotating tearing mode, sawtooth oscillations, and locked mode) gather in a Euclidean space composed of multi-dimensional diagnostic data, such as soft x-ray emission intensity, edge toroidal rotation velocity and the Mirnov signal amplitude. Based on the cluster analysis results, an approximate analytical model is proposed to rapidly identify plasma events in the J-TEXT plasma. The cluster analysis method is useful for labeling massive diagnostic data.
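The step of identifying clustered events from a few manually labeled examples can be sketched as a majority vote: every cluster takes the event name that is most frequent among its labeled members, and that name is then spread to all samples in the cluster. The voting rule, the function name and the "unlabeled" fallback are assumptions, and the cluster ids are taken as given (for example, from a k-means run).

```python
from collections import Counter


def name_clusters(cluster_ids, labeled):
    """Spread a few manual labels to every sample through its cluster.

    cluster_ids: cluster id of each sample, in sample order.
    labeled: {sample_index: event_name} for the hand-labeled samples.
    """
    votes = {}
    for index, event in labeled.items():
        votes.setdefault(cluster_ids[index], Counter())[event] += 1
    # Each cluster is named after its most frequent manual label.
    names = {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}
    return [names.get(cid, "unlabeled") for cid in cluster_ids]


# Hypothetical cluster ids for six time slices, three of them hand-labeled.
ids = [0, 0, 1, 1, 2, 0]
manual = {0: "sawtooth", 2: "locked mode", 5: "sawtooth"}
events = name_clusters(ids, manual)
# ['sawtooth', 'sawtooth', 'locked mode', 'locked mode', 'unlabeled', 'sawtooth']
```

Clusters without any labeled member stay "unlabeled", which flags where additional manual inspection would be needed.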
Funding: This work was supported by the Central University Research Fund (No. 2016MS116, No. 2016MS117, No. 2018MS074) and the National Natural Science Foundation (51677072).
文摘Effective storage,processing and analyzing of power device condition monitoring data faces enormous challenges.A framework is proposed that can support both MapReduce and Graph for massive monitoring data analysis at the same time based on Aliyun DTplus platform.First,power device condition monitoring data storage based on MaxCompute table and parallel permutation entropy feature extraction based on MaxCompute MapReduce are designed and implemented on DTplus platform.Then,Graph based k-means algorithm is implemented and used for massive condition monitoring data clustering analysis.Finally,performance tests are performed to compare the execution time between serial program and parallel program.Performance is analyzed from CPU cores consumption,memory utilization and parallel granularity.Experimental results show that the designed framework and parallel algorithms can efficiently process massive power device condition monitoring data.
文摘A novel multivariate similarity clustering analysis (MSCA) approach was used to estimate a biogeographical division scheme for the global terrestrial fauna and was compared against other widely used clustering algorithms. The faunal dataset included almost all terrestrial and freshwater fauna, a total of 4631 families, 141,814 genera, and 1,334,834 species. Our findings demonstrated that suitable results were only obtained with the MSCA method, which was associated with distinct hierarchies, reasonable structuring, and furthermore, conformed to biogeographical criteria. A total of seven kingdoms and 20 sub-kingdoms were identified. We discovered that the clustering results for the higher and lower animals did not differ significantly, leading us to consider that the analysis result is convincing as the first zoogeographical division scheme for global all terrestrial animals.
Abstract: Because it involves many factors, different dimensions, widely varying data and incomplete information, the optimal selection of regional outburst prevention measures for an outburst mine has become a complicated systems engineering problem. The traditional way of selecting outburst prevention measures is qualitative, which may lead to high gas control costs, large amounts of drilling work, long construction times and even secondary disasters. To solve these problems, and in light of the occurrence status of coal seam gas in the No. 21 mining area of Jinzhushan Tuzhu Mine, a multi-objective classification judging model for optimizing outburst prevention measures was established using grey fixed weight clustering theory combined with qualitative and quantitative analysis. The three weight coefficients of each outburst prevention technology scheme are sorted in order to determine the advantages and disadvantages of each scheme under a comprehensive multi-objective evaluation. Finally, the problem of quantitatively selecting a regional outburst prevention technology scheme is solved under multi-factor conditions and incomplete information, which provides reasonable and effective technical measures for the prevention of coal and gas outburst disasters.
Funding: Supported by the key laboratory foundation of Henna (112300413221).
Abstract: In this study, the world's land (except Antarctica) is divided into 67 basic geographical units according to ecological types. Using our newly proposed MSCA (Multivariate Similarity Clustering Analysis) method, 7,591 species of modern terrestrial mammals belonging to 1,374 genera in 162 families, and 2,378 species of mammals known in the Wallace era before 1876, are quantitatively analyzed. Almost the same clustering results are obtained for both, with clear levels and reasonable clustering that conform to the principles of geography, statistics, ecology and biology. The analysis not only affirms and supports the reasonable kernel of Wallace's scheme, but also suggests where it should be revised and improved. The differences, large or small, between the clustering results and the mammalian geographical zoning schemes of contemporary scholars are caused by different analysis methods, and the results are highly consistent with those obtained for the world's chordates, angiosperms and insects by the same method. This again confirms the homogeneity of the global distribution pattern of major biological groups and the possibility of building a unified biogeographic zoning system for the world.
Abstract: For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule and other methods were used together to build logistic regression, cluster analysis, hyper-parameter test and other models. SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the sub-classification under different chemical compositions, and the test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
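To make the K-Means and elbow rule combination concrete, here is a minimal pure-Python sketch. It is not the authors' SPSS or Python workflow: the helper names (`sq_dist`, `farthest_first`, `kmeans_inertia`, `elbow`) are hypothetical, the farthest-first initialization is an assumption chosen only to keep the example deterministic, and the "sharpest bend" rule (largest second difference of the inertia curve) is one common heuristic reading of the elbow rule.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initial centers: start at points[0], then repeatedly
    add the point farthest from the centers chosen so far (assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, iters=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow(inertias):
    """inertias[i] is the inertia for k = i + 1 (needs at least 3 values).
    Returns the k with the largest second difference, i.e. the sharpest bend."""
    bends = {
        k: inertias[k - 2] - 2 * inertias[k - 1] + inertias[k]
        for k in range(2, len(inertias))
    }
    return max(bends, key=bends.get)
```

A typical use would be to compute `kmeans_inertia(points, k)` for k = 1, 2, 3, ... on standardized composition data and pass the resulting list to `elbow`; in practice the curve is usually also inspected visually rather than trusted to a single heuristic.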
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Abstract: In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can handle only a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into parts. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels in identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing arbitrary shapes.
Abstract: In this paper, CiteSpace, a bibliometrics software package, was used to analyze research papers published on the Web of Science that are relevant to biological models and effluent quality prediction in the activated sludge process of wastewater treatment. Using trend maps, keyword knowledge maps, and co-citation knowledge maps, the authors, institutions and regions were identified and analyzed visually. Furthermore, the topics and hotspots of water quality prediction in the activated sludge process were determined through literature co-citation cluster analysis and citation burst analysis, which not only reflect the historical evolution of the field to a certain extent, but also provide direction and insight into the knowledge structure of water quality prediction and the activated sludge process for future research.
Abstract: A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been monitored in some way in the past for this type of information to have been recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Despite the current ease of obtaining location information (GPS, drone images, etc.), the timing of the event remains challenging to ascertain for a considerable portion of landslide data. Rainfall can be monitored in multiple ways, for instance by examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h) or by calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators' experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized methodology that can be scaled rapidly. In this work, we propose a Landslide Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using Cluster Analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one utilized by Cemaden, Brazil's National Center for Monitoring and Early Warning of Natural Disasters.
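The abstract does not give the formula behind "rainfall half-life", so the following is only a sketch under a stated assumption: that past rainfall contributes to an accumulated index with a weight that halves every `half_life_h` hours (a simple exponential decay). The function name `effective_rainfall` and its parameters are hypothetical and are not taken from the paper.

```python
def effective_rainfall(hourly_mm, half_life_h):
    """Decayed rainfall accumulation for an hourly series (in mm).

    Assumption for illustration: each hour the stored amount is
    multiplied by 0.5 ** (1 / half_life_h), so a contribution is
    halved after half_life_h hours, and the new rainfall is added.
    """
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    decay = 0.5 ** (1.0 / half_life_h)
    total = 0.0
    series = []
    for rain in hourly_mm:
        total = total * decay + rain
        series.append(total)
    return series
```

Under this assumption a single parameter replaces the choice among fixed windows such as 1 h, 6 h, 24 h or 72 h: a short half-life behaves like a short accumulation window and a long half-life like a long one.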
Abstract: Over the past 30 years, the economic and social status, ownership structure, business mechanisms and management level of Chinese enterprises have been a topic of wide public discussion and concern. Solving the employment problem is an important prerequisite for people to live and work in peace, and a foundation for building a harmonious society. Employment in individual and private enterprises has always attracted great attention and occupies an important position in China's employment field that cannot be ignored. With the establishment of the market economy system, individual and private enterprises have become important components of the socialist economy, making significant contributions to economic development and social progress. The rapid development of China's economy reflects, on the one hand, the strengths of China's socialist market economic system and, on the other hand, the role of the tertiary industry and private enterprises in promoting the national economy. Since the 1990s, China's private enterprises have become a new point of economic growth for local and even national economies, and are one of the important ways to provide employment and achieve social stability. This paper studies employment in private enterprises and among self-employed individuals from a statistical perspective: it extracts relevant data from the China Statistical Yearbook, processes the data with statistical methods, draws conclusions and puts forward constructive suggestions.
Abstract: With rapid developments in platform and sensor technology, in terms of digital cameras and video recording, crowd monitoring has attracted considerable attention in many disciplines such as psychology, sociology, engineering, and computer vision. This is because monitoring of crowds is necessary to enhance safety and keep movements under control, minimizing risk particularly at highly crowded events (e.g. sports). One of the platforms that has been extensively employed in crowd monitoring is the unmanned aerial vehicle (UAV), because UAVs can acquire images over crowded areas quickly, at low cost, at high resolution and in real time. In addition, geo-referenced images can be provided through integration of on-board positioning sensors (e.g. GPS/IMU) with vision sensors (digital cameras and laser scanners). In this paper, a new testing procedure based on the features from accelerated segment test (FAST) algorithm is introduced to detect crowd features in UAV images taken from different camera orientations and positions. The proposed test starts by converting the circle of 16 pixels surrounding the center pixel into a vector and sorting it in ascending or descending order. The single pixel that takes rank 9 (for FAST-9) or rank 12 (for FAST-12) is then compared with the center pixel. Accuracy assessment in terms of completeness and correctness was used to assess the performance of the new testing procedure before and after filtering the crowd features. The results show that the proposed algorithms are able to extract crowd features from different UAV images. Overall, completeness ranges from 55% to 70%, whereas correctness ranges from 91% to 94%.
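The sorted-ring test described above can be sketched as follows. This is one possible reading of the abstract, not the authors' code: the function name `passes_rank_test`, the `threshold` parameter and the use of both a brighter and a darker comparison are assumptions. Note also that comparing only the rank-n value checks that at least n ring pixels differ from the center, without requiring them to be contiguous as the standard FAST segment test does.

```python
def passes_rank_test(ring, center, threshold, n=9):
    """Rank-based corner test on the 16 pixels around a candidate pixel.

    Hypothetical sketch: after sorting the ring, the n-th largest value
    exceeds center + threshold exactly when at least n pixels are
    brighter, and the n-th smallest value is below center - threshold
    exactly when at least n pixels are darker.
    """
    if len(ring) != 16:
        raise ValueError("ring must contain exactly 16 intensities")
    if not 1 <= n <= 16:
        raise ValueError("n must be between 1 and 16")
    ascending = sorted(ring)
    nth_smallest = ascending[n - 1]
    nth_largest = ascending[-n]
    return nth_largest > center + threshold or nth_smallest < center - threshold
```

With `n=9` this corresponds to the FAST-9 variant named in the abstract and with `n=12` to FAST-12; a single sort replaces the pixel-by-pixel comparisons, at the cost of ignoring where on the circle the differing pixels lie.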
Funding: This work was supported by the Key Program of the National Natural Science Foundation of China (No. 51838011), the National Key Research and Development Program of China (Project No. 2018YFC0704505), and the Rixin Talent Program granted by Beijing University of Technology.
Abstract: The current scheme of building climate zones in China generally assumes that the building climate zones of island cities are identical to those of adjacent land stations. Consequently, building design strategies for island buildings usually refer to those developed for inland cities. This approach has to some extent hindered the energy-saving design and green development of island buildings in China. This research takes a first step on this issue by defining the building climate zones of 36 marine islands across China's marine area using the two-stage zoning methodology adopted by the current building climate zoning standard (GB50178-1993). The meteorological data used for the analysis were obtained from the National Climate Center of China and cover the 30-year period from 1985 to 2014. For comparison, 40 coastal stations adjacent to the investigated marine islands were also included in this study. Subsequently, a more objective technique, cluster analysis, was applied as an effective supplement to discover the climate characteristics of the different observations. The results of both methodologies consistently show that, among the 36 islands investigated, the majority of islands located in the northern and eastern marine areas belong to the same climate zones as their adjacent coastal cities. However, island cities in the southern marine area cannot be assigned to any current climate zone, as demonstrated by climate features that cluster analysis shows to be distinct from those of all other investigated sites, as well as by different energy use patterns. A new zone was therefore defined to supplement the current building climate zoning scheme so that it covers the marine area of China.
Abstract: Background: Brucellosis is a major public health issue in China, yet its temporal and spatial distribution has not been studied in depth. This study aims to better understand the epidemiology of brucellosis in the mainland of China by investigating the human, temporal and spatial distribution and clustering characteristics of the disease. Methods: Human brucellosis data from the mainland of China between 2012 and 2016 were obtained from the China Information System for Disease Control and Prevention. The spatial autocorrelation analysis of ArcGIS 10.6 and the spatial-temporal scanning analysis of SaTScan software were used to identify potential changes in the spatial and temporal distribution of human brucellosis in the mainland of China during the study period. Results: A total of 244,348 human brucellosis cases were reported during the study period of 2012-2016. The average incidence of human brucellosis was higher in the 40-65 age group. The temporal clustering analysis showed that the high incidence of brucellosis occurred between March and July. The spatial clustering analysis showed that the location of brucellosis clustering in the mainland of China remained relatively fixed, mainly concentrated in most parts of northern China. The spatial-temporal clustering analysis showed that Heilongjiang represents a primary clustering area, and the Tibet, Shanxi and Hubei provinces represent three secondary clustering areas. Conclusions: Human brucellosis remains a widespread challenge, particularly in northern China. The clustering analysis highlights potential high-risk human groups, time frames and areas, which may require special plans and resources to monitor and control the disease.
Funding: National Science Foundation of China (91637105, 41775048 and 41475041); National Key R&D Program of China (2018YFC1507800); Research on Tourism Traffic Meteorological Service Products in Heilongjiang Province (HQZD2017004).
Abstract: An evaluation index is a prerequisite for the scientific evaluation of a public meteorological service. This paper aims to explore a technical method for determining and screening evaluation indicators. Based on public satisfaction survey data obtained in Wafangdian, China in 2010, this study investigates the suitability of the fuzzy clustering analysis method for establishing an evaluation index. Through quantitative multilayer fuzzy clustering of various evaluation indicators, correlation analysis indicates that if the clustering results are identical for two evaluation indicators in the same sub-evaluation layer, then one indicator can be removed or the two indicators merged. For evaluation indicators in different sub-evaluation layers, although clustering reveals attribute correlations, the indicators may not be substituted for one another. Analysis of the applicability of the fuzzy clustering method shows that it plays a useful role in establishing and correcting an evaluation index.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62101275 and 62101274).
Abstract: With the rapid development of technology, processing the explosively growing volume of meteorological data on traditional standalone computers has become increasingly time-consuming and cannot meet the demands of scientific research and business. Therefore, this paper proposes a parallel implementation of the Clustering Large Applications based upon RANdomized Search (CLARANS) clustering algorithm on the Spark cloud computing platform to cluster China's climate regions using meteorological data from 1988 to 2018. The aim is to address the challenge of applying clustering algorithms to large datasets. In this paper, the morphological similarity distance is adopted as the similarity measure instead of Euclidean distance, which improves clustering accuracy. Furthermore, the issue of local optima caused by an improper selection of initial clustering centers is addressed by using the max-distance criterion. Compared with the k-means clustering algorithm already implemented in the Spark platform, the proposed algorithm is more robust and reduces the interference of outliers on the clustering results; it also has higher parallel performance than the frequently used serial algorithms, thus improving the efficiency of big data analysis. The experiment compares the clustered centroid data with the annual average meteorological data of representative cities in the five typical meteorological regions of China, and the clustering results are in good agreement with the meteorological data obtained from the National Meteorological Science Data Center. The algorithm benefits the clustering analysis of massive meteorological data and deserves attention in scientific research.
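The robustness claim can be illustrated with a toy comparison between a mean (the center used by k-means) and a medoid (the kind of center used by k-medoids methods such as CLARANS). This is only a one-dimensional sketch with made-up numbers: the helper names `mean_center` and `medoid` are hypothetical, absolute difference is used as the distance, and neither the morphological similarity distance nor the Spark implementation from the paper is reproduced here.

```python
def mean_center(values):
    """Arithmetic mean, the cluster center used by k-means."""
    return sum(values) / len(values)


def medoid(values):
    """The member of `values` with the smallest total distance to all
    other members (absolute difference is assumed as the distance)."""
    return min(values, key=lambda v: sum(abs(v - other) for other in values))


# Illustrative data only: four ordinary readings and one outlier.
readings = [1, 2, 3, 4, 100]
print(mean_center(readings))  # pulled far toward the outlier
print(medoid(readings))       # stays on a typical reading
```

Because a medoid must be an actual data point and minimizes summed distances rather than squared distances, a single extreme value shifts it far less than it shifts the mean, which is the usual explanation for why medoid-based clustering tolerates outliers better.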
Abstract: A total of 10 indices of regional economic development in Guangxi are selected. Using the relevant economic data, regional economic development in Guangxi is analyzed with the System Clustering Method and the Principal Component Analysis Method. The two methods yield similar assessments of the level of economic development. The overall economic strength of Guangxi is weak, and Nanning has relatively high factor scores owing to its position as the political, economic and cultural center. The comprehensive scores of the other regions are all lower than 1, a large gap relative to the development of Nanning. In terms of overall development strategy, Guangxi should accelerate the construction of the Ring Northern Bay Economic Zone, create a strong logistics system of strategic significance to national development, and use its unique location and modern transportation system to establish a logistics and business center connecting the hinterland with the ASEAN market. To address the unbalanced regional economic development in Guangxi, we should speed up the development of the service industry in Nanning, build a circular economy system for industrial cities, and accelerate the industrialization of tourism cities, in order to realize balanced regional economic development in Guangxi, China.
Funding: This work has been done within the NFFA-EUROPE project and has received funding from the European Union's Horizon 2020 Research and Innovation Program under grant agreement No. 654360 NFFA-EUROPE.
Abstract: In this paper, we report on our recent work aimed at improving and adapting machine learning algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope (SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate supervised learning on a ten-category data set of images and compare the performance of the different models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders to perform unsupervised learning on a subset of images in a selected range of scales (from 1 μm to 2 μm). Finally, we compare different clustering methods to uncover intrinsic structures in the images.
Funding: National Natural Science Foundation of China (No. 41572308).
Abstract: The scientific and rational positioning of monitoring locations for surface displacement on slopes is a prerequisite for early warning and forecasting. However, there is no specific provision on how to effectively determine the number and location of monitoring points according to the actual deformation characteristics of a slope, and the layout of monitoring points still has defects. To this end, based on the displacement data series and spatial location information of surface displacement monitoring points, and combining displacement series correlation with spatial distance influence factors, a clustering-based spatial deformation correlation model of the slope was proposed to calculate the correlation between different monitoring points, on the basis of which the deformation area of the slope was divided into zones. The redundant monitoring points in each zone were eliminated based on the zoning outcome, and an overall optimal arrangement of slope monitoring points was then achieved. This method addresses the issues of slope deformation zoning and overlapping data collection. It not only removes human subjectivity from slope deformation zoning but also increases the efficiency and accuracy of slope monitoring. To verify the effectiveness of the method, a counter-tilt excavation slope of interbedded sandstone and mudstone in Chongqing, China was used as the research object. Twenty-four monitoring points deployed on this slope were monitored for surface displacement for 13 months, and the spatial location of the monitoring points was discussed. The results show that the proposed method of slope deformation zoning and the optimized placement of monitoring points are feasible.
Abstract: A survey of bubble clustering in air-water flow processes may provide significant insights into turbulent two-phase flow. These processes have been studied in plunging jets, dropshafts, and hydraulic jumps on a smooth bed. As a first attempt, this study examined the bubble clustering process in hydraulic jumps on a pebbled rough bed using experimental data for 1.70 < Fr_1 < 2.84 (with Fr_1 denoting the inflow Froude number). The basic properties of particle grouping and clustering, including the number of clusters, the dimensionless number of clusters per second, the percentage of clustered bubbles, and the number of bubbles per cluster, were analyzed based on two criteria. For both criteria, the maximum cluster count rate was greater on the rough bed than on the smooth bed, suggesting greater interactions between turbulence and bubbly flow on the rough bed. With one of the criteria, the results were consistent with the longitudinal distribution of the interfacial velocity. In addition, the clustering process was analyzed using a different approach: the interparticle arrival time of bubbles. The comparison showed that the bubbly flow structure had a greater density of bubbles per unit flux on the rough bed than on the smooth bed. Bed roughness was the dominant parameter close to the jump toe. Further downstream, Fr_1 predominated. Thus, the bubble density decreased more rapidly for the hydraulic jump with the lowest Fr_1.
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2018YFE0301104 and 2018YFE0301100) and the National Natural Science Foundation of China (Nos. 12075096 and 51821005).
Abstract: Various types of plasma events emerge in specific parameter ranges and exhibit similar characteristics in diagnostic signals, which can be used to identify these events. A semi-supervised machine learning algorithm, the k-means clustering algorithm, is utilized to investigate and identify plasma events in the J-TEXT plasma. This method can cluster diverse plasma events with homogeneous features, and these events can then be identified given a few manually labeled examples based on physical understanding. A survey of clustered events reveals that the k-means algorithm can make plasma events (rotating tearing mode, sawtooth oscillations, and locked mode) gather in a Euclidean space composed of multi-dimensional diagnostic data, such as soft x-ray emission intensity, edge toroidal rotation velocity, the Mirnov signal amplitude and so on. Based on the cluster analysis results, an approximate analytical model is proposed to rapidly identify plasma events in the J-TEXT plasma. The cluster analysis method is conducive to labeling massive diagnostic data.