In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most can handle only a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into parts and tackle them one at a time. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels in identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing arbitrarily shaped clusters.
Efficient iterative unsupervised machine learning, in the form of probabilistic clustering analysis with the expectation-maximization (EM) clustering algorithm, is applied to categorize reservoir facies by exploiting latent and observable well-log variables from a clastic reservoir in the Majnoon oilfield, southern Iraq. The observable well-log variables consist of conventional open-hole well-log data and the computer-processed interpretation of gamma rays, bulk density, neutron porosity, compressional sonic, deep resistivity, shale volume, total porosity, and water saturation, from three wells located in the Nahr Umr reservoir. The latent variables include shale volume and water saturation. The EM algorithm efficiently characterizes electrofacies through iterative machine learning to identify the local maximum likelihood estimates (MLE) of the observable and latent variables in the studied dataset. The optimized EM model successfully predicts the core-derived facies classification in two of the studied wells. The EM model clusters the data into three distinctive reservoir electrofacies (F1, F2, and F3). F1 represents a gas-bearing electrofacies with low shale volume (Vsh) and water saturation (Sw) and high porosity and permeability values, identifying it as an attractive reservoir target. The results of the EM model are validated using nuclear magnetic resonance (NMR) data from the third studied well, for which no cores were recovered. The NMR results confirm the effectiveness and accuracy of the EM model in predicting electrofacies. The use of the EM algorithm for electrofacies classification/cluster analysis is innovative. Specifically, the clusters it establishes are less rigidly constrained than those derived from the more commonly used K-means clustering method. The EM methodology generates dependable electrofacies estimates in the studied reservoir intervals where core samples are not available. Therefore, once calibrated with core data in some wells, the model is suitable for application to other wells that lack core data.
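The contrast drawn above between EM clusters and the more rigid K-means clusters comes from EM's soft, probabilistic assignments. The following is a minimal sketch of that idea, assuming one-dimensional data and Gaussian components; the function name `em_gmm_1d`, the starting values, and the sample numbers are hypothetical and are not taken from the study, whose actual model and well-log inputs are not given here.

```python
import math


def em_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM (illustrative sketch only).

    Returns (weights, means, variances). The per-point responsibilities
    computed in the E-step are the soft cluster memberships.
    """
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


# Hypothetical sample: two well-separated groups of readings.
sample = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = em_gmm_1d(sample, init_means=[0.0, 6.0])
```

Unlike K-means, each point contributes to every component in proportion to its responsibility, so cluster boundaries are probabilistic rather than hard. EM converges to a local maximum of the likelihood, so the result can depend on the starting values.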
In this paper, CiteSpace, a bibliometrics software package, was used to analyze research papers indexed in the Web of Science that are relevant to biological models and effluent quality prediction in the activated sludge process for wastewater treatment. Using trend maps, keyword knowledge maps, and co-citation knowledge maps, the authors, institutions, and regions active in the field were identified and visualized. Furthermore, the topics and hotspots of water quality prediction in the activated sludge process were determined through literature co-citation-based cluster analysis and citation burst analysis. These results reflect the historical evolution of the field to a certain extent and provide direction and insight into the knowledge structure of water quality prediction and the activated sludge process for future research.
The recent pandemic has highlighted the importance of the availability and management of health data for responding quickly and effectively to health emergencies while respecting the fundamental rights of every individual. In this context, it is essential to balance the protection of privacy with the safeguarding of public health, using tools that guarantee transparency and the population's consent to data processing. Starting from a pilot investigation conducted at the Polyclinic of Bari as part of the Horizon Europe Seeds project “Multidisciplinary analysis of technological tracing models of contagion: the protection of rights in the management of health data”, this work aims to promote greater patient awareness of how their health data are processed and how their privacy is protected. The methodology used the PHICAT (Personal Health Information Competence Assessment Tool): a questionnaire was administered to evaluate patients’ ability to express consent to the release and processing of their health data. The results were analyzed in relation to the four domains into which the process is divided, which allow evaluation of patients’ ability to make a conscious choice, and in relation to the patients’ socio-demographic and clinical characteristics. This study can contribute to understanding patients’ ability to give consent and to improving information on the management of health data, thereby increasing patients’ confidence in granting the use of their data for research and clinical management.
To analyze the composition of ancient glass products and identify their types, L1 regularization, K-means cluster analysis, the elbow method, and other techniques were combined to build logistic regression, cluster analysis, and hyper-parameter test models. SPSS, Python, and other tools were used to obtain the classification rules for glass products made with different fluxes, the sub-classification by chemical composition, and a test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
A significant portion of Landslide Early Warning Systems (LEWS) rely on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been monitored previously in some way for this type of information to have been recorded. Another significant limitation is the need for information on both the location and the timing of incidents. Although location information is now easy to obtain (GPS, drone images, etc.), the timing of the event remains difficult to ascertain for a considerable portion of landslide data. Rainfall monitoring can be approached in multiple ways, for instance by examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h) or by calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rainfall monitoring approach are defined manually and subjectively, relying on the operators’ experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized methodology that scales rapidly. In this work, we propose a Landslide Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using cluster analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one operated by Cemaden, Brazil’s National Center for Monitoring and Early Warning of Natural Disasters.
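The abstract does not define how rainfall half-life is computed. One plausible reading, offered here only as an assumption, is an exponentially decayed rainfall accumulation in which the contribution of each hour's rain halves after every half-life. The sketch below illustrates that reading; the function name `decayed_rainfall` and the sample values are hypothetical and should not be taken as the formulation used in the proposed system.

```python
def decayed_rainfall(hourly_mm, half_life_h):
    """Exponentially decayed rainfall accumulation (illustrative assumption).

    hourly_mm   -- hourly rainfall totals in mm, oldest first
    half_life_h -- hours after which a past contribution counts half
    """
    # Per-hour decay factor chosen so that decay ** half_life_h == 0.5.
    decay = 0.5 ** (1.0 / half_life_h)
    total = 0.0
    for rain in hourly_mm:
        # Age the running total by one hour, then add the new rainfall.
        total = total * decay + rain
    return total


# 10 mm of rain followed by two dry hours, with a 2 h half-life:
# the index has decayed to roughly half of the original 10 mm.
index = decayed_rainfall([10.0, 0.0, 0.0], half_life_h=2.0)
```

A single decay parameter replaces the choice among several fixed accumulation windows (1 h, 6 h, 24 h, 72 h), which is one reason such an index can be easier to standardize across a large network.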
This paper investigates the design essence of Chinese classical private gardens, integrating their design elements and fundamental principles. It systematically analyzes the unique characteristics of, and differences among, classical private gardens in the Northern, Jiangnan, and Lingnan regions. The study examines nine classical private gardens from Northern China, Jiangnan, and Lingnan using principal component cluster analysis. Based on literature analysis and field research, 273 variables were selected for principal component analysis, from which the four components with the highest contribution rates were chosen for further study. Clustering analysis was then used to compare the differences among the three types of gardens. The results reveal that the first principal component effectively highlights the differences between Jiangnan and Lingnan private gardens. The second principal component is the key to defining Northern private gardens and distinguishing them from the other two types, and the third principal component indicates that Lingnan private gardens can themselves be divided into two distinct types.
Remarkable progress has been made in infection prevention and control (IPC) in many countries, but some gaps emerged in the context of the coronavirus disease 2019 (COVID-19) pandemic. Core capabilities such as standard clinical precautions and tracing the source of infection were the focus of IPC in medical institutions during the pandemic. Therefore, the core competencies of IPC professionals during the pandemic, and how these contributed to successful prevention and control of the epidemic, should be studied. This study uses a systematic review and cluster analysis to investigate which fundamental improvements in the competencies of IPC professionals should be emphasized in light of the COVID-19 pandemic. We searched the PubMed, Embase, Cochrane Library, Web of Science, CNKI, WanFang Data, and CBM databases for original articles exploring the core competencies of IPC professionals during the COVID-19 pandemic (from January 1, 2020 to February 7, 2023). Weiciyun software was used for data extraction, and the Donohue formula was followed to distinguish high-frequency technical terms. Cluster analysis was performed using the within-group linkage method, with squared Euclidean distance as the metric, to determine the priority competencies for development. We identified 46 studies with 29 high-frequency technical terms. The most common term was “infection prevention and control training” (184 times, 17.3%), followed by “hand hygiene” (172 times, 16.2%). “Infection prevention and control in clinical practice” was the most-reported core competency (367 times, 34.5%), followed by “microbiology and surveillance” (292 times, 27.5%). Cluster analysis showed two key areas of competence: Category 1 (program management and leadership, patient safety and occupational health, education, and microbiology and surveillance) and Category 2 (IPC in clinical practice). During the COVID-19 pandemic, IPC program management and leadership, microbiology and surveillance, education, patient safety, and occupational health were the most important areas for development and should be given due consideration by IPC professionals.
Over the past 30 years, Chinese enterprises have been a hot topic of public discussion and concern in terms of their economic and social status, ownership structure, business mechanisms, and management level. Solving the problem of employment is an important prerequisite for people to live and work in peace, as well as a foundation for building a harmonious society. The employment situation of private enterprises has always attracted great attention, and individual and private businesses have always occupied an important position in China's employment landscape. With the establishment of the market economy system, individual and private enterprises have become important components of the socialist economy, making significant contributions to economic development and social progress. The rapid development of China’s economy reflects, on the one hand, the strengths of China’s socialist market economic system and, on the other hand, the role of the tertiary industry and private enterprises in driving the national economy. Since the 1990s, China’s private enterprises have become a new point of growth for local and even national economies, and they are one of the important channels for providing employment and achieving social stability. This paper studies employment in private enterprises and individual businesses from a statistical perspective: it extracts relevant data from the China Statistical Yearbook, processes the data with statistical methods, draws conclusions, and puts forward constructive suggestions.
Aim: Several species of the genus Paris, known as "Chonglou", are famous traditional Chinese herbal medicines. We established a method for the quantitative analysis of steroidal saponins in some species of the genus Paris and discussed the relationships among these species. Methods: The contents of 11 steroidal saponins in Paris samples were determined by HPLC-ELSD on a Kromasil C18 column (150 mm × 4.6 mm ID, 5 μm) with gradient elution using acetonitrile-water (30:70 to 60:40, V/V) at a flow rate of 1 mL·min⁻¹, and a chemical cluster tree was built using SPSS 11 software. Results: All the samples could be separated, and calibration curves were prepared for the 11 saponins. We successfully determined the contents of the 11 steroidal saponins in 14 Paris spp. within 30 min. The recovery for the saponin assay was between 95% and 97%. The RSDs for the precision of the 11 saponins and for sample stability were below 3%. The chemical phylogenetic tree based on saponin contents indicated that the 17 samples of Paris spp. clustered separately. Conclusion: The established method is accurate and convenient, and suitable for the quantitative analysis of these 11 steroidal saponins in Paris spp. The chemical phylogenetic tree is in accordance with Takhtajan's classical taxonomy.
To reveal the genetic diversity of maize germplasm resources in Guizhou Province, we screened 20 pairs of SSR primers out of 100 pairs tested and used them to amplify alleles in 21 maize germplasm accessions collected there. In total, 85 alleles were identified in the experimental materials; each primer pair amplified 2-9 alleles, with an average of 4.25. The polymorphism information content (PIC) values varied from 0.19 to 0.74, with an average of 0.50. Clustering analysis by UPGMA indicated that the genetic distance among the tested lines was 0.22-0.63, with an average of 0.43. These results provide a theoretical foundation for establishing fingerprint profiles of maize and for further exploring and utilizing maize germplasm in Guizhou Province.
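The abstract reports PIC values without stating the formula used. A common definition (Botstein et al., 1980) is PIC = 1 − Σ pᵢ² − Σ_{i<j} 2pᵢ²pⱼ², where pᵢ is the frequency of allele i at a locus. The sketch below assumes that definition; the function name `pic` and the example frequencies are hypothetical and are not data from the study.

```python
def pic(allele_freqs):
    """Polymorphism information content for one locus (Botstein-style).

    allele_freqs -- allele frequencies at the locus, summing to 1.
    """
    n = len(allele_freqs)
    # Expected heterozygosity term: 1 - sum of squared frequencies.
    heterozygosity = 1.0 - sum(p * p for p in allele_freqs)
    # Correction for partially informative genotype combinations.
    correction = sum(
        2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return heterozygosity - correction


# Two equally frequent alleles: 1 - 0.5 - 2 * 0.25 * 0.25 = 0.375
two_allele_pic = pic([0.5, 0.5])
```

Under this definition a locus with a single allele scores 0, and the value rises as alleles become more numerous and more evenly distributed.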
[Objective] The aim was to study the variation in leaf characters among different provenances of Polygonum multiflorum Thunb. and to carry out cluster analysis on these provenances, so as to provide a basis for the classification, identification, breeding, and improved variety selection of P. multiflorum. [Method] Leaf shape characters of 31 germplasm accessions from the major distribution regions across the country were measured, and the genetic variation of P. multiflorum leaves from different producing areas was analyzed. [Result] Leaf characters of individual plants within the same provenance were relatively stable, with variation found mainly in single leaf area, 1/2 leaf width, leaf width, and other indicators. Variation in each leaf character among different provenances was obvious and was found mainly in single leaf weight, leaf area, 1/2 leaf width, leaf length, and other indicators. Correlation analysis suggested that single leaf area and single leaf weight showed extremely significant positive correlations with leaf length, 1/2 leaf width, leaf width, leaf thickness, and leaf stem length, and significant negative correlations with WWR (leaf width/1/2 leaf width) and LWR (leaf length/1/2 leaf length). In addition, several macroscopic leaf characters, such as leaf length, 1/2 leaf width, leaf width, and leaf stem length, showed extremely significant positive correlations with one another. Principal component analysis suggested that the cumulative variance contribution rate of the first three principal components reached 97.4%, which reflects well the overall leaf characters of the different provenances of P. multiflorum. Cluster analysis showed that the 31 provenances could be divided into three classes: the first was distributed in central and western Guizhou, northwestern Guangxi, and western areas at higher altitude; the second in Hunan, Hubei, Sichuan, Guangdong, and most of Guangxi; the third in Anhui, Jiangsu, Henan, and Shandong. [Conclusion] Cluster analysis of leaf characters indicated that provenances that are geographically closer tend to cluster together. The study provides a basis for the classification of P. multiflorum.
Because traffic flow information for lanes at intersections without detectors is difficult to obtain in most of the world's metropolises, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict lane traffic volume at non-detector isolated signal-controlled intersections. First, cluster analysis is used to group the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections equipped with detectors. Then, based on the clustering results, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested on lane traffic volume data from the road network of Nanjing. It resolves the problem of predicting lane traffic volume at non-detector isolated signal-controlled intersections and can be widely used for urban traffic flow guidance and traffic control in cities where too few intersections are equipped with detectors.
In order to analyze the heterogeneity in vehicular traffic speed, a new method that integrates cluster analysis and probability distribution function fitting is presented. First, to identify the optimal number of clusters, the two-step cluster method is applied to actual speed data; the results suggest that dividing the speed data into two clusters best reflects the intrinsic patterns of traffic flows. This information is then used to guide probability distribution function fitting. The normal, skew-normal, and skew-t distribution functions are each used to fit the probability distribution of each cluster. The skew-t distribution gives the highest fitting accuracy, followed by the skew-normal distribution, with the normal distribution performing worst. Model analysis results demonstrate that the proposed mixture model has better fitting and generalization capability than the conventional single model. In addition, the new method is more flexible in terms of data fitting and can provide a more accurate model of the speed distribution.
[Objective] The research aimed to determine the systematic geographic distribution of Rana dybowskii. [Method] Four morphological indices (body length, body weight, forelimb length, hindlimb length) were measured in eight geographical populations of R. dybowskii naturally distributed in Changbai Mountain and Xiaoxing'an Mountain. The measurements were subjected to variance analysis and cluster analysis. [Result] Variance analysis showed that the genetic branching among the Dongfanghong male population (belonging to Wandashan), the Xiaoxing'an Mountain male population, and the Changbai Mountain male population was significantly different (P<0.05), and that the genetic branching between the Hebei female population (belonging to Xiaoxing'an Mountain) and the Changbai Mountain female population was significantly different (P<0.05). Cluster analysis showed that male R. dybowskii can be divided into three groups: the first included Quanyang, Tianbei, Chaoyang, and Dakouqin; the second included Tieli and Anshan; the third included Dongfanghong. Female R. dybowskii can also be divided into three groups: the first included Quanyang and Chaoyang; the second included Tianbei and Dakouqin; the third included Hebei. [Conclusion] The paper deduced that the Sanjiang Plain was the geographical center of origin of R. dybowskii, from which the species radiated upstream along the Songhua River basin to Changbai Mountain and Xiaoxing'an Mountain, forming its current distribution pattern.
[Objective] The genetic diversity of major mango cultivars in China was analyzed using SSR markers, and their fingerprints were constructed, so as to provide a theoretical basis for germplasm innovation and breeding of mango. [Method] With 115 pairs of SSR primers, genetic diversity analysis and cluster analysis were performed on 30 mango cultivars, and the genetic relationships among them were analyzed. [Result] A total of 64 pairs of polymorphic primers were screened out from the 115 pairs, and a total of 343 bands were amplified from the 30 cultivars, of which 73.2% were polymorphic. On average, 3.9 allelic loci were detected per primer pair, with a genetic diversity index of 0.5, a Shannon's diversity index of 1.00, and a polymorphism information content of 0.49, indicating relatively high genetic diversity. Cluster analysis showed that the 30 major cultivars could be classified into four categories. The first category included 14 cultivars; the second included 11 cultivars, most of which were introduced from abroad; the third included 4 cultivars, i.e., Miansan, Parayinda, Baiyu, and Hongxiangya; the fourth included only one cultivar, Maqiesu. Using 7 pairs of SSR markers, i.e., M42, M49, M54, M55, M96, M99, and M103, digital fingerprints were constructed for the 30 mango cultivars. [Conclusion] The 30 mango cultivars have complex genomic backgrounds, abundant genetic information, and relatively high genetic diversity.
[Objective] This study aimed to conduct correspondence cluster analysis of the trace elements in Chinese wolfberry from the Qinghai and Ningxia regions, and to investigate the relationships among the quality of the wolfberry samples, the composition of trace elements, and the sample sources. [Method] The measured contents of trace elements and the zinc-to-copper ratios (Zn/Cu) of wolfberry from 11 producing areas in Qinghai and Ningxia were used to construct the raw measurement data matrix, and the distribution characteristics of the trace elements in wolfberry from Qinghai and Ningxia were analyzed using the correspondence cluster analysis method. [Result] The quality of wolfberry samples from Zhongning County, Zhongwei City, Pingluo County, Shizuishan City, and Heicheng Town of Ningxia Hui Autonomous Region and from Hehuang Valley and Golmud City of Qinghai Province is mainly related to the contents of Zn and Mn; Zn/Cu greatly affects the quality of Chinese wolfberry in Dulan County of Qinghai Province; Fe has a great effect on the quality of Chinese wolfberry in Yinchuan City of Ningxia Hui Autonomous Region; Cu greatly affects the quality of Chinese wolfberry in Nuomuhong Village of Qinghai Province and at a wolfberry research institute in Ningxia. [Conclusion] The relationship between the quality of wolfberry from different producing areas and its trace elements was investigated, which provides a theoretical and practical basis for the cultivation, harvesting, processing, and further development and utilization of Chinese wolfberry resources from different producing areas.
In order to scientifically evaluate the value of Cucurbita moschata cultivars, the main botanical characters of 41 cultivars were measured for diversity, correlation, and cluster analysis. The characters were the initial flowering date, the first fruiting node, fruit length, fruit stem length, stem diameter, internode length, the transverse and longitudinal diameters of the largest leaf, single fruit weight, flesh thickness, and soluble solid content. The results revealed that the pumpkin cultivars showed large variations in fruit stem length, single fruit weight, fruit length, and flesh thickness, but small variations in initial flowering date. Significant, and in some cases highly significant, correlations were found among the tested traits. Cluster analysis divided the 41 old Cucurbita moschata cultivars into three groups, of which Group 1 was better than the other two groups in multiple traits. High similarity existed among the three groups and among the cultivars within each group. This research provides a basis for selecting excellent traits and parents for hybrid breeding.
Inter-simple sequence repeat (ISSR) molecular markers were applied to analyze the genetic diversity and clustering of 48 introduced and bred cultivars of Olea europaea L. In total, 106 DNA bands were amplified by 11 screened primers, including 99 polymorphic bands; the percentage of polymorphic loci was 93.40%, indicating rich genetic diversity in Olea europaea L. germplasm resources. Based on Nei's genetic distances between cultivars, a dendrogram of the 48 cultivars was constructed using the unweighted pair-group method with arithmetic mean (UPGMA). It showed that the 48 cultivars clustered into four main categories; 84.6% of native cultivars clustered into two categories; most introduced cultivars clustered according to their sources and main uses rather than their geographic origins. This study provides a reference for the utilization and further genetic improvement of Olea europaea L. germplasm resources.
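UPGMA builds a dendrogram by repeatedly merging the two closest clusters and averaging distances weighted by cluster size. The following is a minimal sketch of that procedure, assuming a small symmetric distance table; the function name `upgma`, the labels, and the distances are hypothetical and are not values from the olive study.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA agglomerative clustering (illustrative sketch).

    names -- list of leaf labels
    dist  -- dict mapping (a, b) label pairs to pairwise distances
    Returns a list of merges (cluster_a, cluster_b, height), where
    merged clusters are represented as nested tuples.
    """
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0  # ultrametric node height
        merged = (a, b)
        # Size-weighted average distance from the new cluster to the rest.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, height))
    return merges


# Hypothetical distances among three cultivars labelled A, B and C.
tree = upgma(
    ["A", "B", "C"],
    {("A", "B"): 0.2, ("A", "C"): 0.6, ("B", "C"): 0.5},
)
```

In this example A and B merge first because they are closest, and C then joins at the average of its distances to A and B. The size weighting is what makes the method "unweighted" with respect to individual samples: every original leaf counts equally in the averaged distance.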
In order to mine production and safety information from security supervising data and to ensure safety in production and decision-making, a clustering analysis algorithm for security supervising data in coal mines based on a semantic description is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, similarity measures for semantic data and for numerical data are given separately, and a weight-based hybrid similarity measure for semantically described security supervising data in coal mines is presented. Third, taking the hybrid similarity measure as the distance criterion and borrowing from grid-based methods, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a coal mine security supervising data set validate the efficiency of the algorithm.
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
文摘In clustering algorithms,the selection of neighbors significantly affects the quality of the final clustering results.While various neighbor relationships exist,such as K-nearest neighbors,natural neighbors,and shared neighbors,most neighbor relationships can only handle single structural relationships,and the identification accuracy is low for datasets with multiple structures.In life,people’s first instinct for complex things is to divide them into multiple parts to complete.Partitioning the dataset into more sub-graphs is a good idea approach to identifying complex structures.Taking inspiration from this,we propose a novel neighbor method:Shared Natural Neighbors(SNaN).To demonstrate the superiority of this neighbor method,we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters(HC-SNaN).Our algorithm excels in identifying both spherical clusters and manifold clusters.Tested on synthetic datasets and real-world datasets,HC-SNaN demonstrates significant advantages over existing clustering algorithms,particularly when dealing with datasets containing arbitrary shapes.
Abstract: Efficient iterative unsupervised machine learning involving probabilistic clustering analysis with the expectation-maximization (EM) clustering algorithm is applied to categorize reservoir facies by exploiting latent and observable well-log variables from a clastic reservoir in the Majnoon oilfield, southern Iraq. The observable well-log variables consist of conventional open-hole well-log data and the computer-processed interpretation of gamma rays, bulk density, neutron porosity, compressional sonic, deep resistivity, shale volume, total porosity, and water saturation, from three wells located in the Nahr Umr reservoir. The latent variables include shale volume and water saturation. The EM algorithm efficiently characterizes electrofacies through iterative machine learning to identify the local maximum likelihood estimates (MLE) of the observable and latent variables in the studied dataset. The optimized EM model developed successfully predicts the core-derived facies classification in two of the studied wells. The EM model clusters the data into three distinctive reservoir electrofacies (F1, F2, and F3). F1 represents a gas-bearing electrofacies with low shale volume (Vsh) and water saturation (Sw) and high porosity and permeability values, identifying it as an attractive reservoir target. The results of the EM model are validated using nuclear magnetic resonance (NMR) data from the third studied well, for which no cores were recovered. The NMR results confirm the effectiveness and accuracy of the EM model in predicting electrofacies. The utilization of the EM algorithm for electrofacies classification/cluster analysis is innovative. Specifically, the clusters it establishes are less rigidly constrained than those derived from the more commonly used K-means clustering method. The EM methodology developed generates dependable electrofacies estimates in the studied reservoir intervals where core samples are not available. Therefore, once calibrated with core data in some wells, the model is suitable for application to other wells that lack core data.
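The contrast with K-means drawn in the abstract above comes from EM assigning each sample a probability of membership in every cluster rather than a single hard label. The following is a minimal sketch of EM for a one-dimensional Gaussian mixture in plain Python. The function name `em_gmm_1d`, the toy values, and the starting means are illustrative assumptions and are not taken from the study, which worked with multivariate well-log data.

```python
import math

def em_gmm_1d(xs, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM (illustrative sketch, not the paper's model).

    Returns (weights, means, variances, responsibilities).
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: soft membership of every sample in every component.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the soft memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances, resp

# Hypothetical 1-D log readings forming two groups.
readings = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
weights, means, variances, resp = em_gmm_1d(readings, init_means=[0.0, 6.0])
```

Each row of `resp` holds the membership probabilities of one sample, which is what makes the resulting clusters less rigid than a K-means partition.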
Abstract: In this paper, CiteSpace, a bibliometrics software package, was used to collect research papers published on the Web of Science that are relevant to biological models and effluent quality prediction for the activated sludge process in wastewater treatment. Using trend maps, keyword knowledge maps, and co-citation knowledge maps, the authors, institutions and regions were visualized and identified. Furthermore, the topics and hotspots of water quality prediction in the activated sludge process were determined through literature co-citation-based cluster analysis and citation burst analysis, which not only reflect the historical evolution of the field to a certain extent, but also provide direction and insight into the knowledge structure of water quality prediction and the activated sludge process for future research.
Abstract: The recent pandemic crisis has highlighted the importance of the availability and management of health data to respond quickly and effectively to health emergencies, while respecting the fundamental rights of every individual. In this context, it is essential to find a balance between the protection of privacy and the safeguarding of public health, using tools that guarantee transparency and the population's consent to the processing of data. This work, starting from a pilot investigation conducted in the Polyclinic of Bari as part of the Horizon Europe Seeds project entitled “Multidisciplinary analysis of technological tracing models of contagion: the protection of rights in the management of health data”, aims to promote greater patient awareness regarding the processing of their health data and the protection of privacy. The methodology used the PHICAT (Personal Health Information Competence Assessment Tool) and, through the administration of a questionnaire, aimed to evaluate the patients’ ability to express their consent to the release and processing of health data. The results were analyzed in relation to the 4 domains into which the process is divided, which allow the patients’ ability to express a conscious choice to be evaluated, and also in relation to the socio-demographic and clinical characteristics of the patients themselves. This study can contribute to understanding patients’ ability to give their consent and can improve information regarding the management of health data by increasing confidence in granting the use of their data for research and clinical management.
Abstract: For the composition analysis and identification of ancient glass products, L1 regularization, K-means cluster analysis, the elbow rule and other methods were combined to build logistic regression, cluster analysis, hyper-parameter test and other models. SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the sub-classification under different chemical compositions, and the test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
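The elbow rule mentioned in the abstract above chooses the K-means hyper-parameter K by looking for the point where adding another cluster stops reducing the within-cluster sum of squares by much. The sketch below illustrates the idea in plain Python. The toy 2-D points, the farthest-point initialization, and the second-difference elbow criterion are assumptions made for illustration, not details reported by the study.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_inertia(points, k, n_iter=20):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sum(min(sqdist(p, c) for c in centers) for p in points)

def elbow_k(inertias):
    """Pick the K with the largest second difference of the inertia curve."""
    ks = sorted(inertias)
    return max(
        ks[1:-1],
        key=lambda k: (inertias[k - 1] - inertias[k]) - (inertias[k] - inertias[k + 1]),
    )

# Hypothetical 2-D composition features forming three tight groups.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
best_k = elbow_k(inertias)
```

In practice the inertia curve is usually plotted and inspected by eye; the second-difference rule is only one simple way to automate that reading.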
Abstract: A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, there is a requirement for the location to have been previously monitored in some way to have this type of information recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Despite the current ease of obtaining location information (GPS, drone images, etc.), the timing of the event remains challenging to ascertain for a considerable portion of landslide data. Concerning rainfall monitoring, there are multiple ways to consider it, for instance, examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h), as well as in the calculation of effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators’ experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized and rapidly scalable methodology on a large scale. In this work, we propose a Landslides Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using Cluster Analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one utilized by Cemaden, Brazil’s National Center for Monitoring and Early Warning of Natural Disasters.
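The abstract above does not give its formula for rainfall half-life, so the following sketch assumes the common exponential-decay formulation, in which rain that fell one half-life ago counts for half its original amount. The function name `effective_rainfall`, the hourly series, and the half-life value are hypothetical; the threshold determination by cluster analysis and data inversion is not reproduced here.

```python
def effective_rainfall(rain, half_life_h, step_h=1.0):
    """Accumulate rainfall with exponential decay (assumed formulation).

    rain        -- rainfall depth per time step (e.g. mm per hour)
    half_life_h -- time after which past rain counts for half, in hours
    step_h      -- length of one time step, in hours
    """
    decay = 0.5 ** (step_h / half_life_h)  # fraction retained per step
    accumulated = 0.0
    series = []
    for depth in rain:
        accumulated = accumulated * decay + depth
        series.append(accumulated)
    return series

# Hypothetical hourly gauge readings in mm.
hourly_rain = [8.0, 0.0, 0.0, 4.0]
effective = effective_rainfall(hourly_rain, half_life_h=1.0)
```

An alert rule would then compare the latest value of `effective` with a threshold; a short half-life makes the index respond to bursts, while a long one tracks antecedent wetness.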
Abstract: This paper investigates the design essence of Chinese classical private gardens, integrating their design elements and fundamental principles. It systematically analyzes the unique characteristics and differences among classical private gardens in the Northern, Jiangnan, and Lingnan regions. The study examines nine classical private gardens from Northern China, Jiangnan, and Lingnan using principal component cluster analysis. Based on literature analysis and field research, 273 variables were selected for principal component analysis, from which four components with higher contribution rates were chosen for further study. Subsequently, we employed clustering analysis techniques to compare the differences among the three types of gardens. The results reveal that the first principal component effectively highlights the differences between Jiangnan and Lingnan private gardens. The second principal component serves as the key to defining the types of Northern private gardens and distinguishing them from the other two types, and the third principal component indicates that Lingnan private gardens can be categorized into two distinct types as well.
Funding: The National Natural Science Foundation of China, Grant/Award Number: 52178080; Major Research Project of the Hospital Management Research Institute of the National Health Commission, Grant/Award Number: GY2023011; National Institute of Hospital Administration Management of China, Grant/Award Number: GY2023049.
Abstract: Remarkable progress has been made in infection prevention and control (IPC) in many countries, but some gaps emerged in the context of the coronavirus disease 2019 (COVID-19) pandemic. Core capabilities such as standard clinical precautions and tracing the source of infection were the focus of IPC in medical institutions during the pandemic. Therefore, the core competences of IPC professionals during the pandemic, and how these contributed to successful prevention and control of the epidemic, should be studied. This study used a systematic review and cluster analysis to investigate fundamental improvements in the competences of infection control and prevention professionals that may be emphasized in light of the COVID-19 pandemic. We searched the PubMed, Embase, Cochrane Library, Web of Science, CNKI, WanFang Data, and CBM databases for original articles exploring core competencies of IPC professionals during the COVID-19 pandemic (from January 1, 2020 to February 7, 2023). Weiciyun software was used for data extraction and the Donohue formula was followed to distinguish high-frequency technical terms. Cluster analysis was performed using the within-group linkage method and squared Euclidean distance as the metric to determine the priority competencies for development. We identified 46 studies with 29 high-frequency technical terms. The most common term was “infection prevention and control training” (184 times, 17.3%), followed by “hand hygiene” (172 times, 16.2%). “Infection prevention and control in clinical practice” was the most-reported core competency (367 times, 34.5%), followed by “microbiology and surveillance” (292 times, 27.5%). Cluster analysis showed two key areas of competence: Category 1 (program management and leadership, patient safety and occupational health, education, and microbiology and surveillance) and Category 2 (IPC in clinical practice). During the COVID-19 pandemic, IPC program management and leadership, microbiology and surveillance, education, patient safety, and occupational health were the most important focus of development and should be given due consideration by IPC professionals.
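The Donohue formula cited in the abstract above sets the boundary between high- and low-frequency terms as T = (−1 + √(1 + 8·I₁)) / 2, where I₁ is the number of terms that occur exactly once. A minimal sketch follows. The term counts are invented for illustration, and treating terms with a count of at least T as high-frequency is an assumption, since the abstract does not say whether the comparison is strict.

```python
import math

def donohue_threshold(term_counts):
    """Donohue boundary T = (-1 + sqrt(1 + 8 * I1)) / 2.

    I1 is the number of terms that occur exactly once.
    """
    i1 = sum(1 for count in term_counts.values() if count == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2

def high_frequency_terms(term_counts):
    """Terms whose count reaches the boundary (the >= comparison is an assumption)."""
    threshold = donohue_threshold(term_counts)
    return sorted(term for term, count in term_counts.items() if count >= threshold)

# Hypothetical term counts extracted from a set of articles.
counts = {
    "hand hygiene": 5,
    "training": 4,
    "surveillance": 2,
    "ventilation": 1,
    "triage": 1,
    "signage": 1,
}
selected = high_frequency_terms(counts)
```

With three single-occurrence terms the boundary is 2, so the three terms counted twice or more are kept for the subsequent cluster analysis.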
Abstract: In the past 30 years, Chinese enterprises have been a hot topic of discussion and concern among the general public in terms of economic and social status, ownership structure, business mechanism, and management level. Solving the problem of employment is an important prerequisite for people to live and work in peace, as well as a foundation for building a harmonious society. The employment situation of private enterprises has always been of great concern, and individual and private businesses have always occupied an important position in China's employment field. With the establishment of the market economy system, individual and private enterprises have become important components of the socialist economy, making significant contributions to economic development and social progress. The rapid development of China’s economy reflects, on the one hand, the superiority of China’s socialist market economic system and, on the other hand, the role of the tertiary industry and private enterprises in promoting the national economy. Since the 1990s, China’s private enterprises have become a new growth point for local and even national economies, and are one of the important ways to provide employment and achieve social stability. This paper studies employment in private enterprises and individual businesses from a statistical perspective: it extracts relevant data from the China Statistical Yearbook, processes the data with statistical methods, draws conclusions and puts forward constructive suggestions.
Abstract: Aim: Several species of the genus Paris, known as "Chonglou", are famous traditional Chinese herbal medicines. We established a quantitative analysis method for the steroidal saponins in some species of the genus Paris and discussed their relationships. Methods: We determined the contents of 11 steroidal saponins in Paris samples by HPLC-ELSD on a Kromasil C18 column (150 mm × 4.6 mm ID, 5 μm) with gradient elution using acetonitrile-water (30:70 to 60:40, V/V) at a flow rate of 1 mL·min^-1, and built a chemical cluster tree using SPSS 11 software. Results: All the samples could be separated and calibration curves of the 11 saponins were prepared. We successfully determined the contents of 11 steroidal saponins in 14 Paris spp. within 30 min. The recovery for the assay of saponins was between 95% and 97%. The RSDs for the precision of the 11 saponins and the stability of the samples were below 3%. The chemical phylogenetic tree based on saponin contents indicated that 17 samples of Paris spp. clustered separately. Conclusion: The established method is accurate and convenient, and suitable for the quantitative analysis of these 11 steroidal saponins in Paris spp. The chemical phylogenetic tree is in accordance with Takhtajan's classical taxonomy.
Funding: Supported by the Special Fund of Guizhou Academy of Agricultural Science [2X(2007)004].
Abstract: To reveal the genetic diversity of maize germplasm resources in Guizhou Province, we screened 20 pairs of SSR primers out of 100 pairs tested for amplifying the alleles in 21 maize germplasm accessions collected there. In total, 85 alleles were identified from the experimental materials, and each primer pair amplified 2-9 alleles, with an average of 4.25. The polymorphism information content (PIC) values varied from 0.19 to 0.74, with an average of 0.50. The clustering analysis by UPGMA indicated that the genetic distance among the tested lines was 0.22-0.63, with an average of 0.43. This provided the theoretical foundation for establishing fingerprints of maize and for further exploring and utilizing maize germplasm in Guizhou Province.
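The PIC values reported in the abstract above are computed per locus from allele frequencies. The abstract does not state which formula was used, so the sketch below assumes the Botstein form, PIC = 1 − Σ pᵢ² − Σ_{i<j} 2·pᵢ²·pⱼ²; some SSR studies instead report the simpler 1 − Σ pᵢ². The allele frequencies shown are invented for illustration.

```python
from itertools import combinations

def pic(allele_freqs):
    """Polymorphism information content of one locus (Botstein form, assumed).

    allele_freqs -- frequencies of the alleles at the locus, summing to 1
    """
    homozygosity = sum(p ** 2 for p in allele_freqs)
    correction = sum(2 * (p ** 2) * (q ** 2) for p, q in combinations(allele_freqs, 2))
    return 1 - homozygosity - correction

# Hypothetical allele frequencies at two SSR loci.
locus_a = [0.5, 0.5]              # two equally common alleles
locus_b = [0.25, 0.25, 0.25, 0.25]  # four equally common alleles
pic_a = pic(locus_a)
pic_b = pic(locus_b)
```

A locus with more alleles at balanced frequencies scores higher, which is why PIC is used to rank primer pairs by how informative they are.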
Funding: Supported by the High-tech Research Project of Jiangsu Province (BG2004314).
Abstract: [Objective] The aim was to study the variation of leaf characters among different provenances of Polygonum multiflorum Thunb. and to carry out cluster analysis on P. multiflorum from different provenances, in order to provide a basis for the classification, identification, breeding and improved variety selection of P. multiflorum. [Method] Leaf shape characters of 31 germplasm accessions from the major distribution regions across the country were determined, and the genetic variation of P. multiflorum leaves from different producing areas was analyzed. [Result] The leaf characters of single plants of the same provenance were relatively stable, with variation mainly found in single leaf area, 1/2 leaf width, leaf width and other indicators; the variation of each leaf character among different provenances was obvious, and was mainly found in single leaf weight, leaf area, 1/2 leaf width, leaf length and other indicators. The correlation analysis of leaf characters in P. multiflorum suggested that single leaf area and single leaf weight showed an extremely significant positive correlation with leaf length, 1/2 leaf width, leaf width, leaf thickness and leaf stem length, while single leaf area and single leaf weight showed a significant negative correlation with WWR (leaf width / 1/2 leaf width) and LWR (leaf length / 1/2 leaf length); in addition, several macroscopic leaf characters such as leaf length, 1/2 leaf width, leaf width and leaf stem length showed extremely significant positive correlations with one another. The principal component analysis suggested that the cumulative variance contribution rate of the first three principal components was up to 97.4%, which could well reflect the comprehensive performance of the leaf characters of different provenances of P. multiflorum. The cluster analysis showed that the 31 P. multiflorum provenances could be divided into three classes: the first class was distributed in central and western Guizhou, northwestern Guangxi and western areas at higher altitude; the second class was distributed in Hunan, Hubei, Sichuan, Guangdong and most of Guangxi; the third class was distributed in Anhui, Jiangsu, Henan and Shandong. [Conclusion] Cluster analysis of leaf characters indicated that provenances that are geographically closer tend to cluster together. The study provides a basis for the classification of P. multiflorum.
Funding: The National Natural Science Foundation of China (No. 50378016).
Abstract: Because it is difficult to obtain lane-level traffic flow information at non-detector intersections in most metropolises of the world, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. First, cluster analysis is used to cluster the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections with detectors. Then, based on the results of the cluster analysis, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested with lane traffic volume data from the road network of Nanjing city. The method resolves the problem of predicting the traffic volume of lanes at non-detector isolated signal-controlled intersections and can be widely used in urban traffic flow guidance and urban traffic control in cities without enough intersections equipped with detectors.
Funding: The National Science Foundation by Changjiang Scholarship of Ministry of Education of China (No. BCS-0527508); the Joint Research Fund for Overseas Natural Science of China (No. 51250110075); the Natural Science Foundation of Jiangsu Province (No. BK200910046); the Postdoctoral Science Foundation of Jiangsu Province (No. 0901005C).
Abstract: In order to analyze the heterogeneity in vehicular traffic speed, a new method that integrates cluster analysis and probability distribution function fitting is presented. First, to identify the optimal number of clusters, the two-step cluster method is applied to actual speed data, which suggests that dividing the speed data into two clusters best reflects the intrinsic patterns of traffic flows. This information is then taken as guidance in probability distribution function fitting. The normal, skew-normal and skew-t distribution functions are each used to fit the probability distribution of each cluster, which suggests that the skew-t distribution has the highest fitting accuracy, followed by the skew-normal distribution, with the normal distribution the worst. Model analysis results demonstrate that the proposed mixture model has better fitting and generalization capability than the conventional single model. In addition, the new method is more flexible in terms of data fitting and can provide a more accurate model of speed distribution.
Abstract: [Objective] The research aimed to determine the geographic distribution pattern of Rana dybowskii populations. [Method] Four morphological indices (body length, body weight, forelimb length, hindlimb length) of eight geographical populations of R. dybowskii naturally distributed in Changbai Mountain and Xiaoxing'an Mountain were measured. The measurements were subjected to variance analysis and cluster analysis. [Result] Variance analysis showed that the Dongfanghong male population (belonging to Wandashan), the Xiaoxing'an Mountain male population and the Changbai Mountain male population differed significantly (P<0.05), and that the Hebei female population (belonging to Xiaoxing'an Mountain) and the Changbai Mountain female population differed significantly (P<0.05). Cluster analysis showed that male R. dybowskii can be divided into three groups: the first group included Quanyang, Tianbei, Chaoyang and Dakouqin, the second group included Tieli and Anshan, and the third group included Dongfanghong; female R. dybowskii can also be divided into three groups: the first group included Quanyang and Chaoyang, the second group included Tianbei and Dakouqin, and the third group included Hebei. [Conclusion] The paper deduced that the Sanjiang Plain was the geographical center of origin of R. dybowskii, which radiated to Changbai Mountain and Xiaoxing'an Mountain upstream along the Songhua River basin, thereby forming the current distribution pattern of R. dybowskii.
Funding: Supported by the Natural Science Foundation of Hainan Province (34128); Fundamental Scientific Research Funds of Chinese Academy of Tropical Agricultural Sciences (1630032013031).
Abstract: [Objective] The genetic diversity of major mango cultivars in China was analyzed using SSR markers, and their fingerprints were constructed so as to provide a theoretical basis for germplasm innovation and breeding of mango. [Method] With 115 pairs of SSR primers, genetic diversity analysis and cluster analysis were performed for 30 mango cultivars, and the genetic relationships among them were analyzed. [Result] In total, 64 pairs of polymorphic primers were screened out from the 115 pairs of primers, and 343 bands were amplified from the 30 cultivars, of which 73.2% were polymorphic. On average, 3.9 allelic loci were detected for each pair of primers, with a genetic diversity index of 0.5, Shannon's diversity index of 1.00 and polymorphism information content of 0.49, indicating high genetic diversity. The cluster analysis showed that the 30 major cultivars could be classified into four categories. The first category included 14 cultivars; the second category included 11 cultivars, most of which were introduced from abroad; the third category included 4 cultivars, i.e., Miansan, Parayinda, Baiyu and Hongxiangya; the fourth category included only one cultivar, Maqiesu. Using 7 pairs of SSR markers, i.e., M42, M49, M54, M55, M96, M99 and M103, digital fingerprints were constructed for the 30 mango cultivars. [Conclusion] The 30 mango cultivars present complex genomic genetics and abundant genetic information, and they have high genetic diversity.
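Shannon's diversity index, reported in the abstract above, is conventionally computed per locus as I = −Σ pᵢ ln pᵢ over the allele frequencies pᵢ and then averaged across loci. The sketch below assumes that conventional definition with the natural logarithm; the frequencies are invented and the study's own software settings are not known.

```python
import math

def shannon_index(allele_freqs):
    """Shannon's diversity index I = -sum(p * ln p) for one locus."""
    return -sum(p * math.log(p) for p in allele_freqs if p > 0)

def mean_shannon_index(loci):
    """Average the per-locus index across all loci."""
    return sum(shannon_index(freqs) for freqs in loci) / len(loci)

# Hypothetical allele frequencies at three SSR loci.
loci = [
    [0.5, 0.5],
    [0.7, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
average_index = mean_shannon_index(loci)
```

The index grows both with the number of alleles and with how evenly they are distributed, so it complements the band-count statistics in the abstract.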
Abstract: [Objective] This study aimed to conduct correspondence cluster analysis of the trace elements in Chinese wolfberry from the Qinghai and Ningxia regions, and to investigate the relationship among the quality of the wolfberry samples, the composition of trace elements and the sample sources. [Method] The measured contents of trace elements and the ratios of zinc to copper (Zn/Cu) of wolfberry from 11 different producing areas of the Qinghai and Ningxia regions were used to construct the raw measurement data matrix, and the distribution characteristics of the trace elements in wolfberry from Qinghai and Ningxia were analyzed using the correspondence cluster analysis method. [Result] The quality of wolfberry samples from Zhongning County, Zhongwei City, Pingluo County, Shizuishan City and Heicheng Town of Ningxia Hui Autonomous Region and from Hehuang Valley and Golmud City of Qinghai Province is mainly related to the contents of Zn and Mn; Zn/Cu greatly affects the quality of Chinese wolfberry in Dulan County of Qinghai Province; Fe has a great effect on the quality of Chinese wolfberry in Yinchuan City of Ningxia Hui Autonomous Region; Cu greatly affects the quality of Chinese wolfberry in Nuomuhong Village of Qinghai Province and at a wolfberry research institute in Ningxia. [Conclusion] The relationship between the quality of wolfberry from different producing areas and the trace elements was investigated, which provides a theoretical and practical basis for the cultivation, harvesting, processing, and further development and utilization of Chinese wolfberry resources from different producing areas.
Funding: Supported by the Special Fund for Agro-scientific Research in the Public Interest from the Ministry of Agriculture of China (201303112); the 12th National Five-year Plan for Science and Technology Program of Rural Areas (2012BAD02B03-17).
Abstract: In order to scientifically evaluate the value of Cucurbita moschata cultivars, the main botanical characters, including the initial flowering date, the first fruiting node, fruit length, fruit stem length, stem diameter, internode length, the transverse and longitudinal diameters of the largest leaf, single fruit weight, flesh thickness and soluble solid content, of 41 cultivars were measured for diversity, correlation and cluster analysis. The results revealed that the pumpkin cultivars showed large variations in fruit stem length, single fruit weight, fruit length and flesh thickness, but small variations in initial flowering date. Significant, and in some cases highly significant, correlations were found among the tested traits. Cluster analysis demonstrated that the 41 old Cucurbita moschata cultivars were divided into three groups, of which Group 1 was better than the other two groups in multiple traits. High similarities existed among the three groups and among the cultivars within each group. This research provides a basis for selecting excellent traits and parents for the breeding of hybrids.
Funding: Supported by the Key Project of New Product Development in Yunnan Province (2009BB006).
Abstract: Inter-simple sequence repeat (ISSR) molecular markers were applied to analyze the genetic diversity and clustering of 48 introduced and bred cultivars of Olea europaea L. In total, 106 DNA bands were amplified by 11 screened primers, including 99 polymorphic bands; the percentage of polymorphic loci was 93.40%, indicating a rich genetic diversity in Olea europaea L. germplasm resources. Based on Nei's genetic distances between the cultivars, a dendrogram of the 48 cultivars of Olea europaea L. was constructed using the unweighted pair-group method with arithmetic mean (UPGMA), which showed that the 48 cultivars were clustered into four main categories; 84.6% of native cultivars were clustered into two categories; most of the introduced cultivars were clustered based on their sources and main usages but not on their geographic origins. This study will provide references for the utilization and further genetic improvement of Olea europaea L. germplasm resources.
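UPGMA, used for the dendrogram in the abstract above, repeatedly merges the two clusters with the smallest average pairwise distance between their members. The following is a minimal sketch in plain Python. The four cultivar labels and the distance values are invented for illustration and are not Nei's distances from the study; the function name `upgma` is likewise hypothetical.

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering on a pairwise distance table.

    names -- leaf labels
    dist  -- dict mapping frozenset({a, b}) to the distance between leaves a and b
    Returns the merges in order as (cluster_a, cluster_b, average_distance).
    """
    def average_distance(a, b):
        # Mean distance over all leaf pairs drawn from the two clusters.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merges.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical genetic distances between four cultivars.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.2,
    frozenset(("C", "D")): 0.3,
    frozenset(("A", "C")): 0.5,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "C")): 0.6,
    frozenset(("B", "D")): 0.6,
}
merges = upgma(names, dist)
```

The merge heights in `merges` are what a dendrogram draws on its distance axis; cutting the tree at a chosen height yields the categories reported in such studies.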
Funding: The National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: In order to mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied. First, the hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, the similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for security supervising data based on a semantic description in coal mines is presented. Third, taking the hybrid similarity measurement method as the distance criterion and borrowing from grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.