A logical hierarchical cluster structure is proposed for distributed multimedia-on-demand servers. The architecture mainly comprises the network topology and the resource management of all server nodes. Instead of following the physical network hierarchy or an independent management hierarchy, the nodes are organized in the middleware layer into a logical hierarchical cluster according to the multimedia blocks they cache. The logical cluster hierarchy is maintained in a decentralized way: member joins and leaves, as well as structural adjustments, are carried out cooperatively by all members. Because the root of each logical hierarchical cluster is randomly mapped into the system, the logical structure of a multimedia block is dynamically expanded across regions by two replication policies, each applied in a different load state. Local load diversion fine-tunes the load of nodes that lie within the same local region but belong to different logical hierarchies. With the dynamic expansion of the logical structure and the load diversion within a local region, users can always select the closest idle node from the logical hierarchy, given that topology is integrated with resource management.
For an automatic bobbin management system that simultaneously detects bobbin color and residual yarn, a composite texture segmentation and recognition operation based on an odd partial Gabor filter and multi-color-space hierarchical clustering is proposed. First, a parameter-optimized odd partial Gabor filter is used to distinguish bobbin texture from yarn texture, to explore Gabor parameters suited to yarn bobbins, and to discriminate accurately between the frequency characteristics of yarns and texture. Second, multi-color clustering segmentation using color spaces such as red, green, blue (RGB) and CIELUV (LUV) solves the problems of over-segmentation and segmentation errors, which arise because a single color space cannot accurately represent the complex and variable color information of yarns and because the contrast between target and background is low. Finally, the segmented bobbin is combined with the edge recognition operator of the odd partial Gabor filter to further distinguish bobbin texture from yarn texture and to locate the position and size of the residual yarn. Experimental results show that the method is robust in identifying complex textures, damaged and dyed bobbins, and multi-color yarns. Residual yarn identification separates texture features from residual yarns well and can be transferred to the detection and differentiation of complex textures, where it performs significantly better than traditional methods.
An intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of the IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing vagueness and uncertainty. However, little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs. It is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended to clustering IVIFSs. Finally, the algorithm and its extended form are applied to the classification of building materials and of enterprises, respectively.
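The procedure named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes the widely used three-term normalized Hamming distance (membership, nonmembership and hesitancy), and it substitutes a plain arithmetic mean for the intuitionistic fuzzy aggregation operator, whose exact form the abstract does not give. The names `normalized_hamming`, `average_ifs` and `cluster_ifs` are hypothetical.

```python
from typing import List, Tuple

# One IFS over n attributes: a list of (membership, nonmembership) pairs.
IFS = List[Tuple[float, float]]


def hesitancy(mu: float, nu: float) -> float:
    """Hesitancy degree pi = 1 - mu - nu."""
    return 1.0 - mu - nu


def normalized_hamming(a: IFS, b: IFS) -> float:
    """Three-term normalized Hamming distance (assumed form, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("IFSs must be defined over the same attributes")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b)
        total += abs(hesitancy(mu_a, nu_a) - hesitancy(mu_b, nu_b))
    return total / (2 * len(a))


def average_ifs(members: List[IFS]) -> IFS:
    """Cluster centre as an arithmetic mean (stand-in for the aggregation operator)."""
    n = len(members)
    dims = len(members[0])
    return [
        (sum(m[j][0] for m in members) / n, sum(m[j][1] for m in members) / n)
        for j in range(dims)
    ]


def cluster_ifs(items: List[IFS], k: int) -> List[List[int]]:
    """Agglomerative clustering: repeatedly merge the two closest cluster centres."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = average_ifs([items[t] for t in clusters[i]])
                cj = average_ifs([items[t] for t in clusters[j]])
                d = normalized_hamming(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    data = [
        [(0.8, 0.1), (0.6, 0.3)],
        [(0.7, 0.2), (0.6, 0.3)],
        [(0.1, 0.8), (0.2, 0.7)],
        [(0.2, 0.7), (0.2, 0.7)],
    ]
    print(cluster_ifs(data, 2))
```

Swapping `normalized_hamming` for a weighted or Euclidean variant changes only the distance function; the merge loop stays the same.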
The fruits of the leguminous plant Cercis chinensis Bunge are still overlooked, although they have been reported to be antioxidative, because information on their phytochemicals is limited. A simple, rapid and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified, all reported in C. chinensis fruits for the first time. Moreover, ten components were quantified simultaneously. The validated quantitative method proved to be sensitive, reproducible and accurate. It was then applied to analyze batches of C. chinensis fruits of different phytomorphs and from different areas. Principal component analysis (PCA) provided visualization and dimension reduction of the data set, while hierarchical cluster analysis (HCA) indicated that the content of phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
Developing an efficient algorithm for large-scale classification problems is a challenging topic in many applications of machine learning. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to deal with this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree, drawing on the ideas of unsupervised and supervised clustering. Then it trains an SVM locally on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision tree support vector machine, HCFLL can significantly improve training and testing speeds with comparable testing accuracy.
Graphical representation of hierarchical clustering results is of great importance in the hierarchical cluster analysis of data. Unfortunately, almost all mathematical and statistical software is weak at showcasing such clustering results. In particular, most clustering results or trees cannot be drawn as a dendrogram in a resizable, rescalable and free-style fashion. By using "dynamic" rather than "static" drawing, this research works around the weak functionality that restricts visualization of clustering results in an arbitrary manner. It introduces an algorithmic solution that uses seamless pixel rearrangement to resize and rescale dendrograms or tree diagrams. The results showed that the algorithm makes the representation of clustering outcomes a truly free visualization for hierarchical clustering and bioinformatics analysis. In particular, it can selectively visualize and/or save results in a specific size, scale and style (different views).
Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare these methods. We offer the correct syntax to deactivate the similarity algorithm for clustering analysis within the hierarchical clustering module of SPSS. Findings: When co-occurrence matrices are entered into the data editor of the SPSS hierarchical clustering module without deactivating the embedded similarity algorithm, the program calculates similarity twice, and thus distorts and overestimates the degree of similarity. Practical implications: We offer the correct syntax to block the similarity algorithm for clustering analysis in the SPSS hierarchical clustering module in the case of co-occurrence matrices. This syntax enables researchers to avoid obtaining incorrect results. Originality/value: This paper presents a method of editing syntax to prevent the default use of a similarity algorithm in SPSS's hierarchical clustering module. This will help researchers, especially those from China, to handle co-occurrence matrices properly when using SPSS for hierarchical cluster analysis, and so to obtain more scientific and rational results.
Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method relies on global comparison and in practice performs only Q clustering while ignoring R clustering, which limits it especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the associations between different indexes, the clustering result is neither good nor faithful. In this paper, we present the model and algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering is based on the separate clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the chief and most difficult problem to be handled in advance. Although some of the literature has addressed the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific, for the following reasons. First, the superficial meaning of some indexes can be read in different ways and is easily misunderstood by different people; moreover, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, for some indexes the superficial meaning conveys nothing, so they cannot be assigned to particular classes from it alone. Third, this classification method requires users to have a high level of knowledge of the field, which is not always available; otherwise it is difficult for them to understand the meaning of some indexes. To address this question, we first use the R clustering method to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The resulting classification is more objective and accurate. Moreover, after the first step we obtain the relations among the different indexes and their interactions, and we also learn which samples can be clustered into one class under the indexes of a given class. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithms, and the result of R clustering can easily be used in later practice.
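The first step described in this abstract, R clustering of the indexes, might look like the sketch below. It covers only that step and rests on assumptions: the abstract does not say how index similarity is measured, so the sketch uses the absolute Pearson correlation with a fixed threshold and single-linkage grouping. The names `pearson` and `r_cluster` are hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # raises ZeroDivisionError for constant columns


def r_cluster(columns: List[List[float]], threshold: float) -> List[List[int]]:
    """Group indexes (columns) whose |correlation| reaches the threshold.

    Single-linkage grouping via union-find; the threshold is an assumed
    tuning parameter, not something given in the abstract.
    """
    parent = list(range(len(columns)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Four indexes observed on four samples; indexes 0/1 and 2/3 move together.
    cols = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],
        [4.0, 1.0, 3.0, 2.0],
        [8.0, 2.0, 6.0, 4.5],
    ]
    print(r_cluster(cols, 0.9))
```

Each resulting group of indexes would then define the feature subset for a separate Q clustering of the samples, which is the second level of the scheme.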
Network topology inference is one of the important applications of network tomography. Traditional network topology inference may disrupt normal network operation because it generates heavy data traffic. A unicast network topology inference method is proposed that uses time to live (TTL) for layering and classifies nodes layer by layer based on the similarity of node pairs. Finally, the method infers the logical network topology effectively through a self-adaptive combination of the previous results. Simulation results show that the proposed method achieves high topology inference accuracy while reducing measurement traffic, thus improving measurement efficiency.
For the load modeling of a large power grid, the many substations it covers must be divided into several categories, after which a load model is built for each category. To address the problem of the skewed clustering tree produced by the classical hierarchical clustering method when categorizing substations, a fair hierarchical clustering method is proposed in this paper. First, a fairness index is defined based on the Gini coefficient. Then, a hierarchical clustering method is built on the fairness index. Finally, the clustering results are evaluated using the contour (silhouette) coefficient and a two-dimensional t-SNE map. A substation clustering example from a real large power grid illustrates that the proposed fair hierarchical clustering method can effectively address the problem of the skewed clustering tree with high accuracy.
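As a rough illustration of how a Gini-based measure can flag a skewed clustering tree, the sketch below computes the ordinary Gini coefficient of the cluster sizes. This is an assumption made for illustration: the abstract states only that the fairness index is "based on the Gini coefficient" and does not give its definition. The function name `gini` is hypothetical.

```python
from typing import Sequence


def gini(sizes: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even).

    Uses the sorted-rank form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with ranks i starting at 1 over the ascending-sorted values.
    """
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    balanced = [5, 5, 5, 5]   # four clusters of equal size
    skewed = [1, 1, 1, 17]    # one cluster swallows most substations
    print(gini(balanced), gini(skewed))
```

A clustering procedure could, for example, prefer merges that keep this value low, but how the paper actually uses its index during merging is not stated in the abstract.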
With the rapid development of big data, the scale of real-world networks keeps increasing. To reduce network scale, coarse-graining methods have been proposed that transform large-scale networks into mesoscale networks. In this paper, a new coarse-graining method for complex networks based on hierarchical clustering (HCCG) is proposed. The network nodes are grouped by hierarchical clustering, and the coarse-grained network is then extracted by updating the weights of the edges between clusters. Extensive simulation experiments on several typical complex networks show that the HCCG method can effectively reduce the network scale while preserving the synchronizability of the original network well. Furthermore, the method is better suited to networks with an obvious clustering structure, and the size of the coarse-grained network can be chosen freely.
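The second stage described here, turning a node grouping into a coarse-grained weighted network, can be sketched as below. The abstract does not give the weight-update rule, so this sketch assumes the simplest choice: the weight between two clusters is the sum of the weights of the original edges joining them, and intra-cluster edges are dropped. The cluster labels are taken as given (in HCCG they would come from hierarchical clustering), and `coarse_grain` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def coarse_grain(
    edges: Dict[Edge, float], labels: Dict[Hashable, Hashable]
) -> Dict[Edge, float]:
    """Collapse an undirected weighted graph onto its clusters.

    edges:  {(u, v): weight} for the original network
    labels: {node: cluster id}, e.g. from cutting a dendrogram
    Returns {(cluster_a, cluster_b): summed weight} with cluster_a < cluster_b.
    """
    coarse = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = labels[u], labels[v]
        if cu == cv:
            continue  # assumption: edges inside a cluster are discarded
        key = (cu, cv) if cu < cv else (cv, cu)
        coarse[key] += w
    return dict(coarse)


if __name__ == "__main__":
    edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0, (1, 3): 1.0}
    labels = {0: "a", 1: "a", 2: "b", 3: "b"}
    print(coarse_grain(edges, labels))
```

Choosing how many clusters to cut from the dendrogram sets the size of the coarse-grained network, which matches the abstract's remark that the size can be chosen freely.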
802.15.4 wireless sensor networks (WSNs) are becoming more economical, feasible and sustainable for new-generation communication environments; however, their resource constraints, such as limited power capacity, make it difficult for them to detect and defend against a variety of attacks. Radio interference attacks launched against WSNs at the physical layer cannot be defeated by the conventional security mechanisms proposed for the 802.15.4 standard. The first section introduces the deployment model of a two-tier hierarchical cluster topology architecture and surveys the jamming techniques proposed for WSNs by classifying the different types of jamming attacks. The following sections present mitigation techniques and possible built-in mechanisms for mitigating link-layer jamming attacks on the proposed two-tier hierarchical clustered WSN topology. The two-tier hierarchical cluster-based topology is investigated with a contention-based protocol suite through OPNET simulation scenarios.
The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of this original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, because the data come from different contexts, they are not independent and identically distributed (non-IID), which degrades model performance. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of the data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients by their data characteristics. Synthetic dummy samples are then generated on the basis of the hierarchical structure using adaptive dataset condensation, and the condensation procedure can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication cost.
This article is an addendum to the 2001 paper [1], which investigated an approach to hierarchical clustering based on the level sets of a density function induced on data points in a d-dimensional feature space. We refer to this as the "level-sets approach" to hierarchical clustering. The density functions considered in [1] were those formed as the sum of identical radial basis functions centered at the data points, each radial basis function assumed to be continuous, monotone decreasing, convex on every ray, and rising to positive infinity at its center point. Such a framework can be investigated with respect to both the Euclidean (L2) and Manhattan (L1) metrics. The addendum puts forth some observations and questions about the level-sets approach that go beyond those in [1]. In particular, we detail and ask the following questions. How does the level-sets approach compare with other related approaches? How is the resulting hierarchical clustering affected by the choice of radial basis function? What are the structural properties of a function formed as the sum of radial basis functions? Can the level-sets approach be theoretically validated? Is there an efficient algorithm to implement the level-sets approach?
Data mining has been a popular research area for more than a decade. Among the problems associated with it, clustering is one of the most interesting. The problem becomes more challenging when the dataset is distributed among different parties that do not want to share their data. In this paper we therefore propose a privacy-preserving two-party hierarchical clustering algorithm for vertically partitioned data sets. Each site learns only the final cluster centers and nothing about individual data.
[Objectives] To explore the compatibility rules of neonatal parenteral nutrition (PN) prescriptions based on association rules and hierarchical cluster analysis, thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy. [Methods] Data on neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services (PIVAS) of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected. The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019; boxplots of drug dosing were drawn using GraphPad 8.0; and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform association rule analysis and hierarchical cluster analysis. [Results] A total of 11,488 PN prescriptions were collected from 1,421 newborns, involving 18 kinds of drugs, which were divided into 11 types of nutrients. Association rule analysis yielded 84 nutrient combinations. The combination of fat emulsion, water-soluble vitamins, fat-soluble vitamins, glucose and amino acids had the highest confidence (99.95%). Hierarchical cluster analysis divided the nutrients into 5 types. [Conclusions] PN prescriptions for newborns were composed of five types of nutrients: amino acids, fat emulsion, glucose, water-soluble vitamins, and fat-soluble vitamins. Appropriate drugs can be chosen according to deficiencies in electrolytes and trace elements to meet nutritional demands. This study provides a reference for the reasonable selection of drugs in neonatal PN prescriptions and for further standardization of PN supportive therapy in newborns.
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. Although various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can handle only a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into parts and deal with each in turn. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a hierarchical clustering algorithm based on shared natural neighbors for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels at identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing arbitrary shapes.
Funding (bobbin color and residual yarn detection study): Key Research and Development Plan of Shaanxi Province, China (No. 2023-YBGY-330).
Funding (intuitionistic fuzzy hierarchical clustering study): supported by the National Natural Science Foundation of China (70571087) and the National Science Fund for Distinguished Young Scholars of China (70625005).
Funding (Cercis chinensis fruit HPLC-MS/MS study): supported by the National Natural Science Foundation of China (Grant Nos. 82073808, 81872828, and 81573384).
Funding (HCFLL support vector machine study): National Natural Science Foundation of China (No. 61070033); Fundamental Research Funds for the Central Universities, China (No. 2012ZM0061).
文摘Graphical representation of hierarchical clustering results is of final importance in hierarchical cluster analysis of data. Unfortunately, almost all mathematical or statistical software may have a weak capability of showcasing such clustering results. Particularly, most of clustering results or trees drawn cannot be represented in a dendrogram with a resizable, rescalable and free-style fashion. With the “dynamic” drawing instead of “static” one, this research works around these weak functionalities that restrict visualization of clustering results in an arbitrary manner. It introduces an algorithmic solution to these functionalities, which adopts seamless pixel rearrangements to be able to resize and rescale dendrograms or tree diagrams. The results showed that the algorithm developed makes clustering outcome representation a really free visualization of hierarchical clustering and bioinformatics analysis. Especially, it possesses features of selectively visualizing and/or saving results in a specific size, scale and style (different views).
文摘Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare these methods. We offer the correct syntax to deactivate the similarity algorithm for clustering analysis within the hierarchical clustering module of SPSS. Findings: When one inputs co-occurrence matrices into the data editor of the SPSS hierarchical clustering module without deactivating the embedded similarity algorithm, the program calculates similarity twice, and thus distorts and overestimates the degree of similarity. Practical implications: We offer the correct syntax to block the similarity algorithm for clustering analysis in the SPSS hierarchical clustering module in the case of co-occurrence matrices. This syntax enables researchers to avoid obtaining incorrect results. Originality/value: This paper presents a method of editing syntax to prevent the default use of a similarity algorithm for SPSS's hierarchical clustering module. This will help researchers, especially those from China, to properly implement the co-occurrence matrix when using SPSS for hierarchical cluster analysis, in order to provide more scientific and rational results.
Abstract: Hierarchical clustering analysis based on statistics is one of the most important data mining algorithms. However, the traditional hierarchical clustering method is based on global comparison and in practice performs only Q clustering while ignoring R clustering, which limits it, especially when the number of samples and indexes is very large. Furthermore, because it ignores the associations between different indexes, the clustering result is neither good nor accurate. In this paper, we present the model and algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering is based on the separate clustering results of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the main and most difficult problem to be handled in advance. Although some of the literature has also addressed the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific. The reasons are as follows. First, the superficial meaning of some indexes can be read in different ways and is easily misunderstood by different people. Furthermore, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, for some indexes the superficial meaning conveys nothing, so they cannot be assigned to a class from the superficial meaning alone. Third, this classification method requires users to have a high level of knowledge of the field, which is not always available; without it, users find it difficult to understand the meaning of some indexes.
To address this question, we first use the R clustering method to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is then more objective and accurate. Moreover, after the first step we obtain the relations among the different indexes and their interactions. We also learn which samples can be clustered into one class under a given class of indexes. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithm, and the result of R clustering can easily be reused in later practice.
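The first step described in this abstract, R clustering of the indexes, can be illustrated roughly as follows. This is a minimal sketch under assumptions that are not in the abstract: the distance 1 - |r| between two indexes, single-linkage merging at a fixed `threshold` (the abstract instead fixes the number of classes q), the helper names `pearson` and `r_cluster`, and the toy data are all hypothetical choices, not the authors' algorithm.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_cluster(columns, threshold):
    """Group indexes (variables) whose distance 1 - |r| is <= threshold.

    Assumption: single linkage, implemented with a small union-find.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(a):
        # Follow parent links to the representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(names, 2):
        if 1.0 - abs(pearson(columns[a], columns[b])) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: each key is an index, each list holds one value per sample.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 69, 80, 92],
    "income": [3, 9, 2, 8, 4],
    "spending": [2.9, 8.5, 2.2, 7.9, 4.1],
}
index_classes = r_cluster(data, threshold=0.2)
print(index_classes)
```

Each resulting class of indexes could then be passed to an ordinary Q clustering of the samples, which corresponds to the second level described in the abstract.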
Funding: supported by the National Natural Science Foundation of China (Nos. 61373137, 61373017, 61373139), the Major Program of Jiangsu Higher Education Institutions (No. 14KJA520002), the Six Industries Talent Peaks Plan of Jiangsu (No. 2013-DZXX-014), and the Jiangsu Qinglan Project.
Abstract: Network topology inference is one of the important applications of network tomography. Traditional network topology inference may disrupt normal network operation because it generates a large volume of data traffic. A unicast network topology inference method is proposed that uses time to live (TTL) for layering and classifies nodes layer by layer based on the similarity of node pairs. Finally, the method infers the logical network topology effectively through a self-adaptive combination of previous results. Simulation results show that the proposed method achieves high topology inference accuracy while reducing measurement traffic, thus improving measurement efficiency.
Funding: supported by the Major Science and Technology Project of Yunnan Province entitled “Research and Application of Key Technologies of Power Grid Operation Analysis and Protection Control for Improving Green Power Consumption” (202002AF080001) and the China South Power Grid Science and Technology Project entitled “Research on Load Model and Modeling Method of Yunnan Power Grid” (YNKJXM20180017).
Abstract: For the load modeling of a large power grid, the large number of substations it covers must be divided into several categories, after which a load model is built for each category. To address the problem of a skewed clustering tree in the classical hierarchical clustering method used for categorizing substations, a fair hierarchical clustering method is proposed in this paper. First, the fairness index is defined based on the Gini coefficient. Thereafter, a hierarchical clustering method is proposed based on the fairness index. Finally, the clustering results are evaluated using the contour coefficient and the t-SNE two-dimensional plane map. A substation clustering example from a real large power grid illustrates that the proposed fair hierarchical clustering method can effectively address the problem of the skewed clustering tree with high accuracy.
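The abstract does not state the exact form of the fairness index, so the sketch below only shows the standard Gini coefficient applied to cluster sizes, as one plausible way to quantify how skewed a partition is. The function name `gini` and the example sizes are hypothetical; the authors' index may differ.

```python
def gini(sizes):
    """Gini coefficient of non-negative cluster sizes.

    0.0 means all clusters have the same size; values near (n - 1) / n
    mean one cluster holds almost everything.
    """
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on ascending values x_1 <= ... <= x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical partitions of 100 substations into 4 clusters.
balanced = gini([25, 25, 25, 25])
skewed = gini([97, 1, 1, 1])
print(balanced, skewed)
```

A skewed clustering tree, in which one cluster absorbs nearly all substations, scores much higher than a balanced one, which is the kind of signal a fairness-aware merge criterion could penalize.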
Abstract: With the rapid development of big data, the scale of real-world networks is increasing continually. To reduce network scale, several coarse-graining methods have been proposed to transform large-scale networks into mesoscale networks. In this paper, a new coarse-graining method based on hierarchical clustering (HCCG) on complex networks is proposed. The network nodes are grouped using the hierarchical clustering method, and the weights of the edges between clusters are then updated to extract the coarse-grained network. A large number of simulation experiments on several typical complex networks show that the HCCG method can effectively reduce the network scale while maintaining the synchronizability of the original network well. Furthermore, this method is more suitable for networks with an obvious clustering structure, and the size of the coarse-grained network can be chosen freely.
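The second step of the coarse-graining idea, collapsing each cluster into a super-node and updating the edge weights between clusters, can be sketched as follows. This is only an illustration: the abstract does not say how the weights are updated, so summing the original weights is an assumption, and the function name `coarse_grain`, the toy graph, and the cluster labels are hypothetical.

```python
from collections import defaultdict


def coarse_grain(edges, cluster_of):
    """Collapse nodes into clusters and aggregate inter-cluster edge weights.

    edges: iterable of (u, v, weight) for an undirected weighted graph.
    cluster_of: mapping from node to cluster label (for example, obtained
    by cutting a hierarchical clustering tree at a chosen level).
    Assumption: the new weight is the sum of the original weights.
    """
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # intra-cluster edges disappear inside the super-node
        key = (min(cu, cv), max(cu, cv))  # undirected: normalize the pair order
        merged[key] += w
    return dict(merged)


# Hypothetical toy network with five nodes grouped into three clusters.
edges = [
    ("a", "b", 1.0),
    ("b", "c", 1.0),
    ("c", "d", 2.0),
    ("a", "d", 0.5),
    ("d", "e", 1.0),
]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
coarse = coarse_grain(edges, cluster_of)
print(coarse)
```

Cutting the clustering tree at a different level changes `cluster_of` and therefore the number of super-nodes, which matches the abstract's remark that the size of the coarse-grained network can be chosen freely.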
Abstract: 802.15.4 Wireless Sensor Networks (WSNs) are becoming more economical, feasible, and sustainable for the new generation of communication environments; however, their resource constraints, such as limited power capacity, make it difficult for them to detect and defend against a variety of attacks. Radio interference attacks launched against WSNs at the physical layer cannot be defeated through the conventional security mechanisms proposed for the 802.15.4 standard. The first section introduces the deployment model of a two-tier hierarchical cluster topology architecture and investigates the different jamming techniques proposed for WSNs by creating a specific classification of the different types of jamming attacks. The following sections present mitigation techniques and possible built-in mechanisms to mitigate link-layer jamming attacks on the proposed two-tier hierarchical clustered WSN topology. The two-tier hierarchical cluster-based topology is investigated with a contention-based protocol suite through OPNET simulation scenarios.
Funding: the General Program of National Natural Science Foundation of China (62072049).
Abstract: The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of this original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, these data are not independent and identically distributed (non-IID) because of their different contexts, which degrades model performance. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of the data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients in terms of their data characteristics. Subsequently, synthetic dummy samples are generated based on the hierarchical structure using adaptive dataset condensation. The dataset condensation procedure can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication cost.
Abstract: This article is an addendum to the 2001 paper [1], which investigated an approach to hierarchical clustering based on the level sets of a density function induced on data points in a d-dimensional feature space. We refer to this as the “level-sets approach” to hierarchical clustering. The density functions considered in [1] were those formed as the sum of identical radial basis functions centered at the data points, each radial basis function assumed to be continuous, monotone decreasing, convex on every ray, and rising to positive infinity at its center point. Such a framework can be investigated with respect to both the Euclidean (L2) and Manhattan (L1) metrics. The addendum here puts forth some observations and questions about the level-sets approach that go beyond those in [1]. In particular, we detail and ask the following questions. How does the level-sets approach compare with other related approaches? How is the resulting hierarchical clustering affected by the choice of radial basis function? What are the structural properties of a function formed as the sum of radial basis functions? Can the level-sets approach be theoretically validated? Is there an efficient algorithm to implement the level-sets approach?
Abstract: Data mining has been a popular research area for more than a decade, and it poses several problems, among which clustering is one of the most interesting. Clustering becomes more challenging when the dataset is distributed among different parties that do not want to share their data. In this paper, we propose a privacy-preserving two-party hierarchical clustering algorithm for vertically partitioned datasets. Each site learns only the final cluster centers, and nothing about any individual's data.
Funding: Supported by the Science and Technology Research and Development Project of Chengde City, Hebei Province (201706A043) and the Young Scholar Program of Hebei Pharmaceutical Association Hospital Pharmaceutical Research Project (2020—Hbsyxhqn0029).
Abstract: [Objectives] To explore the compatibility rules of neonatal parenteral nutrition (PN) prescriptions based on association rules and hierarchical cluster analysis, thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy. [Methods] Data on neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services (PIVAS) of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected. The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019; the boxplot of drug dosing was drawn using GraphPad 8.0 software; and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform association rules and hierarchical cluster analysis. [Results] A total of 11488 PN prescriptions were collected from 1421 newborns, involving 18 kinds of drugs, which were divided into 11 types of nutrients. Association rules analysis yielded 84 nutrient combinations. The combination of fat emulsion, water-soluble vitamins, fat-soluble vitamins, glucose, and amino acids had the highest confidence (99.95%). The hierarchical cluster analysis divided the nutrients into 5 types. [Conclusions] The PN prescriptions for newborns were composed of five types of nutrients: amino acids, fat emulsion, glucose, water-soluble vitamins, and fat-soluble vitamins. Depending on the deficiency of electrolytes and trace elements, appropriate drugs can be chosen to meet nutritional demands. This study provides a reference for the reasonable selection of drugs for neonatal PN prescriptions and for further standardization of PN supportive therapy in newborns.
Funding: supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Abstract: In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can handle only a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into multiple parts. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels at identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing clusters of arbitrary shape.