In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can handle only a single structural relationship, and identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into multiple parts. Partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels in identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing arbitrary shapes.
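To make the neighbor relationships named in this abstract concrete, here is a minimal sketch of two of them: K-nearest neighbors and the classical shared-neighbor relation built on top of it. It is not the SNaN definition from the paper, which combines shared neighbors with natural neighbors and is not spelled out in the abstract; the function names `knn` and `shared_neighbors` and the fixed `k` are assumptions made for illustration.

```python
import math


def knn(points, k):
    """Return, for each point index, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance (ties broken by index).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def shared_neighbors(points, k):
    """Map each pair (i, j) with i < j to the k-nearest neighbors they have in common.

    Pairs with no common neighbor are omitted. This is the classical
    shared-neighbor relation, used here only as an illustration.
    """
    nn = knn(points, k)
    shared = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            common = nn[i] & nn[j]
            if common:
                shared[(i, j)] = common
    return shared


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for pair, common in sorted(shared_neighbors(pts, k=2).items()):
        print(pair, sorted(common))
```

Pairs drawn from different groups share no neighbors, which is the property that lets shared-neighbor graphs split a dataset into sub-graphs.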
For an automatic bobbin management system that simultaneously detects bobbin color and residual yarn, a composite texture segmentation and recognition operation based on an odd partial Gabor filter and multi-color space hierarchical clustering is proposed. Firstly, the parameter-optimized odd partial Gabor filter is used to distinguish bobbin texture from yarn texture, to explore Gabor parameters for yarn bobbins, and to accurately discriminate the frequency characteristics of yarns and texture. Secondly, multi-color clustering segmentation using color spaces such as red, green, blue (RGB) and CIELUV (LUV) solves the problems of over-segmentation and segmentation errors, which are caused by the difficulty of accurately representing the complex and variable color information of yarns in a single color space and by the low contrast between the target and the background. Finally, the segmented bobbin is combined with the odd partial Gabor edge recognition operator to further distinguish bobbin texture from yarn texture and to locate the position and size of the residual yarn. Experimental results show that the method is robust in identifying complex textures, damaged and dyed bobbins, and multi-color yarns. Residual yarn identification distinguishes texture features from residual yarns well and can be transferred to the detection and differentiation of complex textures, where it performs significantly better than traditional methods.
Due to the limitation and hesitation in one's knowledge, the membership degree of an element to a given set usually has a few different values, a situation that conventional fuzzy sets cannot handle. Hesitant fuzzy sets are a powerful tool to treat this case. The present paper investigates a clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm, which takes the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of our algorithm.
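The idea of seeding K-means with the output of hierarchical clustering can be sketched as follows. This is a minimal illustration on ordinary real-valued points, not on hesitant fuzzy sets, and it assumes centroid linkage and Euclidean distance; the paper's own distance measure and linkage are not given in the abstract. The helper names `centroid`, `agglomerate`, and `kmeans` are hypothetical.

```python
import math


def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def agglomerate(points, k):
    """Centroid-linkage agglomerative clustering, stopped once k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        # Keep the old center if a group happens to be empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    initial = [centroid(c) for c in agglomerate(data, k=2)]
    print(kmeans(data, initial))
```

Starting from hierarchical centroids removes the dependence on random initialization, which is the usual motivation for this combination.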
An intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of the IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing vagueness and uncertainty. However, little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, which is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended for clustering IVIFSs. Finally, the algorithm and its extended form are applied to the classification of building materials and of enterprises, respectively.
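The distance measures listed in this abstract can be sketched in a few lines. The version below assumes one commonly used form, in which each element contributes its membership, nonmembership, and hesitancy differences (hesitancy being 1 minus membership minus nonmembership) and the sum is halved; whether the paper uses exactly this form is not stated in the abstract. An IFS is represented here as a list of `(membership, nonmembership)` pairs, which is an illustrative choice.

```python
import math


def hesitancy(mu, nu):
    """Hesitancy degree of an element with membership mu and nonmembership nu."""
    return 1.0 - mu - nu


def hamming(a, b, normalized=False):
    """Hamming distance between two IFSs given as lists of (mu, nu) pairs."""
    total = sum(
        abs(ma - mb) + abs(na - nb) + abs(hesitancy(ma, na) - hesitancy(mb, nb))
        for (ma, na), (mb, nb) in zip(a, b)
    )
    d = total / 2.0
    return d / len(a) if normalized else d


def euclidean(a, b, normalized=False):
    """Euclidean distance between two IFSs given as lists of (mu, nu) pairs."""
    total = sum(
        (ma - mb) ** 2 + (na - nb) ** 2 + (hesitancy(ma, na) - hesitancy(mb, nb)) ** 2
        for (ma, na), (mb, nb) in zip(a, b)
    )
    n = len(a) if normalized else 1
    return math.sqrt(total / (2.0 * n))


if __name__ == "__main__":
    A = [(0.5, 0.5), (1.0, 0.0)]
    B = [(0.5, 0.5), (0.0, 1.0)]
    print(hamming(A, B), hamming(A, B, normalized=True))
    print(euclidean(A, B), euclidean(A, B, normalized=True))
```

The weighted variants replace the uniform 1/n factor with per-element weights that sum to one.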
Single-pass clustering is commonly used in topic detection and tracking (TDT) due to its simplicity, high efficiency, and low cost. When dealing with large-scale data, however, its time cost increases sharply and clustering performance suffers greatly. To address this problem, a hierarchical clustering algorithm based on single-pass is proposed, which draws on hierarchical and concurrent ideas to divide the clustering process into three stages. News reports are first classified into different categories. Then two single-pass clustering passes are run within each category, followed by one agglomerative clustering pass across categories. In addition, to capture semantic similarity between news reports, the topic model is improved based on named entities. Experimental results show that the proposed method effectively accelerates the process and improves clustering performance.
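For readers unfamiliar with the baseline, single-pass clustering can be sketched as below. This shows only the plain single-pass step, not the three-stage method proposed in the paper; it assumes documents are already encoded as numeric vectors, cosine similarity, a running-mean centroid, and a hypothetical `threshold` parameter.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass(vectors, threshold):
    """Assign each vector, in arrival order, to the most similar existing cluster.

    A new cluster is opened when no cluster centroid reaches the threshold.
    Returns a list of clusters, each a list of input indices.
    """
    clusters = []  # each entry: {"centroid": [...], "members": [...]}
    for idx, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": list(v), "members": [idx]})
        else:
            n = len(best["members"])
            # Update the centroid as a running mean of its members.
            best["centroid"] = [(c * n + x) / (n + 1) for c, x in zip(best["centroid"], v)]
            best["members"].append(idx)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(single_pass(docs, threshold=0.8))
```

Every incoming document is compared with every existing cluster, which is why the cost grows sharply on large collections and why splitting the data into categories first can help.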
The fruits of the leguminous plant Cercis chinensis Bunge are still overlooked, although they have been reported to be antioxidative, because of the limited information on their phytochemicals. A simple, rapid, and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified, all reported in C. chinensis fruits for the first time. Moreover, ten components were simultaneously quantified. The validated quantitative method proved to be sensitive, reproducible, and accurate. It was then applied to analyze batches of C. chinensis fruits from different phytomorphs and areas. Principal component analysis (PCA) provided visualization and dimension reduction of the data set, while hierarchical cluster analysis (HCA) indicated that the content of phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
To address the large network load generated by the Gnutella resource-searching model in peer-to-peer (P2P) networks, an improved model that decreases the network expense is proposed. It establishes clusters in the P2P network, auto-organizes logical layers, and applies a hybrid mechanism of directional searching and flooding. The performance analysis and simulation results show that the proposed hierarchical searching model effectively reduces the generated message load and that its search-response time is comparable to that of the Gnutella model.
News feeds are an important information source, providing updates on various topics across different domains. These updates need to be collected because users interested in a specific domain need the important updates in that domain, organized from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are continuously collected for our experimentation. The major contributions of this work are domain corpus-based news collection, news content extraction, hierarchical clustering of the news, and summarization of the news. Many existing news summarization systems fail to provide dynamic content with domain-wise representation. Our proposed system alleviates this by tagging the news feed with domain corpora and organizing the news streams in a hierarchical structure with topic-wise representation. Further, the news streams are summarized for users with a novel summarization algorithm. The proposed system generates topic-wise summaries effectively for the user, and no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. Online news content editors benefit greatly from this system, as they can instantly obtain news summaries for their domain of interest.
Based on non-maximally entangled four-particle cluster states, we propose a new hierarchical information splitting protocol to probabilistically realize the quantum state sharing of an arbitrary unknown two-qubit state. In this scheme, the sender transmits the two-qubit secret state to three agents, who are divided into two grades, by performing two Bell-state measurements, and broadcasts the measurement results via a classical channel. One agent is in the upper grade and two agents are in the lower grade. The agent in the upper grade needs to cooperate with only one of the other two agents to recover the secret state, but each agent in the lower grade needs help from all of the other agents. Every agent who wants to recover the secret state needs to introduce two ancillary qubits and perform a positive operator-valued measurement (POVM) instead of the usual projective measurement. Moreover, owing to the symmetry of the cluster state, we extend this protocol to multiparty agents.
Developing an efficient algorithm for large-scale classification problems is a challenging topic in many applications of machine learning. In this paper, a hierarchical clustering and fixed-layer local learning (HCFLL) based support vector machine (SVM) algorithm is proposed to deal with this problem. Firstly, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree based on the ideas of unsupervised clustering and supervised clustering. Then it locally trains an SVM on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision tree support vector machine, HCFLL can significantly improve training and testing speeds with comparable testing accuracy.
Graphical representation of hierarchical clustering results is of great importance in hierarchical cluster analysis of data. Unfortunately, almost all mathematical or statistical software has only a weak capability for showcasing such clustering results. In particular, most clustering results or trees cannot be drawn as a dendrogram in a resizable, rescalable, and free-style fashion. By using "dynamic" drawing instead of "static" drawing, this research works around the weaknesses that restrict the visualization of clustering results in an arbitrary manner. It introduces an algorithmic solution that adopts seamless pixel rearrangements to resize and rescale dendrograms or tree diagrams. The results show that the algorithm makes the representation of clustering outcomes a truly free visualization for hierarchical clustering and bioinformatics analysis. In particular, it can selectively visualize and/or save results in a specific size, scale, and style (different views).
As an important branch of machine learning, clustering analysis is widely used in fields such as image pattern recognition, social network analysis, and information security. In this paper, we consider the design of clustering algorithms in the quantum scenario and propose a quantum hierarchical agglomerative clustering algorithm based on the one-dimensional discrete quantum walk with single-point phase defects. The proposed algorithm exploits two nonclassical characteristics of this kind of quantum walk: localization and ballistic effects. First, each data point is viewed as a particle and performs this kind of quantum walk with a parameter determined by its neighbors. After that, the particles are measured in the computational basis. According to the measurement result, every attribute value of the corresponding data point is modified appropriately. In this way, each data point interacts with its neighbors and moves toward a certain center point. Finally, this process is repeated several times until similar data points cluster together and form distinct classes. Simulation experiments on synthetic and real-world data demonstrate the effectiveness of the presented algorithm. Compared with some classical algorithms, the proposed algorithm achieves better clustering results. Moreover, combined with a quantum cluster assignment method, the presented algorithm can speed up the computation.
The complexity of large-scale network systems made of a large number of nonlinearly interconnected components is a restrictive facet for their modeling and analysis. In this paper, we propose a framework for hierarchical modeling of a complex network system, based on a recursive unsupervised spectral clustering method. The hierarchical model serves the purpose of facilitating the management of complexity in the analysis of real-world critical infrastructures. We exemplify this by referring to the reliability analysis of the 380 kV Italian Power Transmission Network (IPTN). In this analysis, the classical component Importance Measures (IMs) of reliability theory are extended to make them compatible with and applicable to a complex distributed network system. By utilizing these extended IMs, the reliability properties of the IPTN system can be evaluated in the framework of the hierarchical system model, with the aim of providing risk managers with information on the risk/safety significance of system structures and components.
Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare these methods. We offer the correct syntax to deactivate the similarity algorithm for clustering analysis within the hierarchical clustering module of SPSS. Findings: When one inputs co-occurrence matrices into the data editor of the SPSS hierarchical clustering module without deactivating the embedded similarity algorithm, the program calculates similarity twice, and thus distorts and overestimates the degree of similarity. Practical implications: We offer the correct syntax to block the similarity algorithm for clustering analysis in the SPSS hierarchical clustering module in the case of co-occurrence matrices. This syntax enables researchers to avoid obtaining incorrect results. Originality/value: This paper presents a method of editing syntax to prevent the default use of a similarity algorithm in SPSS's hierarchical clustering module. This will help researchers, especially those from China, to properly handle co-occurrence matrices when using SPSS for hierarchical cluster analysis, in order to provide more scientific and rational results.
Location selection is a key issue in the construction of electric vehicle charging stations. It involves two problems: determining the location of the charging station, and evaluating that location. To determine the charging station location, a spatial clustering algorithm is proposed and programmed. A simulation example shows the effectiveness of the spatial clustering algorithm. To evaluate the charging station location, a multi-hierarchical fuzzy method is proposed. Based on the location factors of electric vehicle charging stations, a hierarchical evaluation structure for charging station location is constructed, comprising three levels, 4 first-class factors, and 14 second-class factors. The fuzzy multi-hierarchical evaluation model and algorithm are built. The analysis results show that the multi-hierarchical fuzzy method can reasonably evaluate the location of electric vehicle charging stations.
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between clusters than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method is based on global comparison: in practice it performs only Q clustering and ignores R clustering, so it has limitations, especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the association between different indexes, the clustering result is neither good nor accurate. In this paper, we present the model and the algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering is based on the respective clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, and how to classify the indexes appropriately is the chief difficulty that must be handled in advance. Although some of the literature has also addressed the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific, for the following reasons. First, the superficial meaning of some indexes can be read in different ways and is easily misunderstood by different people; moreover, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, for some indexes the superficial meaning conveys nothing, so they cannot be assigned to a class on that basis alone. Third, this classification method requires users to have a high level of domain knowledge, which is not always available; without it, the meaning of some indexes is difficult to understand. To address this problem, we first use R clustering to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to get the final result. The classification result is therefore more objective and accurate. Moreover, after the first step we obtain the relations among the different indexes and their interactions, and we also learn which samples can be clustered into one class under a given class of indexes. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithm, and the result of R clustering can easily be reused in later practice.
Axiomatization of Shannon entropy is a subject that has received a lot of attention in the information theory literature. While Shannon entropy is defined on probability distributions, we define a new type of entropy on the set of partitions of finite subsets of metric spaces, which has a rich algebraic structure as a partially ordered set. We propose an axiomatization of an entropy-like measure of partitions of sets of objects located in metric spaces, and we derive an analytic expression of this new type of entropy, referred to as inertial entropy. This approach starts with the notion of inertia of a partition and includes a study of the behavior of the sum of square errors of a partition. In this context, we characterize the chain of partitions produced by the Ward hierarchical clustering method. Starting from inertial entropies of partitions, we introduce conditional entropies which, in turn, generate metrics on partitions of finite sets. These metrics are used as external validation tools for clusterings of labeled data sets. This type of validation aims to determine to what extent the labeling of the data coincides with the clustering obtained algorithmically, and we obtain a high degree of consistency of the data labeling with the results of several hierarchical clusterings.
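The sum of square errors of a partition and its link to Ward's method can be illustrated with a short sketch. It does not implement the inertial entropy defined in the paper, whose expression is not given in the abstract. It only checks the standard identity that merging two clusters A and B raises the sum of square errors by |A||B| / (|A| + |B|) times the squared distance between their centroids, which is the quantity Ward's method minimizes at each step. All function names are illustrative.

```python
import math


def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def sse(cluster):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)


def partition_sse(partition):
    """Sum of square errors of a partition, i.e. the total over its blocks."""
    return sum(sse(block) for block in partition)


def ward_cost(a, b):
    """Increase in the sum of square errors caused by merging clusters a and b."""
    gap = math.dist(centroid(a), centroid(b))
    return len(a) * len(b) / (len(a) + len(b)) * gap ** 2


if __name__ == "__main__":
    a = [(0.0,), (2.0,)]
    b = [(10.0,)]
    before = partition_sse([a, b])
    after = partition_sse([a + b])
    print(after - before, ward_cost(a, b))  # the two values agree
```

Because each Ward merge adds a non-negative amount, the sum of square errors is non-decreasing along the chain of partitions that the method produces.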
We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. The experimental results indicate that the concept association model captures the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the size of the vector space is only about 10% of the original dimensionality. Key words: text classification; concept association; hierarchical clustering; Hamming clustering. CLC number: TN 915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate; research direction: information filtering and text classification.
Funding (HC-SNaN shared natural neighbors clustering abstract): This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Funding (bobbin and residual yarn detection abstract): Key Research and Development Plan of Shaanxi Province, China (No. 2023-YBGY-330).
Funding (hesitant fuzzy set clustering abstract): Supported by the National Natural Science Foundation of China (61273209).
Funding (intuitionistic fuzzy hierarchical clustering abstract): Supported by the National Natural Science Foundation of China (70571087) and the National Science Fund for Distinguished Young Scholars of China (70625005).
Funding (single-pass hierarchical clustering abstract): Supported by the National Natural Science Foundation of China (No. 61502312), the Fundamental Research Funds for the Central Universities (No. 2017BQ024), the Natural Science Foundation of Guangdong Province (No. 2017A030310428), and the Science and Technology Program of Guangzhou (No. 201806020075, 20180210025).
Funding (Cercis chinensis fruit HPLC-MS/MS abstract): Supported by the National Natural Science Foundation of China (Grant Nos. 82073808, 81872828, and 81573384).
Abstract: To address the large network load generated by the Gnutella resource-searching model in peer-to-peer (P2P) networks, an improved model that decreases network overhead is proposed. It establishes a cluster in the P2P network, self-organizes logical layers, and applies a hybrid mechanism of directional searching and flooding. Performance analysis and simulation results show that the proposed hierarchical searching model effectively reduces the generated message load and that its search response time is about as good as that of the Gnutella model.
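To make the load problem concrete, here is a small sketch of TTL-limited flooding on an undirected overlay graph, counting every forwarded query copy. It illustrates only the flooding baseline, not the clustered hybrid model proposed in the paper. The graph, the function name `flood`, and the simplification that a node forwards to all neighbours (including the one it received the query from) are assumptions made for illustration.

```python
from collections import deque

def flood(graph, source, target, ttl):
    """Flood a query from source with a hop limit; return (found, messages_sent)."""
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    found = False
    while frontier:
        node, hops_left = frontier.popleft()
        if node == target:
            found = True  # flooding continues elsewhere; peers cannot know it was found
        if hops_left == 0:
            continue
        for neighbour in graph[node]:
            messages += 1  # every forwarded copy adds to network load
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return found, messages

# Hypothetical five-peer overlay.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
result_ttl3 = flood(graph, "A", "E", ttl=3)
result_ttl2 = flood(graph, "A", "E", ttl=2)
```

Even in this tiny overlay, several of the messages are redundant copies delivered to peers that have already seen the query. This redundancy is the kind of overhead that directed or clustered search aims to avoid.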
Abstract: News feeds are an information source that provides updates on topics across many domains. These updates need to be collected because users interested in a specific domain need the important updates in that domain, organized from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are continuously collected for our experimentation. The major contributions of this work are domain-corpus-based news collection, news content extraction, hierarchical clustering of the news, and news summarization. Many existing news summarization systems fall short in providing dynamic content with domain-wise representation. The proposed system alleviates this by tagging the news feed with domain corpora and organizing the news streams in a hierarchical structure with topic-wise representation. The news streams are then summarized for users with a novel summarization algorithm. The proposed system generates topic-wise summaries effectively, and, according to the authors, no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. Online news content editors benefit from this system by instantly obtaining news summaries for their domains of interest.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 61671087).
Abstract: Based on non-maximally entangled four-particle cluster states, we propose a new hierarchical information splitting protocol to probabilistically realize the quantum state sharing of an arbitrary unknown two-qubit state. In this scheme, the sender transmits the two-qubit secret state to three agents, who are divided into two grades, by performing two Bell-state measurements and broadcasting the measurement results via a classical channel. One agent is in the upper grade and two agents are in the lower grade. The agent in the upper grade only needs to cooperate with one of the other two agents to recover the secret state, whereas each agent in the lower grade needs help from all of the other agents. Every agent who wants to recover the secret state needs to introduce two ancillary qubits and perform a positive operator-valued measurement (POVM) instead of the usual projective measurement. Moreover, owing to the symmetry of the cluster state, we extend this protocol to multiparty agents.
Funding: National Natural Science Foundation of China (No. 61070033) and Fundamental Research Funds for the Central Universities, China (No. 2012ZM0061).
Abstract: Developing efficient algorithms for large-scale classification problems is a challenge in many applications of machine learning. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to deal with this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree based on the ideas of unsupervised and supervised clustering. It then locally trains an SVM on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision tree support vector machine, HCFLL can significantly improve training and testing speeds with comparable testing accuracy.
Abstract: Graphical representation of hierarchical clustering results is of great importance in hierarchical cluster analysis of data. Unfortunately, almost all mathematical or statistical software has only a weak capability for showcasing such clustering results. In particular, most clustering results or trees cannot be drawn as a dendrogram in a resizable, rescalable, and free-style fashion. By using "dynamic" drawing instead of "static" drawing, this research works around the weak functionalities that restrict arbitrary visualization of clustering results. It introduces an algorithmic solution that adopts seamless pixel rearrangement to resize and rescale dendrograms or tree diagrams. The results show that the developed algorithm allows clustering outcomes to be visualized freely in hierarchical clustering and bioinformatics analysis. In particular, it supports selectively visualizing and/or saving results in a specific size, scale, and style (different views).
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61976053 and 61772134), the Fujian Province Natural Science Foundation (Grant No. 2018J01776), the Program for New Century Excellent Talents in Fujian Province University, Probability and Statistics: Theory and Application (Grant No. IRTL1704), and the Program for Innovative Research Team in Science and Technology in Fujian Province University.
Abstract: As an important branch of machine learning, clustering analysis is widely used in fields such as image pattern recognition, social network analysis, and information security. In this paper, we consider the design of clustering algorithms in the quantum scenario and propose a quantum hierarchical agglomerative clustering algorithm based on a one-dimensional discrete quantum walk with single-point phase defects. The proposed algorithm exploits two nonclassical characteristics of this kind of quantum walk: localization and ballistic effects. First, each data point is viewed as a particle and performs this kind of quantum walk with a parameter determined by its neighbors. The particles are then measured in the computational basis. According to the measurement result, every attribute value of the corresponding data point is modified appropriately. In this way, each data point interacts with its neighbors and moves toward a certain center point. Finally, this process is repeated several times until similar data points cluster together and form distinct classes. Simulation experiments on synthetic and real-world data demonstrate the effectiveness of the presented algorithm. Compared with some classical algorithms, the proposed algorithm achieves better clustering results. Moreover, when combined with a quantum cluster assignment method, the presented algorithm can speed up the computation.
Abstract: The complexity of large-scale network systems made of a large number of nonlinearly interconnected components is a limiting factor for their modeling and analysis. In this paper, we propose a framework for hierarchical modeling of a complex network system, based on a recursive unsupervised spectral clustering method. The hierarchical model facilitates the management of complexity in the analysis of real-world critical infrastructures. We exemplify this with the reliability analysis of the 380 kV Italian Power Transmission Network (IPTN). In this analysis, the classical component Importance Measures (IMs) of reliability theory are extended to make them applicable to a complex distributed network system. Using these extended IMs, the reliability properties of the IPTN system can be evaluated within the hierarchical system model, with the aim of providing risk managers with information on the risk/safety significance of system structures and components.
Abstract: Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design and compare different methods of using the SPSS hierarchical clustering module for co-occurrence matrices. We offer the correct syntax to deactivate the similarity algorithm for clustering analysis within the hierarchical clustering module of SPSS. Findings: When one inputs co-occurrence matrices into the data editor of the SPSS hierarchical clustering module without deactivating the embedded similarity algorithm, the program calculates similarity twice, and thus distorts and overestimates the degree of similarity. Practical implications: The syntax that blocks the similarity algorithm in the SPSS hierarchical clustering module for co-occurrence matrices enables researchers to avoid obtaining incorrect results. Originality/value: This paper presents a method of editing syntax to prevent the default use of a similarity algorithm in SPSS's hierarchical clustering module. This will help researchers, especially those from China, to properly handle co-occurrence matrices when using SPSS for hierarchical cluster analysis, and so to obtain more scientific and rational results.
Funding: Supported by the National Natural Science Foundation of China (No. 51575047).
Abstract: Location selection is a key issue in the construction of electric vehicle charging stations. It involves two problems: determining the location of a charging station, and evaluating that location. To determine the charging station location, a spatial clustering algorithm is proposed and implemented. An example simulation shows the effectiveness of the spatial clustering algorithm. To evaluate the charging station location, a multi-hierarchical fuzzy method is proposed. Based on the location factors of electric vehicle charging stations, a hierarchical evaluation structure for charging station location is constructed, comprising three levels, 4 first-class factors, and 14 second-class factors. The fuzzy multi-hierarchical evaluation model and algorithm are built. The analysis results show that the multi-hierarchical fuzzy method can reasonably evaluate electric vehicle charging station locations.
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Abstract: Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method is based on global comparison and, in practice, performs only Q clustering while ignoring R clustering, so it has limitations, especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the associations between different indexes, the clustering result is neither good nor reliable. In this paper, we present the model and algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering builds on the separate clustering result of each class of indexes, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the main and most difficult problem to be handled in advance. Although some of the literature has also addressed the classification of indexes, those articles classify the indexes only according to their surface meaning, which is unscientific, for the following reasons. First, the surface meaning of some indexes can be read in different ways and is easily misunderstood by different people; moreover, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, the surface meaning of some indexes conveys nothing, so they cannot be assigned to particular classes on that basis alone. Third, this classification method requires users to have a high level of domain knowledge, which is not always available; otherwise it is difficult for them to understand what some indexes mean.
To address this question, we first use the R clustering method to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is thus more objective and accurate. Moreover, after the first step, we obtain the relations among the different indexes and their interactions, and we also learn which samples can be clustered into one class under a given class of indexes. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithm, and the result of R clustering can easily be reused in later practice.
Abstract: Axiomatization of Shannon entropy is a subject that has received a great deal of attention in the information theory literature. While Shannon entropy is defined on probability distributions, we define a new type of entropy on the set of partitions of finite subsets of metric spaces, which has a rich algebraic structure as a partially ordered set. We propose an axiomatization of an entropy-like measure of partitions of sets of objects located in metric spaces, and we derive an analytic expression for this new type of entropy, referred to as inertial entropy. This approach starts with the notion of the inertia of a partition and includes a study of the behavior of the sum of squared errors of a partition. In this context, we characterize the chain of partitions produced by the Ward hierarchical clustering method. Starting from inertial entropies of partitions, we introduce conditional entropies which, in turn, generate metrics on partitions of finite sets. These metrics are used as external validation tools for clusterings of labeled data sets. This type of validation aims to determine to what extent the labeling of the data coincides with the clustering obtained algorithmically, and we obtain a high degree of consistency of the data labeling with the results of several hierarchical clusterings.
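The link between the sum of squared errors of a partition and Ward's method can be checked numerically. The sketch below does not implement the paper's inertial entropy, which the abstract does not define. It only computes the within-block sum of squared errors of a partition and verifies the standard identity that merging two blocks A and B raises it by |A||B|/(|A|+|B|) times the squared distance between their centroids, the quantity Ward's method minimizes at each step. The function names and the toy points are illustrative assumptions.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sse(points):
    """Sum of squared Euclidean distances from each point to the block centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def partition_sse(blocks):
    """Within-block sum of squared errors of a partition (a list of blocks)."""
    return sum(sse(b) for b in blocks)

def ward_cost(a, b):
    """Increase in SSE caused by merging blocks a and b (Ward's criterion)."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

# Toy blocks (hypothetical data).
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 0.0)]
increase = partition_sse([a + b]) - partition_sse([a, b])
predicted = ward_cost(a, b)
```

Because the merge cost is never negative, the within-block sum of squared errors can only grow along the chain of coarser and coarser partitions that agglomerative merging produces.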
Abstract: We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. Experimental results indicate that the concept association model captures the co-occurrence relations among keywords in documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the size of the vector space is only about 10% of the original dimensionality. Key words: text classification; concept association; hierarchical clustering; Hamming clustering. CLC number: TN 915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate; research direction: information filtering and text classification.