In an automatic bobbin management system that simultaneously detects bobbin color and residual yarn, a composite texture segmentation and recognition operation based on an odd partial Gabor filter and multi-color-space hierarchical clustering is proposed. First, a parameter-optimized odd partial Gabor filter is used to distinguish bobbin texture from yarn texture, to explore Gabor parameters suited to yarn bobbins, and to accurately discriminate the frequency characteristics of yarn and texture. Second, multi-color clustering segmentation using color spaces such as red, green, blue (RGB) and CIELUV (LUV) addresses over-segmentation and segmentation errors, which arise because a single color space cannot accurately represent the complex and variable color information of yarns and because the contrast between target and background is low. Finally, the segmented bobbin is combined with the odd partial Gabor edge recognition operator to further distinguish bobbin texture from yarn texture and to locate the position and size of the residual yarn. Experimental results show that the method is robust in identifying complex textures, damaged and dyed bobbins, and multi-color yarns. Residual yarn identification distinguishes texture features from residual yarn well, can be transferred to the detection and differentiation of complex textures, and is significantly better than traditional methods.
Graphical representation of hierarchical clustering results is of central importance in hierarchical cluster analysis of data. Unfortunately, almost all mathematical or statistical software has only a weak capability for showcasing such clustering results. In particular, most clustering results or trees cannot be drawn as a dendrogram in a resizable, rescalable and free-style fashion. By using "dynamic" rather than "static" drawing, this research works around the weak functionality that restricts the visualization of clustering results. It introduces an algorithmic solution that adopts seamless pixel rearrangement to resize and rescale dendrograms or tree diagrams. The results show that the developed algorithm makes the representation of clustering outcomes a truly free visualization for hierarchical clustering and bioinformatics analysis. In particular, it can selectively visualize and/or save results in a specific size, scale and style (different views).
Hierarchical clustering analysis based on statistics is one of the most important data mining algorithms, but the traditional hierarchical clustering method relies on global comparison and in practice performs only Q clustering while ignoring R clustering, so it has limitations, especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the association between different indexes, the clustering result is neither good nor faithful. In this paper, we present the model and algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering is based on the respective clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the chief and most difficult problem that must be handled in advance. Although some of the literature has addressed the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific, for the following reasons. First, the superficial meaning of some indexes can be read in different ways and is easily misunderstood by different people; moreover, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, the superficial meaning of some indexes conveys nothing, so they cannot be assigned to particular classes on that basis alone. Third, this classification method requires users to have a high level of domain knowledge, which is sometimes unavailable; otherwise it is difficult for them to understand what some indexes signify. To address this question, we first use R clustering to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is thereby more objective and accurate. Moreover, after the first step we obtain the relations among the different indexes and their interactions, and we also learn which samples can be clustered into one class under a given class of indexes. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithms, and the result of R clustering can easily be reused in later practice.
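The two-level idea in the abstract above (R clustering of the indexes first, then Q clustering of the samples within each index class) can be illustrated with a minimal sketch. The abstract does not give the distance measures, the linkage rule, or how the per-class results are combined, so everything here is an assumption for illustration: indexes are grouped by the distance 1 - |Pearson r|, samples by Euclidean distance, both with a single-linkage cut, and the names `single_linkage`, `pearson`, `two_level`, `r_cutoff` and `q_cutoff` are hypothetical.

```python
import math


def single_linkage(n, dist, cutoff):
    """Group items 0..n-1. Cutting a single-linkage tree at `cutoff` is the same
    as taking connected components of the graph that links pairs with
    dist(i, j) <= cutoff, which is what this union-find loop computes."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(i, j) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def pearson(a, b):
    """Pearson correlation of two equal-length columns (assumes neither is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def two_level(data, r_cutoff, q_cutoff):
    """Step 1 (R clustering): group the index columns by correlation.
    Step 2 (Q clustering): cluster the samples separately on each index class."""
    cols = list(zip(*data))
    index_classes = single_linkage(
        len(cols), lambda i, j: 1 - abs(pearson(cols[i], cols[j])), r_cutoff
    )
    result = {}
    for cls in index_classes:
        def sample_dist(a, b, cls=cls):
            return math.sqrt(sum((data[a][k] - data[b][k]) ** 2 for k in cls))

        result[tuple(cls)] = single_linkage(len(data), sample_dist, q_cutoff)
    return result


# Toy data: 4 samples, 3 indexes; indexes 0 and 1 move together, index 2 does not.
data = [
    [1.0, 2.1, 9.0],
    [2.0, 3.9, 1.0],
    [8.0, 16.2, 8.5],
    [9.0, 17.8, 1.5],
]
classes = two_level(data, r_cutoff=0.3, q_cutoff=3.0)
```

On this toy data the sketch groups indexes 0 and 1 into one class and leaves index 2 alone, and the two classes split the samples differently, which is the kind of intermediate result the abstract describes as useful in its own right.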
For the load modeling of a large power grid, the large number of substations it covers must be divided into several categories, after which a load model is built for each type. To address the skewed clustering tree produced by the classical hierarchical clustering method when categorizing substations, a fair hierarchical clustering method is proposed in this paper. First, a fairness index is defined based on the Gini coefficient. Next, a hierarchical clustering method is built on this fairness index. Finally, the clustering results are evaluated using the contour (silhouette) coefficient and a two-dimensional t-SNE map. A substation clustering example from a real large power grid illustrates that the proposed fair hierarchical clustering method effectively addresses the skewed clustering tree while retaining high accuracy.
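As a rough illustration of a Gini-based fairness index, the sketch below computes the standard Gini coefficient of a list of cluster sizes. The abstract does not state how its fairness index is derived from the Gini coefficient, so applying it directly to cluster sizes is an assumption, and the function name `gini` is hypothetical.

```python
def gini(sizes):
    """Gini coefficient of non-negative values.
    0.0 means perfectly even sizes; values near 1 mean one cluster dominates."""
    values = sorted(sizes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on sorted data: G = sum((2i - n - 1) * x_i) / (n * total), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


balanced = gini([25, 25, 25, 25])  # four equally sized clusters
skewed = gini([97, 1, 1, 1])       # one large cluster and three singletons
```

Under this reading, a skewed clustering tree that peels off one substation at a time yields a high coefficient, so a merge rule could prefer candidate merges that keep the value low.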
The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of the original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, these data are not independent and identically distributed (non-IID) because of their different contexts, which degrades model performance. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of the data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients by their data characteristics. Synthetic dummy samples are then generated according to the hierarchical structure using adaptive dataset condensation, and the condensation procedure is adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication cost.
Network topology inference is one of the important applications of network tomography. Traditional network topology inference may disturb normal network operation because it generates a large volume of data traffic. A unicast network topology inference method is proposed that uses time to live (TTL) for layering and classifies nodes layer by layer based on the similarity of node pairs. Finally, the method infers the logical network topology effectively through a self-adaptive combination of previous results. Simulation results show that the proposed method achieves high topology inference accuracy while reducing measurement traffic, thus improving measurement efficiency.
Developing an efficient algorithm for large-scale classification problems is a challenging topic in many applications of machine learning. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to deal with this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree based on the ideas of unsupervised and supervised clustering. Then it locally trains an SVM on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision-tree support vector machine, HCFLL significantly improves training and testing speed with comparable testing accuracy.
Banana is an important crop grown in Oman, and there is a dearth of information on its genetic diversity to assist crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented.
News feeds are one of the potential information sources that provide updates on various topics in different domains. These updates need to be collected because users interested in a specific domain need the important updates in that domain, organized from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are collected continuously for our experimentation. The major contributions of this work are domain-corpus-based news collection, news content extraction, hierarchical clustering of the news, and summarization of the news. Many existing news summarization systems fail to provide dynamic content with a domain-wise representation. Our proposed system alleviates this by tagging the news feed with domain corpora and organizing the news streams in a hierarchical structure with a topic-wise representation. Further, the news streams are summarized for users with a novel summarization algorithm. The proposed summarization system generates topic-wise summaries effectively for the user, and no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. Online news content editors benefit greatly from this system, which instantly provides news summaries for their domains of interest.
Purpose - Developing algorithms for the automated detection and tracking of multiple objects is one challenge in the field of object tracking. In a traffic video monitoring system in particular, vehicle detection is an essential and challenging task. Many vehicle detection methods have been presented in previous studies. These approaches mostly used either motion information or characteristic information to detect vehicles. Although these methods are effective, their detection accuracy still needs to be improved. Moreover, the headlights and windshields that these methods use as vehicle features are easily obscured in some traffic conditions. The paper aims to discuss these issues. Design/methodology/approach - First, each frame is captured from a video sequence and background subtraction is performed using the Mixture-of-Gaussians background model. Next, the Shi-Tomasi corner detection method is employed to extract feature points from the objects of interest in each foreground scene, and a hierarchical clustering approach is then applied to cluster them into feature blocks. These feature blocks are used to track the moving objects frame by frame. Findings - The proposed method detects vehicles in both day-time and night-time scenarios with a 95 percent accuracy rate and can cope with irrelevant movement (waving trees), which has to be treated as background. In addition, the proposed method is able to deal with different vehicle shapes such as cars, vans, and motorcycles. Originality/value - This paper presents a hierarchical clustering of features approach for tracking multiple vehicles in traffic environments, which improves detection and tracking when the vehicle features are obscured in some traffic conditions.
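The step of grouping corner points into feature blocks can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the linkage rule or the stopping criterion, so the sketch uses a single-linkage cut at a pixel distance `max_gap`, and the names `cluster_points`, `bounding_box` and `max_gap` are hypothetical. Corner coordinates are assumed to come from a detector such as Shi-Tomasi run on the foreground mask.

```python
import math


def cluster_points(points, max_gap):
    """Single-linkage grouping: two corners share a block if a chain of corners,
    each within `max_gap` pixels of the next, connects them."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)

    blocks = {}
    for i, point in enumerate(points):
        blocks.setdefault(find(i), []).append(point)
    return list(blocks.values())


def bounding_box(block):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around one feature block."""
    xs = [x for x, _ in block]
    ys = [y for _, y in block]
    return (min(xs), min(ys), max(xs), max(ys))


# Two groups of corners, e.g. from two vehicles in one foreground frame.
corners = [(0, 0), (1, 1), (2, 0), (50, 50), (51, 52)]
boxes = sorted(bounding_box(b) for b in cluster_points(corners, max_gap=5))
```

Each resulting box can then be matched against the boxes of the previous frame to track a vehicle, which is one plausible way to use the feature blocks frame by frame.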
As an important branch of machine learning, clustering analysis is widely used in fields such as image pattern recognition, social network analysis, and information security. In this paper, we consider the design of clustering algorithms in the quantum setting and propose a quantum hierarchical agglomerative clustering algorithm based on a one-dimensional discrete quantum walk with single-point phase defects. The proposed algorithm exploits two nonclassical properties of this kind of quantum walk, localization and ballistic effects. First, each data point is viewed as a particle that performs this kind of quantum walk with a parameter determined by its neighbors. The particles are then measured in the computational basis, and every attribute value of the corresponding data point is modified according to the measurement result. In this way, each data point interacts with its neighbors and moves toward a certain center point. This process is repeated several times until similar data points cluster together and form distinct classes. Simulation experiments on synthetic and real-world data demonstrate the effectiveness of the presented algorithm. Compared with some classical algorithms, the proposed algorithm achieves better clustering results. Moreover, when combined with a quantum cluster assignment method, the presented algorithm can speed up the computation.
Clustering is the main method for deinterleaving radar pulses using multiple parameters. However, the difficulty in clustering radar pulses lies in finding the right number of clusters. To solve this problem, a method is proposed based on Self-Organizing Feature Maps (SOFM) and the Composed Density between and within clusters (CDbw) index. The method first extracts the features of Direction Of Arrival (DOA) data with an SOFM, using the characteristics of the DOA parameter, and then clusters the SOFM result. The right number of clusters is found by computing the cluster validity index CDbw. Simulation results show that the method is effective in sorting DOA data.
Social networking sites in the modern world are flooded with large volumes of data. Extracting the sentiment polarity of important aspects is necessary, as it helps to determine people's opinions from what they write. The coronavirus pandemic has spread across the world and has been mentioned on social media on a large scale. Within a very short period of time, tweets showed an unpredicted increase in mentions of coronavirus. They reflect people's opinions and thoughts about coronavirus and its impact on society. The research community has been interested in discovering hidden relationships in short texts such as those on Twitter and Weibo, which is difficult because of their shortness and sparsity. In this paper, a hierarchical Twitter sentiment model (HTSM) is proposed to show people's opinions in short texts. The proposed HTSM has two main features: it constructs a hierarchical tree of important aspects from short texts without a predefined hierarchy depth and width, and it analyzes the extracted opinions to discover the sentiment polarity of those aspects by applying valence aware dictionary for sentiment reasoner (VADER) sentiment analysis. The tweets for each extracted important aspect are categorized as strongly positive, positive, neutral, negative, or strongly negative. The quality of the proposed model is validated by applying it to a popular product and a widespread topic. The results show that the proposed model outperforms the state-of-the-art methods for analyzing people's opinions in short text.
A new algorithm named kernel bisecting k-means and sample removal (KBK-SR) is proposed as a sampling preprocessing step for support vector machine (SVM) training to improve efficiency. The proposed algorithm tends to quickly produce balanced clusters of similar sizes in the kernel feature space, which makes it efficient and effective for reducing the number of training samples. Theoretical analysis and experimental results on three real-data UCI benchmarks both show that, with a very short sampling time, the proposed algorithm dramatically accelerates SVM sampling and training while maintaining high test accuracy.
The clustering of objects (individuals or variables) is one of the most widely used approaches to exploring multivariate data. The two most common unsupervised clustering strategies are hierarchical ascending clustering (HAC) and k-means partitioning, which identify groups of similar objects in a dataset in order to divide it into homogeneous groups. The proposed topological clustering of variables, called TCV, studies a homogeneous set of variables defined on the same set of individuals, based on the notion of neighborhood graphs; some of these variables are more or less correlated or linked, depending on whether they are quantitative or qualitative. This topological data analysis approach can be useful for dimension reduction and variable selection. It is a topological hierarchical clustering analysis of a set of variables that can be quantitative, qualitative or a mixture of both. It arranges variables into homogeneous groups according to their correlations or associations, studied in a topological context of principal component analysis (PCA) or multiple correspondence analysis (MCA). The proposed TCV is adapted to the type of data considered; its principle is presented and illustrated using simple real datasets with quantitative, qualitative and mixed variables. The results of these illustrative examples are compared with those of other variable clustering approaches.
The fruits of the leguminous plant Cercis chinensis Bunge are still overlooked, although they have been reported to be antioxidative, because information on the phytochemicals of C. chinensis fruits is limited. A simple, rapid and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified and are reported in C. chinensis fruits for the first time. Moreover, ten components were quantified simultaneously. The validated quantitative method proved to be sensitive, reproducible and accurate. It was then applied to analyze batches of C. chinensis fruits of different phytomorphs and from different areas. Principal component analysis (PCA) provided visualization and dimension reduction of the data set, while hierarchical cluster analysis (HCA) indicated that the content of phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When these techniques are applied to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to disagree on the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influence on the clustering results, especially for KM. Transformation also influenced clustering stability, with scaling tending to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size; the effect generally leveled off above 70 samples, most quickly for EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps to ensure the credibility of the application and interpretation of clustering methods.
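For readers unfamiliar with the APN index mentioned above, the following minimal sketch computes it from cluster labelings. It follows the definition commonly used in the cluster-validation literature (for example in the R package clValid): for each observation, compare the cluster it belongs to on the full data with the cluster it belongs to after one variable is removed, and average the proportion that does not overlap. The abstract does not spell out its computation, so treating it this way is an assumption; the function name `apn` and its arguments are hypothetical, and the labelings are assumed to be produced beforehand by whichever clustering method is being assessed.

```python
def apn(full_labels, reduced_labelings):
    """Average proportion of non-overlap.

    full_labels:        cluster label of each observation on the full data.
    reduced_labelings:  one label list per removed variable, from re-clustering
                        the data without that variable.
    Returns a value in [0, 1]; lower means more stable clustering.
    """
    n = len(full_labels)
    m = len(reduced_labelings)
    total = 0.0
    for labels in reduced_labelings:
        for i in range(n):
            # Members of observation i's cluster in each of the two clusterings.
            full_cluster = {j for j in range(n) if full_labels[j] == full_labels[i]}
            reduced_cluster = {j for j in range(n) if labels[j] == labels[i]}
            total += 1 - len(full_cluster & reduced_cluster) / len(full_cluster)
    return total / (n * m)


# Same partition under different label names: perfectly stable.
stable = apn([0, 0, 1, 1], [[0, 0, 1, 1], [1, 1, 0, 0]])
# One observation changes cluster after a variable is removed.
shifted = apn([0, 0, 1, 1], [[0, 1, 1, 1]])
```

Because the measure compares cluster memberships and not label values, relabeled but identical partitions score 0, which is why it suits comparisons across HC, KM and EM.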
The controller is indispensable in software-defined networking (SDN). Using several features, controllers monitor the network and respond promptly to dynamic changes, and their performance affects the quality of service (QoS) in SDN. Every controller supports a set of features, but support for a given feature may be more prominent in one controller than in another. Moreover, a single controller leads to performance, single-point-of-failure (SPOF), and scalability problems. To overcome this, a controller with an optimum feature set must be available for SDN. Furthermore, a cluster of controllers with an optimum feature set will overcome the SPOF and improve the QoS in SDN. Herein, leveraging an analytical network process (ANP), we rank SDN controllers by their supported features and create a hierarchical control plane based cluster (HCPC) of the highest-ranked controller computed using the ANP, evaluating its performance on the OS3E topology. The results demonstrated in Mininet reveal that an HCPC environment with an optimum controller achieves an improved QoS. Moreover, the experimental results validated in Mininet show that our proposed approach surpasses existing distributed controller clustering (DCC) schemes on several performance metrics, i.e., delay, jitter, throughput, load balancing, scalability and central processing unit (CPU) utilization.
[Objectives] To explore the compatibility rules of neonatal parenteral nutrition (PN) prescriptions based on association rules and hierarchical cluster analysis, thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy. [Methods] Data on neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services (PIVAS) of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected. The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019; the boxplot of drug dosing was drawn using GraphPad 8.0; and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform association rules and hierarchical cluster analysis. [Results] A total of 11,488 PN prescriptions were collected from 1,421 newborns, involving 18 kinds of drugs, which were divided into 11 types of nutrients. Association rules analysis yielded 84 nutrient combinations. The combination of fat emulsion, water-soluble vitamins, fat-soluble vitamins, glucose and amino acids had the highest confidence (99.95%). The hierarchical cluster analysis divided the nutrients into 5 types. [Conclusions] PN prescriptions for newborns were composed of five types of nutrients: amino acids, fat emulsion, glucose, water-soluble vitamins, and fat-soluble vitamins. Appropriate drugs can be chosen according to the deficiency of electrolytes and trace elements to meet nutritional demands. This study provides a reference for the reasonable selection of drugs in neonatal PN prescriptions and for further standardization of PN supportive therapy in newborns.
Compared with the pair-wise registration of point clouds, multi-view point cloud registration is much less studied. In this dissertation, a registration method for unordered multi-view point clouds based on a soft trimmed deep network is proposed. First, the expressive ability of the feature extraction module is improved, and the registration accuracy increased, by enhancing the feature extraction network with the point pair feature. Second, neighborhood and angle similarities are used to measure the consistency of candidate points with their surrounding neighborhoods. By combining distance consistency and high-dimensional feature consistency, our network introduces a confidence estimation module for registration, so that the point cloud trimming problem is converted into a confidence estimation problem for candidate points, achieving the pair-wise registration of partially overlapping point clouds. Third, the results of pair-wise registration are fed into model fusion to achieve a rough registration of the multi-view point clouds. Finally, hierarchical clustering is used to iteratively optimize the cluster center model by gradually increasing the number of clusters and performing clustering and registration alternately. This method achieves rough point cloud registration quickly in the early stage, improves the accuracy of multi-view point cloud registration in the later stage, and makes full use of global information to achieve robust and accurate multi-view registration without an initial value.
Funding: Key Research and Development Plan of Shaanxi Province, China (No. 2023-YBGY-330).
文摘In an automatic bobbin management system that simultaneously detects bobbin color and residual yarn,a composite texture segmentation and recognition operation based on an odd partial Gabor filter and multi-color space hierarchical clustering are proposed.Firstly,the parameter-optimized odd partial Gabor filter is used to distinguish bobbin and yarn texture,to explore Garbor parameters for yarn bobbins,and to accurately discriminate frequency characteristics of yarns and texture.Secondly,multi-color clustering segmentation using color spaces such as red,green,blue(RGB)and CIELUV(LUV)solves the problems of over-segmentation and segmentation errors,which are caused by the difficulty of accurately representing the complex and variable color information of yarns in a single-color space and the low contrast between the target and background.Finally,the segmented bobbin is combined with the odd partial Gabor’s edge recognition operator to further distinguish bobbin texture from yarn texture and locate the position and size of the residual yarn.Experimental results show that the method is robust in identifying complex texture,damaged and dyed bobbins,and multi-color yarns.Residual yarn identification can distinguish texture features and residual yarns well and it can be transferred to the detection and differentiation of complex texture,which is significantly better than traditional methods.
文摘Graphical representation of hierarchical clustering results is of final importance in hierarchical cluster analysis of data. Unfortunately, almost all mathematical or statistical software may have a weak capability of showcasing such clustering results. Particularly, most of clustering results or trees drawn cannot be represented in a dendrogram with a resizable, rescalable and free-style fashion. With the “dynamic” drawing instead of “static” one, this research works around these weak functionalities that restrict visualization of clustering results in an arbitrary manner. It introduces an algorithmic solution to these functionalities, which adopts seamless pixel rearrangements to be able to resize and rescale dendrograms or tree diagrams. The results showed that the algorithm developed makes clustering outcome representation a really free visualization of hierarchical clustering and bioinformatics analysis. Especially, it possesses features of selectively visualizing and/or saving results in a specific size, scale and style (different views).
Abstract: Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method is based on global comparison, which in practice uses only Q clustering and ignores R clustering, so it has limitations, especially when the number of samples and indexes is very large. Furthermore, because the association between different indexes is ignored, the clustering result is neither good nor true. In this paper, we present the model and the algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Moreover, because two-level hierarchical clustering is based on the respective clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the chief and most difficult problem that must be handled in advance. Although some of the literature has also referred to the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific. The reasons are as follows. First, the superficial meaning of some indexes can be read in different ways and is easily misunderstood by different people. Furthermore, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, the superficial meaning of some indexes conveys nothing, so they cannot be assigned to particular classes from that meaning alone. Third, this classification method requires users to have a high level of knowledge of the field, which is not always available; otherwise it is difficult for them to understand the meaning of some indexes.
To address this question, in this paper we first use the R clustering method to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is more objective and accurate. Moreover, after the first step, we obtain the relations among the different indexes and their interactions. We also learn which samples can be clustered into one class under a given class of indexes. (These semi-finished results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithms, and the result of R clustering can easily be used in later practice.
Funding: Supported by the Major Science and Technology Project of Yunnan Province entitled “Research and Application of Key Technologies of Power Grid Operation Analysis and Protection Control for Improving Green Power Consumption” (202002AF080001) and the China South Power Grid Science and Technology Project entitled “Research on Load Model and Modeling Method of Yunnan Power Grid” (YNKJXM20180017).
Abstract: For the load modeling of a large power grid, the large number of substations it covers must be segregated into several categories, after which a load model is built for each category. To address the problem of a skewed clustering tree in the classical hierarchical clustering method used for categorizing substations, a fair hierarchical clustering method is proposed in this paper. First, a fairness index is defined based on the Gini coefficient. Thereafter, a hierarchical clustering method is proposed based on the fairness index. Finally, the clustering results are evaluated using the contour coefficient and a t-SNE two-dimensional plane map. A substation clustering example from a real large power grid illustrates that the proposed fair hierarchical clustering method can effectively address the problem of the skewed clustering tree with high accuracy.
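The abstract does not give the exact definition of the fairness index, so the following Python sketch is only an illustration under stated assumptions: it computes the standard Gini coefficient of the cluster sizes and takes `1 - gini` as a hypothetical fairness score, so that a skewed clustering tree (one very large cluster, several tiny ones) scores lower than a balanced one. The function names `gini` and `fairness` are invented for this example.

```python
def gini(sizes):
    """Gini coefficient of non-negative values; 0.0 means all values are equal."""
    values = sorted(sizes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over sorted values:
    # G = sum_i (2i - n - 1) * x_i / (n * total), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def fairness(sizes):
    """Hypothetical fairness score (an assumption, not the paper's definition)."""
    return 1.0 - gini(sizes)


# Cluster sizes of a balanced and a skewed partition of 12 substations.
balanced = [4, 4, 4]
skewed = [1, 1, 10]
print(fairness(balanced), fairness(skewed))
```

A merge criterion could, for instance, prefer candidate merges that keep this score high; whether the paper does so in this way is not stated in the abstract.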
Funding: Supported by the General Program of the National Natural Science Foundation of China (62072049).
Abstract: The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of the original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, these data are not independent and identically distributed (non-IID) because of their different contexts, which degrades the performance of the model. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of the data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients in terms of their data characteristics. Subsequently, synthetic dummy samples are generated based on the hierarchical structure using adaptive dataset condensation. The dataset condensation procedure can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication costs.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61373137, 61373017, 61373139), the Major Program of Jiangsu Higher Education Institutions (No. 14KJA520002), the Six Industries Talent Peaks Plan of Jiangsu (No. 2013-DZXX-014), and the Jiangsu Qinglan Project.
Abstract: Network topology inference is one of the important applications of network tomography. Traditional network topology inference may disturb the normal operation of the network because it generates a large amount of data traffic. A unicast network topology inference method is proposed that uses time to live (TTL) for layering and classifies nodes layer by layer based on the similarity of node pairs. Finally, the method infers the logical network topology effectively through a self-adaptive combination of the previous results. Simulation results show that the proposed method achieves high accuracy in topology inference while reducing the measurement traffic, and thus improves measurement efficiency.
Funding: National Natural Science Foundation of China (No. 61070033); Fundamental Research Funds for the Central Universities, China (No. 2012ZM0061).
Abstract: Developing an efficient algorithm for large-scale classification problems is a challenging topic in many applications of machine learning. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to deal with this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree based on the ideas of unsupervised and supervised clustering. Then it locally trains an SVM on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision-tree support vector machine, HCFLL significantly improves training and testing speed with comparable testing accuracy.
Funding: Project supported by Programs of Sultan Qaboos University (Nos. SR/AGR/BIOR/05/01 and IG/AGR/PLANT/04/01), Sultanate of Oman, and the Research Chair in Postharvest Technology at the University of Stellenbosch, South Africa.
Abstract: Banana is an important crop grown in Oman, and there is a dearth of information on its genetic diversity to assist crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and the clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented.
Abstract: News feeds are one of the potential information sources that provide updates on various topics in different domains. These updates need to be collected, because users interested in a specific domain need the important updates in that domain, organized from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are collected continuously for our experiments. The major contributions of this work are domain-corpus-based news collection, news content extraction, hierarchical clustering of the news, and news summarization. Many existing news summarization systems fall short in providing dynamic content with a domain-wise representation. The proposed system alleviates this by tagging the news feed with domain corpora and organizing the news streams in a hierarchical structure with a topic-wise representation. The news streams are then summarized for users with a novel summarization algorithm. The proposed system generates topic-wise summaries effectively, and no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. Compared with existing systems, the proposed system achieves better results in generating news summaries. Online news content editors benefit greatly from this system, as they can instantly obtain news summaries for their domains of interest.
Abstract: Purpose: Developing algorithms for automated detection and tracking of multiple objects is one challenge in the field of object tracking. In a traffic video monitoring system in particular, vehicle detection is an essential and challenging task. Many vehicle detection methods have been presented in previous studies. These approaches mostly use either motion information or characteristic information to detect vehicles. Although these methods are effective in detecting vehicles, their detection accuracy still needs to be improved. Moreover, the headlights and windshields, which these methods use as vehicle features for detection, are easily obscured in some traffic conditions. The paper aims to discuss these issues. Design/methodology/approach: First, each frame is captured from a video sequence, and background subtraction is performed using the Mixture-of-Gaussians background model. Next, the Shi-Tomasi corner detection method is employed to extract feature points from the objects of interest in each foreground scene, and a hierarchical clustering approach is then applied to cluster them into feature blocks. These feature blocks are used to track the moving objects frame by frame. Findings: The proposed method can detect vehicles in both day-time and night-time scenarios with a 95 percent accuracy rate and can cope with irrelevant movement (waving trees), which has to be treated as background. In addition, the proposed method is able to deal with different vehicle shapes such as cars, vans, and motorcycles. Originality/value: This paper presents a hierarchical clustering of features approach for tracking multiple vehicles in traffic environments, which improves detection and tracking when the vehicle features are obscured in some traffic conditions.
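The step of grouping corner points into feature blocks can be illustrated with a small, pure-Python sketch. It is not the authors' implementation: it assumes single-linkage agglomerative clustering that stops once the closest pair of clusters is farther apart than a hypothetical `max_gap` (in pixels), and the corner coordinates are made up rather than produced by a Shi-Tomasi detector.

```python
from math import dist


def cluster_points(points, max_gap):
    """Single-linkage agglomerative clustering, stopped at distance max_gap."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_gap:
            break  # remaining clusters are too far apart to merge
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def bounding_box(cluster):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around one feature block."""
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up corner points belonging to two separate vehicles.
corners = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
blocks = cluster_points(corners, max_gap=2.0)
print([bounding_box(b) for b in blocks])
```

The brute-force pair search is cubic in the number of points, which is acceptable for a handful of corners per frame; a real-time system would more likely rely on a library routine.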
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61976053 and 61772134), the Fujian Province Natural Science Foundation (Grant No. 2018J01776), the Program for New Century Excellent Talents in Fujian Province University, Probability and Statistics: Theory and Application (Grant No. IRTL1704), and the Program for Innovative Research Team in Science and Technology in Fujian Province University.
Abstract: As an important branch of machine learning, clustering analysis is widely used in fields such as image pattern recognition, social network analysis, and information security. In this paper, we consider the design of clustering algorithms in the quantum scenario and propose a quantum hierarchical agglomerative clustering algorithm based on a one-dimensional discrete quantum walk with single-point phase defects. The proposed algorithm exploits two nonclassical characteristics of this kind of quantum walk, namely localization and ballistic effects. First, each data point is viewed as a particle and performs this kind of quantum walk with a parameter that is determined by its neighbors. After that, the particles are measured in a computational basis. According to the measurement result, every attribute value of the corresponding data point is modified appropriately. In this way, each data point interacts with its neighbors and moves toward a certain center point. Finally, this process is repeated several times until similar data points cluster together and form distinct classes. Simulation experiments on synthetic and real-world data demonstrate the effectiveness of the presented algorithm. Compared with some classical algorithms, the proposed algorithm achieves better clustering results. Moreover, combined with a quantum cluster assignment method, the presented algorithm can speed up the computation.
Abstract: Clustering is the main method for deinterleaving radar pulses using multiple parameters. However, the difficulty in clustering radar pulses lies in finding the right number of clusters. To solve this problem, a method is proposed based on Self-Organizing Feature Maps (SOFM) and the Composed Density between and within clusters (CDbw). The method first extracts the features of the Direction Of Arrival (DOA) data with SOFM, using the characteristics of the DOA parameter, and then clusters the SOFM. By computing the cluster validity index CDbw, the right number of clusters is found. Simulation results show that the method is effective in sorting DOA data.
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Abstract: Social networking sites are flooded with large volumes of data. Extracting the sentiment polarity of important aspects is necessary, as it helps to determine people's opinions from what they write. The coronavirus pandemic has spread across the world and has been mentioned on social media on a large scale. Within a very short period of time, tweets indicated an unpredicted increase in coronavirus cases. They reflect people's opinions and thoughts about the coronavirus and its impact on society. The research community has been interested in discovering the hidden relationships in short texts such as those on Twitter and Weibo, owing to their shortness and sparsity. In this paper, a hierarchical Twitter sentiment model (HTSM) is proposed to show people's opinions in short texts. The proposed HTSM has two main features: it constructs a hierarchical tree of important aspects from short texts without a predefined hierarchy depth or width, and it analyzes the extracted opinions to discover the sentiment polarity of those important aspects by applying the valence aware dictionary for sentiment reasoner (VADER) sentiment analysis. The tweets for each extracted important aspect can be categorized as strongly positive, positive, neutral, negative, or strongly negative. The quality of the proposed model is validated by applying it to a popular product and a widespread topic. The results show that the proposed model outperforms the state-of-the-art methods for analyzing people's opinions in short text.
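The five polarity categories can be illustrated by mapping a VADER-style compound score, which lies in [-1, 1], to a label. The sketch below is an assumption-laden illustration rather than the HTSM procedure: the neutral band of ±0.05 follows the convention commonly used with VADER, while the `strong_cutoff` of 0.5 and the function name `polarity_label` are hypothetical choices made for this example.

```python
def polarity_label(compound, neutral_band=0.05, strong_cutoff=0.5):
    """Map a compound sentiment score in [-1, 1] to one of five categories.

    neutral_band and strong_cutoff are illustrative thresholds, not values
    taken from the paper.
    """
    if compound >= strong_cutoff:
        return "strongly positive"
    if compound >= neutral_band:
        return "positive"
    if compound > -neutral_band:
        return "neutral"
    if compound > -strong_cutoff:
        return "negative"
    return "strongly negative"


# Hypothetical compound scores for five tweets about one aspect.
for score in (0.8, 0.2, 0.0, -0.2, -0.9):
    print(score, polarity_label(score))
```

In practice the score would come from a sentiment analyzer; with the `vaderSentiment` package, for example, it is the `"compound"` entry returned by `SentimentIntensityAnalyzer().polarity_scores(text)`.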
Funding: National Natural Science Foundation of China (No. 60975083); Key Grant Project, Ministry of Education, China (No. 104145).
Abstract: A new algorithm named kernel bisecting k-means and sample removal (KBK-SR) is proposed as a sampling preprocessing step for support vector machine (SVM) training to improve efficiency. The proposed algorithm tends to quickly produce balanced clusters of similar sizes in the kernel feature space, which makes it efficient and effective for reducing training samples. Theoretical analysis and experimental results on three real UCI benchmark datasets both show that, with a very short sampling time, the proposed algorithm dramatically accelerates SVM sampling and training while maintaining high test accuracy.
Abstract: The clustering of objects (individuals or variables) is one of the most widely used approaches to exploring multivariate data. The two most common unsupervised clustering strategies are hierarchical ascending clustering (HAC) and k-means partitioning, which are used to identify groups of similar objects in a dataset and divide it into homogeneous groups. The proposed topological clustering of variables, called TCV, studies a homogeneous set of variables defined on the same set of individuals, based on the notion of neighborhood graphs; some of these variables are more or less correlated or linked, depending on whether the variables are quantitative or qualitative. This topological data analysis approach can therefore be useful for dimension reduction and variable selection. It is a topological hierarchical clustering analysis of a set of variables that can be quantitative, qualitative or a mixture of both. It arranges variables into homogeneous groups according to their correlations or associations, studied in a topological context of principal component analysis (PCA) or multiple correspondence analysis (MCA). The proposed TCV adapts to the type of data considered. Its principle is presented and illustrated using simple real datasets with quantitative, qualitative and mixed variables. The results of these illustrative examples are compared with those of other variable clustering approaches.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 82073808, 81872828, and 81573384).
Abstract: The fruits of the leguminous plant Cercis chinensis Bunge are still overlooked, although they have been reported to be antioxidative, because information on the phytochemicals of C. chinensis fruits is limited. A simple, rapid and sensitive HPLC-MS/MS method was developed for the identification and quantitation of the major bioactive components in C. chinensis fruits. Eighteen polyphenols were identified, all reported in C. chinensis fruits for the first time. Moreover, ten components were quantified simultaneously. The validated quantitative method proved to be sensitive, reproducible and accurate. It was then applied to analyze batches of C. chinensis fruits from different phytomorphs and areas. Principal component analysis (PCA) provided visualization and dimension reduction of the data set, while hierarchical cluster analysis (HCA) indicated that the content of phenolic acids, or of all ten components, might be used to differentiate C. chinensis fruits of different phytomorphs.
Funding: Provided by the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (No. 2018SDKJ0501-2).
Abstract: Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When these techniques are applied to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, namely hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, with scaling tending to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size; the effect generally leveled off beyond 70 samples, and most quickly for EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps to ensure the credibility of the application and interpretation of clustering methods.
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: The controller is indispensable in software-defined networking (SDN). Using several features, controllers monitor the network and respond promptly to dynamic changes. Their performance affects the quality of service (QoS) in SDN. Every controller supports a set of features, but support for a given feature may be more prominent in one controller than in another. Moreover, a single controller leads to performance, single-point-of-failure (SPOF), and scalability problems. To overcome this, a controller with an optimum feature set must be available for SDN. Furthermore, a cluster of controllers with an optimum feature set overcomes the SPOF and improves the QoS in SDN. Herein, leveraging an analytical network process (ANP), we rank SDN controllers by their supported features and create a hierarchical control plane based cluster (HCPC) of the highest-ranked controller computed using the ANP, evaluating its performance on the OS3E topology. The results demonstrated in Mininet reveal that an HCPC environment with an optimum controller achieves an improved QoS. Moreover, the experimental results validated in Mininet show that our proposed approach surpasses the existing distributed controller clustering (DCC) schemes in terms of several performance metrics, namely delay, jitter, throughput, load balancing, scalability and central processing unit (CPU) utilization.
Funding: Supported by the Science and Technology Research and Development Project of Chengde City, Hebei Province (201706A043) and the Young Scholar Program of Hebei Pharmaceutical Association Hospital Pharmaceutical Research Project (2020-Hbsyxhqn0029).
Abstract: [Objectives] To explore the compatibility rules of neonatal parenteral nutrition (PN) prescriptions based on association rules and hierarchical cluster analysis, thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy. [Methods] Data on neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services (PIVAS) of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected. The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019; the boxplot of drug dosing was drawn using GraphPad 8.0 software; and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform the association rules and hierarchical cluster analyses. [Results] A total of 11488 PN prescriptions were collected from 1421 newborns, involving 18 kinds of drugs, which were divided into 11 types of nutrients. Association rules analysis yielded 84 nutrient combinations. The combination of fat emulsion, water-soluble vitamins, fat-soluble vitamins, glucose and amino acids had the highest confidence (99.95%). The hierarchical cluster analysis divided the nutrients into 5 types. [Conclusions] The PN prescriptions for newborns were composed of five types of nutrients: amino acids, fat emulsion, glucose, water-soluble vitamins, and fat-soluble vitamins. Depending on the deficiency of electrolytes and trace elements, appropriate drugs can be chosen to meet nutritional demands. This study provides a reference for the reasonable selection of drugs for neonatal PN prescriptions and for further standardization of PN supportive therapy in newborns.
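The "confidence" reported for a nutrient combination is the standard association-rule measure: among prescriptions that contain the antecedent items, the fraction that also contain the consequent items. The following Python sketch shows the computation on a tiny, made-up set of prescriptions; the data, the item names and the helper names `support` and `confidence` are illustrative assumptions and do not reproduce the study's SPSS Modeler analysis.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Made-up prescriptions, each listed as the set of nutrient types it contains.
prescriptions = [
    {"glucose", "amino acids", "fat emulsion"},
    {"glucose", "amino acids"},
    {"glucose", "amino acids", "fat emulsion", "water-soluble vitamins"},
    {"glucose"},
]

print(support(prescriptions, {"glucose", "amino acids"}))
print(confidence(prescriptions, {"fat emulsion"}, {"amino acids"}))
```

A rule with confidence close to 1 means the consequent nutrients almost always accompany the antecedent ones; support indicates how often the whole combination occurs at all.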
Abstract: Compared with pair-wise registration of point clouds, multi-view point cloud registration is much less studied. In this dissertation, a disordered multi-view point cloud registration method based on a soft trimmed deep network is proposed. First, the expressive ability of the feature extraction module is improved, and the registration accuracy is increased, by enhancing the feature extraction network with the point pair feature. Second, neighborhood and angle similarities are used to measure the consistency of candidate points with their surrounding neighborhoods. By combining distance consistency and high-dimensional feature consistency, the network introduces a confidence estimation module for registration, so that the point cloud trimming problem is converted into a confidence estimation problem for candidate points, which achieves pair-wise registration of partially overlapping point clouds. Third, the results of pair-wise registration are fed into model fusion to achieve rough registration of the multi-view point clouds. Finally, hierarchical clustering is used to iteratively optimize the clustering center model by gradually increasing the number of clustering categories and performing clustering and registration alternately. The method achieves rough point cloud registration quickly in the early stage, improves the accuracy of multi-view point cloud registration in the later stage, and makes full use of global information to achieve robust and accurate multi-view registration without initial values.