Traditional clustering algorithms often struggle to produce satisfactory results on datasets with uneven density. They also incur substantial computational costs on high-dimensional data because they must calculate similarity matrices. To alleviate these issues, we employ a KD-Tree to partition the dataset and compute the K-nearest-neighbor (KNN) density of each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections, treating each data point as a voter that casts a vote for the point with the highest density among its KNN. Using the vote count of each point, we develop a strategy for classifying noise points and potential cluster centers, which allows the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of “adhesive points” between two clusters to merge adjacent clusters that have similar densities. This process helps identify the optimal number of clusters automatically. Experimental results indicate that our algorithm improves both the efficiency and the accuracy of clustering.
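The voting step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the neighbor search is brute force rather than a KD-Tree (to keep the example dependency-free), the density formula (k divided by the summed distance to the k nearest neighbors) is an assumption, and so is letting each point count itself as a candidate. The function names `knn`, `knn_density`, and `cast_votes` are hypothetical.

```python
import math
from collections import Counter


def knn(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs.

    Brute force is used here for brevity; the abstract uses a KD-Tree instead.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def knn_density(neighbours):
    """Assumed density: number of neighbours divided by their summed distance."""
    return [len(nb) / (sum(d for d, _ in nb) + 1e-12) for nb in neighbours]


def cast_votes(points, k):
    """Each point votes for the densest point among itself and its k neighbours."""
    neighbours = knn(points, k)
    density = knn_density(neighbours)
    votes = Counter()
    for i, nb in enumerate(neighbours):
        candidates = [i] + [j for _, j in nb]  # including the voter is an assumption
        winner = max(candidates, key=lambda j: density[j])
        votes[winner] += 1
    return density, votes
```

Under these assumptions, points that collect many votes are candidate cluster centers, while points with low density and no votes are candidate noise. The thresholds that separate the two are not given in the abstract and are left out here.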
Density-based methods form a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the minimum number of objects within epsilon). However, it cannot handle clusters of varying densities because it uses a global value for epsilon. This article proposes an adaptation of DBSCAN that can discover clusters of varied densities while reducing the number of required input parameters to one. The only user input in the proposed method is MinPts; epsilon is computed automatically from statistical information about the dataset. The proposed method finds the core distance of each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters that satisfy this density level. The remaining unclustered objects are then clustered using a new value of epsilon equal to the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects make up less than 0.006 of the dataset’s size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the ability of the proposed method to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% on the tested datasets.
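The epsilon-update loop in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the published method: core distance is taken as the distance to the MinPts-th nearest other point, the inner DBSCAN counts a point as its own neighbor, and the loop also stops when too few points remain to define a core distance. The names `core_distances`, `dbscan`, and `multi_density_dbscan` are hypothetical.

```python
import math


def core_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest other point (assumed definition)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[min_pts - 1])
    return out


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning unclustered."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue  # already assigned, or not a core point
        labels[i] = cluster
        stack = list(nbrs[i])
        while stack:
            j = stack.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points expand the cluster
                stack.extend(nbrs[j])
        cluster += 1
    return labels


def multi_density_dbscan(points, min_pts, stop_fraction=0.006):
    """Re-run DBSCAN on the leftovers with epsilon = their average core distance."""
    labels = [-1] * len(points)
    remaining = list(range(len(points)))
    next_label = 0
    # The len(remaining) > min_pts guard is an added assumption so that a
    # core distance is always defined for the leftover points.
    while len(remaining) > min_pts and len(remaining) >= stop_fraction * len(points):
        subset = [points[i] for i in remaining]
        eps = sum(core_distances(subset, min_pts)) / len(subset)
        sub_labels = dbscan(subset, eps, min_pts)
        if all(label == -1 for label in sub_labels):
            break  # safety stop if a pass clusters nothing
        for idx, label in zip(remaining, sub_labels):
            if label != -1:
                labels[idx] = next_label + label
        next_label += max(sub_labels) + 1
        remaining = [idx for idx, label in zip(remaining, sub_labels) if label == -1]
    return labels
```

On a dataset with one tight group and one loose group, the first pass uses a small epsilon that captures only the tight group, and the second pass recomputes a larger epsilon from the loose points alone.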
The key challenge of the extended target probability hypothesis density (ET-PHD) filter is to reduce computational complexity by using a subset to approximate the full set of partitions. In this paper, the influence of different partitions on the tracking results is analyzed, and the form of the most informative partition is obtained. Then, a fast density peak-based clustering (FDPC) partitioning algorithm is applied to partition the measurement set. Since only one partition of the measurement set is used, the ET-PHD filter based on FDPC partitioning has lower computational complexity than other ET-PHD filters. Because FDPC partitioning can remove spatially close clutter-generated measurements, the ET-PHD filter based on FDPC partitioning tracks well in scenarios with many clutter-generated measurements. Simulation results show that the proposed algorithm obtains the most informative partition and substantially reduces the computational burden without losing tracking performance. As the number of clutter-generated measurements increases, the ET-PHD filter based on FDPC partitioning achieves better tracking performance than other ET-PHD filters. The FDPC algorithm is expected to play an important role in the engineering realization of multiple extended target tracking filters.
Clustering filtering is a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, which makes it infeasible to cluster the point clouds directly, and the filtering error is also too large. Moreover, many existing filtering algorithms produce poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering that addresses point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and on the features of ground-object and terrain point clouds, the point clouds are first clustered by elevation, and then the planar point clouds are selected. This reduces both the number of samples and the feature dimensions of the data. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise, ground-object, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with those of eight classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm is applicable in both urban and rural areas and is significantly better than the other classical filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
The mesoscale eddy is a typical mesoscale oceanic phenomenon that transfers ocean energy. The detection and extraction of mesoscale eddies is an important aspect of physical oceanography, and automatic mesoscale eddy detection algorithms are the most fundamental tools for detecting and analyzing them. The main data used in mesoscale eddy detection are sea level anomaly (SLA) data merged from multiple satellite altimeters. These data objectively describe the state of the sea surface height. A mesoscale eddy can be represented by a local equivalent region surrounded by a closed SLA contour, and the detection process requires extracting a stable closed-contour structure from SLA maps. Considering the characteristics of mesoscale eddy detection based on SLA data, this paper proposes a new automatic mesoscale eddy detection algorithm based on clustering. The mesoscale eddy structure is extracted by separating and filtering SLA data sets to distinguish eddy regions from non-eddy regions, then establishing relationships among eddy regions and mapping them onto SLA maps. The algorithm overcomes the sensitivity to parameter settings that affects the traditional detection algorithm and does not require a sensitivity test, which makes it more adaptable. An eddy discrimination mechanism is added to ensure the stability of the detected eddy structure and to improve detection accuracy. On this basis, the Northwest Pacific Ocean and the South China Sea are selected for a mesoscale eddy detection experiment. Experimental results show that the proposed algorithm is more efficient than the traditional algorithm and that its results remain stable. The proposed algorithm detects not only stable single-core eddies but also stable multi-core eddy structures.
This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two bounded integer parameters. The evaluating mode scores each clustering result; the higher the score, the higher the quality of the result. By organizing the two modes in different ways, we build two clustering algorithms: SECDU (Self-Expanded Clustering Algorithm Based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two clustering-mode parameters, processes the data set with each pair, and evaluates every clustering result with the evaluating mode. SECDU then outputs the clustering result with the highest evaluation score. By applying a hill-climbing algorithm, SECDUF greatly improves clustering efficiency. Both algorithms adapt well to data sets with different distribution features and output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention during the whole process, and achieves high clustering performance.
Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories. It is usually a critical step in interpreting complex conformational changes or interaction mechanisms. As a density-based clustering algorithm, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as simulation lengths grow rapidly with increasing computing power, the low computing efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP named K-means find density peaks (KFDP) to solve this heavy resource consumption problem. In KFDP, the points are initially clustered by a high-efficiency clustering algorithm such as K-means. The cluster centers are defined as typical points, each with a weight that represents the cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP has accuracy comparable to FDP, but its computational complexity is reduced from O(n^2) to O(n). We apply and test KFDP on the trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparison with K-means and density-based spatial clustering of applications with noise (DBSCAN) validates the proposed KFDP.
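The second stage described in the abstract above, running density peaks over weighted typical points, might look roughly like the sketch below. It assumes the K-means stage has already produced the typical points and their weights (cluster sizes). How the weights enter the density is not stated in the abstract; here each typical point contributes its weight through a Gaussian kernel, which is an assumption, as are the names `weighted_density_peaks` and `pick_centers` and the cutoff parameter `d_c`. Ranking by the product of density and separation follows the usual density-peaks heuristic.

```python
import math


def weighted_density_peaks(reps, weights, d_c):
    """Return (rho, delta) for weighted representative points.

    rho[i]   : weighted local density, sum_j w_j * exp(-(d_ij / d_c)^2)  (assumed form)
    delta[i] : distance to the nearest representative with higher rho;
               for the densest representative, its largest distance to any other.
    """
    n = len(reps)
    dist = [[math.dist(a, b) for b in reps] for a in reps]
    rho = [sum(w * math.exp(-(dist[i][j] / d_c) ** 2) for j, w in enumerate(weights))
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, n_centers):
    """Pick the representatives with the largest rho * delta as cluster centers."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:n_centers]
```

Because the quadratic distance matrix is built over the typical points rather than the raw data, its cost depends on the number of K-means clusters, not on the trajectory length, which is the source of the speed-up the abstract reports.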
This paper introduces niching particle swarm optimization (nichePSO) into clustering analysis and puts forward a clustering algorithm that uses nichePSO to optimize density functions. First, the paper improves the main swarm training models and increases their space-searching ability. Second, the radius of the sub-swarms is defined adaptively according to the actual clustering problem, which helps niches form and search. Finally, a novel method for assigning samples to their corresponding clusters is proposed. Numerical results illustrate that this algorithm, based on the density function and nichePSO, can cluster datasets of unbalanced density into the correct clusters automatically and accurately.
We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries, our framework jointly considers the relevance, diversity, informativeness, and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain the relevance scores and diversity scores of sentences simultaneously. On DUC2004, our framework achieves a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094, and a ROUGE-SU4 score of 0.143, outperforming a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Outlier detection is an important task in data mining. In some sophisticated multidimensional datasets, it is difficult to find the clustering centers and to measure the degree of deviation of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD first identifies the center objects from the local density peaks of the data objects and clusters the whole dataset around those center objects. Then, outlier objects belonging to different clusters are marked as candidates for abnormal data. Finally, the top N points with the highest outlier factors among these candidates are chosen as the final anomaly objects. The feasibility and effectiveness of the method are verified by experiments.
The internal structures and the adsorption and hopping energies of water monomers, dimers, trimers, tetramers, pentamers, and hexamers on Pd(111) have been studied with the density functional theory (DFT) plane-wave pseudopotential method, which performs first-principles quantum-mechanical calculations to explore the properties of crystals and surfaces in materials. Based on the calculations, we suppose that adsorption occurs via one water molecule for monomers, dimers, and trimers, but via three water molecules for pentamers and hexamers. Moreover, one water molecule in pentamers and hexamers bonds to a Pd atom through its O atom, which explains why pentamers and hexamers are stable. The binding energies of the polymers may explain why a trimer approaches two nearby monomers to form a stable pentamer instead of a tetramer. The difference in mobility among small water clusters is due to their different hopping energies.
Because it is difficult to obtain the analytical form of the optimal sampling density, and because the tracking performance of the standard particle probability hypothesis density (P-PHD) filter declines when a clustering algorithm is used to extract target states, a free clustering optimal P-PHD (FCO-P-PHD) filter is proposed. This method yields the analytical form of the optimal sampling density of the P-PHD filter and realizes an optimal P-PHD filter without using clustering algorithms to extract target states. In addition, since the state extraction method in the FCO-P-PHD filter is coupled with the process of obtaining the analytical form of the optimal sampling density, a decoupling process is used to derive a new single-sensor free clustering state extraction method. Combining this method with the standard P-PHD filter gives the FC-P-PHD filter, which significantly improves the tracking performance of the P-PHD filter. Finally, the effectiveness of the proposed algorithms and their advantages over other algorithms are validated through several simulation experiments.
Encephalitis is an inflammatory disease of the brain. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, so proper diagnosis at an early stage is crucial. Therefore, in this paper, we propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid to define the cluster configuration, but this single point does not contain adequate information. To extract accurate inner structural data, a multiple-centroid approach is defined and employed in this paper, which describes the cluster configuration by allocating weights to each state in the cluster. The multiple-EEG-view fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy density-based clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
Finding clusters in data is a challenging problem, especially when the clusters have widely varied shapes, sizes, and densities. Herein, a new scalable clustering technique that addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Over the last several years, many clustering algorithms have been proposed in this area of research. Among these methods, density clustering methods are the most important because of their strong ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, since clusters are defined as regions of typical density separated by regions of low or no density. In this paper, we aim to enhance the well-known DBSCAN algorithm to make it scalable and able to discover clusters in uneven datasets in which clusters are regions of homogeneous density. The proposed algorithm consists of three stages: the k-means algorithm produces an initial partition of the dataset, the enhanced DBSCAN is applied to each partition, and a merging process recovers the natural number of clusters in the underlying dataset. Experimental results on synthetic datasets show that the proposed clustering algorithm is faster and more scalable than the enhanced DBSCAN counterpart.
In wireless sensor networks (WSNs), it is essential to reduce energy consumption at sensor nodes (SNs). Clustering is one approach to saving energy: several neighboring SNs form a cluster and transmit their sensed data to a cluster head (CH), and the CH then sends the aggregated data to a sink node. Under spatially non-uniform traffic, clustering causes non-uniform data gathering performance and energy consumption across the clusters in a WSN. In this paper, we propose a clustering scheme for WSNs that use the IEEE 802.15.4 beacon-enabled mode under various non-uniform traffic environments. The proposed scheme distributes network traffic uniformly across the clusters by controlling cluster area through beacon transmission power, and thereby achieves uniform and improved data gathering performance. In clusters with an expanded area, however, long-distance communication degrades performance. To solve this problem, the proposed scheme controls transmission power at the SNs. In addition, to reduce energy consumption, the proposed scheme sets the active period length in duty cycle operation to suit the current traffic condition. Performance evaluations by computer simulation show the effectiveness of the proposed scheme for WSNs under various non-uniform traffic environments.
An improved clustering algorithm based on the density-isoline clustering (DILC) algorithm is presented. The new algorithm handles noise better than density-isoline clustering and does not need to iteratively calculate the cluster centers, because samples are assigned to clusters in batches rather than one by one. Repeated experiments demonstrate that the improved density-isoline clustering algorithm is significantly more efficient when clustering data with noise, overcomes the drawbacks of the traditional DILC algorithm in dealing with noise, and greatly improves running time.
Clustering evolving data streams must be performed in limited time with reasonable quality. Existing micro-clustering based methods do not consider the distribution of data points inside a micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm that uses leader clustering. The algorithm is based on two-phase clustering. The online phase selects the proper mini-micro or micro-cluster leaders based on the distribution of data points in the micro-clusters. The leader centers are then sent to the offline phase to form the final clusters. By carefully choosing between the two kinds of micro leaders, LeaDen-Stream decreases the time complexity of clustering while maintaining cluster quality. A pruning strategy is also used to separate real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving object database and answer the following question: how can we provide customers with high-quality service, that is, how can we handle so many enquiries in as little time as possible? Because of the large amount of data, the gap between CPU speed and the size of main memory has increased considerably. One way to reduce the time needed to handle enquiries is to reduce the number of I/O operations between the buffer and secondary storage. An effective clustering of the objects can minimize the I/O cost between them. In this paper, according to the characteristics of the moving object database, we analyze the objects in the buffer by their mappings in two-dimensional coordinates and then develop a density-based clustering method to reorganize the clusters effectively. This mechanism lowers the cost of I/O operations and yields more efficient responses to enquiries.
We study the geometries, stabilities, and electronic and magnetic properties of (MgO)n (n=2-10) clusters doped with a single Mn atom using density functional theory with the generalized gradient approximation. The optimized geometries show that the impurity Mn atom prefers to replace an Mg atom with a low coordination number in all the lowest-energy MnMgn-1On (n=2-10) structures. The stability analysis shows that the average binding energies of the doped clusters are larger than those of the corresponding pure (MgO)n clusters. Maximum peaks of the second-order energy differences are observed for MnMgn-1On clusters at n=6 and 9, implying that these clusters are more stable than their neighboring clusters. In addition, all the Mn-doped Mg clusters exhibit high total magnetic moments, with the exception of MnMgO2, which has 3.00 μB. Their magnetic behavior is attributed to the impurity Mn atom, the charge transfer modes, and the size of the MnMgn-1On clusters.
The growth pattern and electronic properties of TiGen^- (n=7-12) clusters were investigated using anion photoelectron spectroscopy and density functional theory calculations. For both anionic and neutral TiGen clusters, a half-encapsulated boat-shaped structure appears at n=8, and the boat-shaped structure is gradually covered by additional Ge atoms to form a Gen cage at n=9-11. The TiGe12^- cluster has a distorted hexagonal prism cage structure. According to natural population analysis, electrons transfer from the Gen framework to the Ti atom in TiGen^-/0 clusters at n=8-12, implying that the electron transfer pattern is related to the structural evolution.
Funding (KNN-density voting clustering): National Natural Science Foundation of China (Nos. 61962054 and 62372353).
Funding (adaptive-epsilon DBSCAN): The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number IFPSAU-2021/01/17758.
Funding (ET-PHD filter with FDPC partitioning): Supported by the National Natural Science Foundation of China (61401475).
Funding: The Natural Science Foundation of Hunan Province, China (No. 2020JJ4601); Open Fund of the Key Laboratory of Highway Engineering of Ministry of Education (No. kfj190203).
Abstract: Clustering filtering is a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, which makes it impossible to cluster the point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms produce poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and on the features of the ground object point clouds and the terrain point clouds, the point clouds are first clustered by their elevations, and then the plane point clouds are selected. This reduces the number of samples and the feature dimensions of the data. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground object point clouds, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with those of eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas, and is significantly better than other classic filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
Funding: The National Key R&D Program of China under contract No. 2016YFC1401800; the National Natural Science Foundation of China under contract No. 41576176; the National Programme on Global Change and Air-Sea Interaction under contract Nos. GASI-02-PAC-YGST2-04, GASI-02-IND-YGST2-04 and GASI-02-SCS-YGST2-04.
Abstract: The mesoscale eddy is a typical mesoscale oceanic phenomenon that transfers ocean energy. The detection and extraction of mesoscale eddies is an important aspect of physical oceanography, and automatic mesoscale eddy detection algorithms are the most fundamental tools for detecting and analyzing mesoscale eddies. The main data used in mesoscale eddy detection are sea level anomaly (SLA) data merged from multi-satellite altimeter data. These data objectively describe the state of the sea surface height. The mesoscale eddy can be represented by a local equivalent region surrounded by a closed SLA contour, and the detection process requires the extraction of a stable closed contour structure from SLA maps. In consideration of the characteristics of mesoscale eddy detection based on SLA data, this paper proposes a new automatic mesoscale eddy detection algorithm based on clustering. The mesoscale eddy structure is extracted by separating and filtering SLA data sets into mesoscale eddy regions and non-eddy regions, then establishing relationships among eddy regions and mapping them onto SLA maps. The approach overcomes the sensitivity to parameter settings that affects the traditional detection algorithm and does not require a sensitivity test. The proposed algorithm is thus more adaptable. An eddy discrimination mechanism is added to the algorithm to ensure the stability of the detected eddy structure and to improve the detection accuracy. On this basis, the paper selects the Northwest Pacific Ocean and the South China Sea to carry out a mesoscale eddy detection experiment. Experimental results show that the proposed algorithm is more efficient than the traditional algorithm and that its results remain stable. The proposed algorithm detects not only stable single-core eddies but also stable multi-core eddy structures.
Funding: Supported by the National Natural Science Foundation of China (60573089).
Abstract: This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two bounded integer parameters. The evaluating mode evaluates clustering results and gives each a mark; the higher the mark a clustering result gains, the higher its quality. By organizing the two modes in different ways, we can build two clustering algorithms: SECDU (Self-Expanded Clustering Algorithm based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two parameters of the clustering mode to process the data set repeatedly and evaluates every clustering result with the evaluating mode. SECDU then outputs the clustering result with the highest evaluating mark. By applying a hill-climbing algorithm, SECDUF improves clustering efficiency greatly. Both algorithms adapt well to data sets with different distribution features, and both can output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention throughout the process. In addition, SECDUF has high clustering performance.
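The contrast between exhaustive enumeration (as in SECDU) and hill climbing (as in SECDUF) over two integer parameters can be sketched as follows. The abstract does not give the evaluating mode or the neighborhood used for hill climbing, so `toy_score` and the plus-or-minus-one neighborhood below are hypothetical stand-ins.

```python
from itertools import product

def exhaustive_search(score, a_range, b_range):
    """Enumerate every (a, b) pair and return the best-scoring one."""
    return max(product(a_range, b_range), key=lambda ab: score(*ab))

def hill_climb(score, start, a_range, b_range):
    """Move to the best neighboring pair until no neighbor scores higher.

    Returns the final pair and the number of score evaluations used.
    """
    current, best = start, score(*start)
    evaluations = 1
    while True:
        a, b = current
        # Assumed neighborhood: change one parameter by one step
        neighbors = [(a + da, b + db)
                     for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if a + da in a_range and b + db in b_range]
        if not neighbors:
            return current, evaluations
        scored = [(score(*n), n) for n in neighbors]
        evaluations += len(scored)
        top_score, top = max(scored)
        if top_score <= best:
            return current, evaluations  # local optimum reached
        current, best = top, top_score

def toy_score(a, b):
    """Hypothetical stand-in for the evaluating mode; peaks at (7, 3)."""
    return -((a - 7) ** 2 + (b - 3) ** 2)
```

On a 10 by 10 parameter grid the exhaustive search always needs 100 evaluations, while hill climbing on this single-peaked toy score needs far fewer. On a score surface with several peaks, hill climbing may stop at a local optimum, which is the usual trade-off of this strategy.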
Funding: Professor Hong Yu at the Intelligent Fishery Innovative Team (No. C202109) in the School of Information Engineering of Dalian Ocean University is acknowledged for her support of this work. Funded by the National Natural Science Foundation of China (Nos. 31800615 and 21933010).
Abstract: Performing cluster analysis on molecular conformations is an important way to find representative conformations in molecular dynamics trajectories. It is usually a critical step in interpreting complex conformational changes or interaction mechanisms. As a density-based clustering algorithm, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as simulation lengths grow rapidly with increasing computing power, the low computing efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP named K-means find density peaks (KFDP) to address this heavy resource consumption. In KFDP, the points are initially clustered by a highly efficient clustering algorithm, such as K-means. Cluster centers are defined as typical points, each with a weight that represents the cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP has accuracy comparable to FDP, but its computational complexity is reduced from O(n^2) to O(n). We apply and test the KFDP method on trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparisons with K-means and density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
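The second stage described here, density-peak clustering on weighted typical points, might look roughly like the sketch below. It assumes the K-means stage has already produced the typical points (`reps`) and their cluster sizes (`weights`). The Gaussian kernel, the cutoff `dc`, and the way weights enter the density are illustrative assumptions, since the abstract does not specify them, and the core/boundary/halo refinement is omitted.

```python
import math

def weighted_density_peaks(reps, weights, dc, n_centers):
    """Density-peak clustering of weighted representative points."""
    n = len(reps)
    dist = [[math.dist(p, q) for q in reps] for p in reps]
    # Assumed weighted density: each representative contributes its
    # pre-cluster size through a Gaussian kernel of width dc.
    rho = [sum(weights[j] * math.exp(-(dist[i][j] / dc) ** 2) for j in range(n))
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n    # distance to the nearest denser representative
    parent = [-1] * n    # index of that nearest denser representative
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # Centers combine high density with a large distance to any denser point.
    # Barring ties, the densest point has the largest rho * delta and is a center.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:  # density order guarantees the parent is labelled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Because the quadratic-cost step runs only over the typical points, its cost depends on the number of pre-clusters rather than on the number of original points, which is where the claimed savings would come from under these assumptions.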
Funding: Supported by the National Natural Science Foundation of China (70871015, 71031002, 71171030).
Abstract: This paper introduces niching particle swarm optimization (nichePSO) into clustering analysis and puts forward a clustering algorithm that uses nichePSO to optimize density functions. First, the paper improves the main swarm training models and increases their space-searching ability. Second, the radius of the sub-swarms is defined adaptively according to the actual clustering problem, which is useful for forming and searching the niches. Finally, a novel method for assigning samples to the corresponding cluster is proposed. Numerical results illustrate that this algorithm, based on the density function and nichePSO, can cluster datasets of unbalanced density into the correct clusters automatically and accurately.
Abstract: We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries, our framework takes the relevance, diversity, informativeness and length constraint of sentences into account together. We apply Density Peaks Clustering (DPC) to obtain the relevance scores and diversity scores of sentences simultaneously. Our framework achieves its best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094 and a ROUGE-SU4 score of 0.143, which outperforms a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Funding: Project (61362021) supported by the National Natural Science Foundation of China; Project (2016GXNSFAA380149) supported by the Natural Science Foundation of Guangxi Province, China; Projects (2016YJCXB02, 2017YJCX34) supported by the Innovation Project of GUET Graduate Education, China; Project (2011KF11) supported by the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, China.
Abstract: Outlier detection is an important task in data mining. In practice, it is difficult to find the clustering centers in some sophisticated multidimensional datasets and to measure the deviation degree of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD first identifies the center objects by the local density peaks of the data objects and clusters the whole dataset based on the center objects. Then, outlier objects belonging to different clusters are marked as candidates for abnormal data. Finally, the top N points among these candidates, those with the highest outlier factors, are chosen as the final anomaly objects. The feasibility and effectiveness of the method are verified by experiments.
Funding: Supported by the Natural Science Foundation of Yunnan Province (No. 2004B0003M).
Abstract: The internal structures and the adsorption and hopping energies of monomers, dimers, trimers, tetramers, pentamers and hexamers of water on Pd(111) have been studied with the density functional theory (DFT) plane-wave pseudopotential method, which performs first-principles quantum-mechanical calculations to explore the properties of crystals and surfaces in materials. Based on the calculations, we suggest that adsorption occurs via one water molecule for monomers, dimers and trimers, but via three water molecules for pentamers and hexamers. Moreover, in pentamers and hexamers one water molecule bonds to a Pd atom through its O atom, which explains why pentamers and hexamers are stable. The binding energies of the polymers may explain why a trimer approaches two nearby monomers to form a stable pentamer instead of a tetramer. The difference in mobility among small water clusters is due to their different hopping energies.
Abstract: Because it is difficult to obtain an analytical form of the optimal sampling density, and because the tracking performance of the standard particle probability hypothesis density (P-PHD) filter declines when a clustering algorithm is used to extract target states, a free clustering optimal P-PHD (FCO-P-PHD) filter is proposed. This method yields an analytical form of the optimal sampling density of the P-PHD filter and realizes an optimal P-PHD filter without using clustering algorithms to extract target states. In addition, because the state extraction method in the FCO-P-PHD filter is coupled with the process of obtaining the analytical form of the optimal sampling density, a decoupling process is used to derive a new single-sensor free clustering state extraction method. Combining this method with the standard P-PHD filter gives the FC-P-PHD filter, which significantly improves the tracking performance of the P-PHD filter. Finally, the effectiveness of the proposed algorithms and their advantages over other algorithms are validated through several simulation experiments.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R113), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Encephalitis is a brain inflammation disease. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, and proper diagnosis at an early stage is crucial. Therefore, in this paper, we propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid to define the cluster configuration, but this single point does not contain adequate information. To extract accurate inner structural data, this paper defines and employs a multiple-centroid approach, which defines the cluster configuration by allocating weights to each state in the cluster. The multiple EEG view fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy density-based clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
Abstract: Finding clusters in data is a challenging problem, especially when the clusters have widely varied shapes, sizes, and densities. This paper proposes a new scalable clustering technique that addresses all these issues. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. In recent years, many clustering algorithms have been proposed in this area of research. Among these methods, density clustering methods are the most important because of their strong ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, since clusters are defined as regions of typical density separated by regions of low or no density. In this paper, we aim to enhance the well-known DBSCAN algorithm to make it scalable and able to discover clusters in uneven datasets in which clusters are regions of homogeneous density. The proposed algorithm consists of three stages: it achieves scalability by using the k-means algorithm to obtain an initial partition of the dataset, applies the enhanced DBSCAN to each partition, and then uses a merging process to obtain the natural number of clusters in the underlying dataset. Experimental results on synthetic datasets show that the proposed clustering algorithm is faster and more scalable than its enhanced DBSCAN counterpart.
Abstract: In wireless sensor networks (WSNs), it is essential to reduce energy consumption at sensor nodes (SNs). Clustering is one approach to reducing energy consumption: several neighboring SNs form a cluster and transmit their sensed data to their cluster head (CH), and the CH then sends the aggregated data to a sink node. Under spatially non-uniform traffic, clustering causes non-uniform data gathering performance and energy consumption between clusters. In this paper, we propose a clustering scheme for WSNs employing the IEEE 802.15.4 beacon-enabled mode under various non-uniform traffic environments. The proposed scheme distributes network traffic uniformly across the clusters through cluster area control by adjusting beacon transmission power, and thereby achieves uniform and improved data gathering performance. In clusters with expanded areas, however, long-distance communication degrades performance. To solve this problem, the proposed scheme controls transmission power at the SNs. In addition, to reduce energy consumption, the proposed scheme matches the active period length in duty cycle operation to the current traffic condition. Performance evaluations by computer simulation show the effectiveness of the proposed scheme for WSNs under various non-uniform traffic environments.
Abstract: An improved clustering algorithm based on the density-isoline clustering (DILC) algorithm is presented. The new algorithm handles noise better than density-isoline clustering and does not have to calculate the cluster centers iteratively, because samples are batched into clusters instead of being assigned one by one. Repeated experiments demonstrate that the improved density-isoline clustering algorithm is significantly more efficient when clustering noisy data, overcomes the drawbacks of the traditional DILC algorithm in dealing with noise, and greatly improves running time.
Abstract: Clustering of evolving data streams must be performed in limited time with reasonable quality. Existing micro-clustering-based methods do not consider the distribution of data points inside a micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm using leader clustering. The algorithm is based on two-phase clustering. The online phase selects the proper mini-micro or micro-cluster leaders based on the distribution of data points in the micro-clusters. The leader centers are then sent to the offline phase to form the final clusters. In LeaDen-Stream, by carefully choosing between the two kinds of micro leaders, we decrease the time complexity of the clustering while maintaining cluster quality. A pruning strategy is also used to separate real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
Funding: This work is supported by the University IT Research Center Project in Korea.
Abstract: With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving object database and address the following problem: how can we provide customers with high-quality service, that is, how can we handle so many enquiries in as little time as possible? Because of the large amount of data, the gap between CPU speed and the size of main memory has increased considerably. One way to reduce the time needed to handle enquiries is to reduce the number of I/O operations between the buffer and secondary storage. An effective clustering of the objects can minimize the I/O cost between them. In this paper, according to the characteristics of the moving object database, we analyze the objects in the buffer according to their mappings in two-dimensional coordinates, and then develop a density-based clustering method to reorganize the clusters effectively. This new mechanism lowers the cost of I/O operations and responds to enquiries more efficiently.
Abstract: We study the geometries, stabilities, and electronic and magnetic properties of (MgO)n (n=2-10) clusters doped with a single Mn atom using density functional theory with the generalized gradient approximation. The optimized geometries show that the impurity Mn atom prefers to replace the Mg atom with a low coordination number in all the lowest-energy MnMgn-1On (n=2-10) structures. The stability analysis clearly shows that the average binding energies of the doped clusters are larger than those of the corresponding pure (MgO)n clusters. Maximum peaks of the second-order energy differences are observed for MnMgn-1On clusters at n=6, 9, implying that these clusters are more stable than their neighboring clusters. In addition, all the Mn-doped Mg clusters exhibit high total magnetic moments with the exception of MnMgO2, which has 3.00μB. Their magnetic behavior is attributed to the impurity Mn atom, the charge transfer modes, and the size of the MnMgn-1On clusters.
Funding: Wei-jun Zheng acknowledges the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KJCX2-EW-H01) and Hong-guang Xu acknowledges the National Natural Science Foundation of China (No. 21103202) for financial support. The theoretical calculations were conducted on the ScGrid and DeepComp 7000 of the Supercomputing Center, Computer Network Information Center of the Chinese Academy of Sciences.
Abstract: The growth pattern and electronic properties of TiGen^- (n=7-12) clusters were investigated using anion photoelectron spectroscopy and density functional theory calculations. For both anionic and neutral TiGen clusters, a half-encapsulated boat-shaped structure appears at n=8, and the boat-shaped structure is gradually covered by additional Ge atoms to form a Gen cage at n=9-11. The TiGe12^- cluster has a distorted hexagonal prism cage structure. According to the natural population analysis, electrons transfer from the Gen framework to the Ti atom in TiGen^-/0 clusters at n=8-12, implying that the electron transfer pattern is related to the structural evolution.