Funding: National Natural Science Foundation of China (Nos. 61962054 and 62372353).
Abstract: Traditional clustering algorithms often struggle to produce satisfactory results on datasets with uneven density. They also incur substantial computational costs on high-dimensional data because they must compute similarity matrices. To alleviate these issues, we employ a KD-tree to partition the dataset and compute the K-nearest-neighbor (KNN) density of each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections: each data point is treated as a voter that casts a vote for the point with the highest density among its KNN. Using the vote count of each point, we develop a strategy for classifying noise points and potential cluster centers, which allows the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of “adhesive points” between two clusters to merge adjacent clusters that have similar densities. This process helps identify the optimal number of clusters automatically. Experimental results indicate that our algorithm improves both the efficiency and the accuracy of clustering.
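The KD-tree neighbor search and the voting step described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not give the density formula, so the sketch assumes density = k / (sum of distances to the k nearest neighbors), and it assumes a point may vote for itself. The function names `build_kdtree`, `knn`, and `knn_vote` are hypothetical.

```python
import heapq
from collections import Counter

def build_kdtree(points, idxs=None, depth=0):
    """Recursively build a KD-tree as nested tuples (index, axis, left, right)."""
    if idxs is None:
        idxs = list(range(len(points)))
    if not idxs:
        return None
    axis = depth % len(points[0])
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return (idxs[mid], axis,
            build_kdtree(points, idxs[:mid], depth + 1),
            build_kdtree(points, idxs[mid + 1:], depth + 1))

def knn(points, tree, query_idx, k):
    """Return [(distance, index)] of the k nearest neighbors, nearest first."""
    q = points[query_idx]
    heap = []  # max-heap emulated with negated distances: (-distance, index)

    def visit(node):
        if node is None:
            return
        i, axis, left, right = node
        if i != query_idx:
            d = sum((a - b) ** 2 for a, b in zip(q, points[i])) ** 0.5
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        diff = q[axis] - points[i][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Visit the far side only if the splitting plane is closer than
        # the current k-th neighbor (or fewer than k neighbors were found).
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-nd, i) for nd, i in heap)

def knn_vote(points, k):
    """Compute an assumed KNN density per point and count votes per point."""
    tree = build_kdtree(points)
    neighbors = [knn(points, tree, i, k) for i in range(len(points))]
    # Assumed density: inverse of the mean distance to the k nearest neighbors.
    density = [k / (sum(d for d, _ in nb) + 1e-12) for nb in neighbors]
    votes = Counter()
    for i, nb in enumerate(neighbors):
        candidates = [i] + [j for _, j in nb]  # assumption: self-votes allowed
        votes[max(candidates, key=lambda j: density[j])] += 1
    return density, votes
```

Under these assumptions, points that collect many votes would be candidates for cluster centers and points in sparse regions would be candidates for noise. The thresholds that separate the two are not given in the abstract and are left out here.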
Funding: Support from Professor Hong Yu of the Intelligent Fishery Innovative Team (No. C202109), School of Information Engineering, Dalian Ocean University; funded by the National Natural Science Foundation of China (Nos. 31800615 and 21933010).
Abstract: Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories, and it is usually a critical step in interpreting complex conformational changes or interaction mechanisms. Among density-based clustering algorithms, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as growing computing power rapidly increases simulation length, the low computational efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP, named K-means find density peaks (KFDP), to address this heavy resource consumption. In KFDP, the points are first clustered by a highly efficient clustering algorithm such as K-means. The resulting cluster centers are defined as typical points, each with a weight that represents its cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP achieves accuracy comparable to FDP while its computational complexity is reduced from O(n^2) to O(n). We apply and test KFDP on trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparisons with K-means and density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
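The two-stage idea in this abstract (compress the data with K-means, then run density peaks on the weighted typical points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the KFDP implementation. The Gaussian kernel, the cutoff `dc`, the selection of peaks by the product of density and distance, and the names `kmeans` and `weighted_density_peaks` are all assumptions. The refinement into core, boundary, and halo points is omitted.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, weights), where each
    weight is the size of the group assigned in the last iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a group happens to be empty.
        centroids = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids, [len(g) for g in groups]

def weighted_density_peaks(centers, weights, dc, n_clusters):
    """Density-peaks clustering of weighted typical points; returns labels."""
    m = len(centers)
    # Assumed weighted Gaussian-kernel density: a typical point with
    # weight w contributes as if it were w coincident points.
    rho = [sum(weights[j] * math.exp(-(math.dist(centers[i], centers[j]) / dc) ** 2)
               for j in range(m)) for i in range(m)]
    order = sorted(range(m), key=lambda i: -rho[i])
    delta = [0.0] * m
    parent = [-1] * m
    # The densest point gets the largest distance by convention.
    delta[order[0]] = max(math.dist(centers[order[0]], c) for c in centers)
    for rank in range(1, m):
        i = order[rank]
        # Nearest point among those with higher density.
        parent[i] = min(order[:rank], key=lambda j: math.dist(centers[i], centers[j]))
        delta[i] = math.dist(centers[i], centers[parent[i]])
    # The densest point is always a peak; the rest are ranked by rho * delta.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    peaks = [order[0]] + rest[:n_clusters - 1]
    label = [-1] * m
    for c, i in enumerate(peaks):
        label[i] = c
    for i in order:  # descending density, so each parent is labelled first
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label
```

A typical call would be `centers, weights = kmeans(points, k)` followed by `weighted_density_peaks(centers, weights, dc, n_clusters)`. Each original point then inherits the label of its K-means centroid. Because the density-peaks stage works on only k typical points, its quadratic cost depends on k rather than on the number of original points.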
Funding: Supported by the National Natural Science Foundation of China (61401475).
Abstract: The key challenge of the extended target probability hypothesis density (ET-PHD) filter is to reduce the computational complexity by using a subset to approximate the full set of partitions. In this paper, the influence of different partitions on the tracking results is analyzed, and the form of the most informative partition is obtained. Then, a fast density peak-based clustering (FDPC) partitioning algorithm is applied to partition the measurement set. Since only one partition of the measurement set is used, the ET-PHD filter based on FDPC partitioning has lower computational complexity than other ET-PHD filters. As FDPC partitioning is able to remove spatially close clutter-generated measurements, the ET-PHD filter based on FDPC partitioning tracks well in scenarios with more clutter-generated measurements. The simulation results show that the proposed algorithm obtains the most informative partition and markedly reduces the computational burden without losing tracking performance. As the number of clutter-generated measurements increases, the ET-PHD filter based on FDPC partitioning tracks better than other ET-PHD filters. The FDPC algorithm will play an important role in the engineering realization of multiple extended target tracking filters.
Abstract: We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries, our framework jointly considers the relevance, diversity, informativeness, and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain relevance scores and diversity scores of sentences simultaneously. Our framework achieves the best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094, and a ROUGE-SU4 score of 0.143, outperforming a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
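Abstract: Accurate records of the topology of low-voltage distribution areas are the basis for tasks such as line-loss analysis and three-phase imbalance management. To address the high cost and low efficiency of current topology record checks, a low-voltage distribution area topology identification method is proposed based on adaptive k nearest neighbor (AKNN) anomaly detection and adaptive density peaks clustering (ADPC). The method uses the dynamic time warping (DTW) distance to measure the similarity between the voltage series of users in a low-voltage distribution area. The AKNN anomaly detection algorithm checks and corrects abnormal relationships between users and transformers ("user-transformer relationships" for short). Once the correct user-transformer relationships are obtained, the ADPC clustering algorithm identifies the phase of each user in the area. Finally, a case study on real distribution areas verifies that the method requires no manually set parameters, effectively identifies the topology of low-voltage distribution areas, and offers high applicability and accuracy.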
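The DTW distance mentioned in this abstract can be illustrated with the textbook dynamic-programming recurrence. The sketch below assumes one-dimensional series and an absolute-difference local cost. The abstract does not state which local cost or window constraint the authors use, and the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] matched again
                                    cost[i][j - 1],      # b[j-1] matched again
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because one sample may be matched to several samples of the other series, two voltage curves with the same shape but a small time shift or a different length still get a small distance. This is the property that makes DTW suitable for comparing user voltage series.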
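Abstract: Objective: To address the difficulty of extracting fault-signal features, the scarcity of labeled data, and the low diagnostic accuracy in rotating machinery fault diagnosis, an unsupervised diagnosis method is proposed that combines adaptive variational mode decomposition (AVMD) with fuzzy C-means optimized by the density peaks algorithm (Clustering by Fast Search and Find of Density Peaks Optimizing Fuzzy C-Means, DPC-FCM). Methods: First, a composite coefficient combining multiscale permutation entropy and kurtosis is used as the fitness function to optimize the penalty factor alpha and the number of modes K of the VMD algorithm; the average sample entropy and average fuzzy entropy of the resulting intrinsic mode functions (IMFs) are extracted and fed into the clustering algorithm. Second, the density peaks clustering algorithm is used to determine the initial cluster centers of FCM, which reduces the randomness of the clustering results. Results: The proposed unsupervised fault diagnosis model was applied to rolling bearing test signals and achieved accurate fault diagnosis. Conclusion: AVMD is superior in fault feature extraction, and the DPC algorithm effectively improves the accuracy of unsupervised FCM clustering; combining the two enables effective intelligent classification of rotating machinery faults.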
Funding: Supported by the National Science Foundation for Distinguished Young Scholars of China (61225016) and the State Key Program of National Natural Science of China (61533002).
Abstract: Modeling energy consumption (EC) and effluent quality (EQ) is an essential problem that must be solved for multiobjective optimal control in the wastewater treatment process (WWTP). To address this issue, a density peaks-based adaptive fuzzy neural network (DP-AFNN) is proposed in this study. To obtain suitable fuzzy rules, a DP-based clustering method is applied to fit the cluster centers to the process nonlinearity. The parameters of the extracted fuzzy rules are fine-tuned by an improved Levenberg-Marquardt algorithm during training. Furthermore, a convergence analysis is performed to guarantee the successful application of the DP-AFNN. Finally, the proposed DP-AFNN is used to develop models of EC and EQ in the WWTP. The experimental results show that the proposed DP-AFNN achieves faster convergence and higher prediction accuracy than some existing methods.
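Abstract: The multimode nature of batch processes means that soft sensor models built without considering modes have low prediction accuracy. Existing mode partitioning methods for batch processes are sensitive to initial parameters and ignore the influence of abnormal data on the partitioning result; the resulting unreasonable partitions are an important factor limiting the prediction accuracy of soft sensor models for multimode batch processes. A soft sensing method for multimode batch processes is proposed based on weighted density and similar label allocation density peaks clustering with relevance vector regression (WSDPC-RVR). First, the local density of data points in low-density regions is weighted by the degree to which different data points contribute to density, so that cluster centers are selected accurately, and ε-neighbors are introduced together with inter-point distances and local densities to build an allocation strategy for the remaining data points. Then, a mode evaluation index is defined and the statistical characteristics of different modes are analyzed; an abnormal-mode discrimination strategy is built to obtain the number of valid modes, completing the mode partitioning of the batch process. Finally, an RVR soft sensor model is built for each valid mode to predict the dominant variables of the batch process online. Simulation results on a penicillin fermentation process show that the proposed method achieves reasonable mode partitioning and effectively improves the prediction accuracy of the soft sensor model.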