Clustering high-dimensional data is challenging as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
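To make the time fading function mentioned in this abstract concrete, the following is a minimal sketch of how a microcluster weight can decay between updates. The exponential form 2^(-lam * dt) is the one commonly used in density-based stream clustering; the decay rate `lam` and the class `MicroCluster` are hypothetical names chosen for illustration, and the reviewed algorithms may use different forms.

```python
from dataclasses import dataclass


def fading(delta_t: float, lam: float = 0.25) -> float:
    """Exponential fading weight; lam is an assumed decay rate."""
    return 2.0 ** (-lam * delta_t)


@dataclass
class MicroCluster:
    """Hypothetical synopsis that only tracks a faded point count."""
    weight: float = 0.0
    last_update: float = 0.0

    def insert(self, t: float, lam: float = 0.25) -> None:
        # Fade the old weight for the elapsed time, then count the new point.
        self.weight = self.weight * fading(t - self.last_update, lam) + 1.0
        self.last_update = t


mc = MicroCluster()
for timestamp in (0.0, 4.0, 8.0):
    mc.insert(timestamp)
print(mc.weight)
```

With `lam = 0.25`, a point loses half of its weight every four time units, so older points contribute progressively less to the synopsis.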
The study delves into the expanding role of network platforms in our daily lives, encompassing various mediums like blogs, forums, online chats, and prominent social media platforms such as Facebook, Twitter, and Instagram. While these platforms offer avenues for self-expression and community support, they concurrently harbor negative impacts, fostering antisocial behaviors like phishing, impersonation, hate speech, cyberbullying, cyberstalking, cyberterrorism, fake news propagation, spamming, and fraud. Notably, individuals also leverage these platforms to connect with authorities and seek aid during disasters. The overarching objective of this research is to address the dual nature of network platforms by proposing innovative methodologies aimed at enhancing their positive aspects and mitigating their negative repercussions. To achieve this, the study introduces a weight learning method grounded in multi-linear attribute ranking. This approach serves to evaluate the significance of attribute combinations across all feature spaces. Additionally, a novel clustering method based on tensors is proposed to elevate the quality of clustering while effectively distinguishing selected features. The methodology incorporates a weighted average similarity matrix and optionally integrates weighted Euclidean distance, contributing to a more nuanced understanding of attribute importance. The analysis of the proposed methods yields significant findings. The weight learning method proves instrumental in discerning the importance of attribute combinations, shedding light on key aspects within feature spaces. Simultaneously, the clustering method based on tensors exhibits improved efficacy in enhancing clustering quality and feature distinction. This not only advances our understanding of attribute importance but also paves the way for more nuanced data analysis methodologies. In conclusion, this research underscores the pivotal role of network platforms in contemporary society, emphasizing their potential for both positive contributions and adverse consequences. The proposed methodologies offer novel approaches to address these dualities, providing a foundation for future research and practical applications. Ultimately, this study contributes to the ongoing discourse on optimizing the utility of network platforms while minimizing their negative impacts.
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Using the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", represents an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm by incorporating a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that, upon arrival, new data points interact solely with clusters in very close proximity. This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
Although many multi-view clustering (MVC) algorithms with acceptable performances have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only the hard and fuzzy partitions for multi-view objects, which are often located in highly-overlapping areas of multi-view feature space. The adoption of hard and fuzzy partition ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views, and maps multi-view objects to a 2-dimensional human-readable chart by calculating 2 newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Besides, SRMVEC delivers effectiveness on benchmark datasets by outperforming some state-of-the-art methods.
Implementing machine learning algorithms in the non-conducive environment of the vehicular network requires some adaptations due to the high computational complexity of these algorithms. K-clustering algorithms are simple, with fast performance and relative accuracy. However, their implementation depends on the initial selection of the number of clusters (K), the initial cluster centers, and the clustering metric. This paper investigated using Scott’s histogram formula to estimate the K number and the Link Expiration Time (LET) as a clustering metric. Realistic traffic flows were considered for three maps, namely Highway, Traffic Light junction, and Roundabout junction, to study the effect of road layout on estimating the K number. A fast version of the PAM algorithm was used for clustering with a modification to reduce time complexity. The Affinity propagation algorithm sets the baseline for the estimated K number, and the Medoid Silhouette method is used to quantify the clustering. OMNET++, Veins, and SUMO were used to simulate the traffic, while the related algorithms were implemented in Python. The estimate of the K number from Scott’s formula only matched the baseline when the road layout was simple. Moreover, the clustering algorithm required one iteration on average to converge when used with LET.
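The idea of estimating K from Scott's histogram formula can be sketched as follows. Scott's rule sets the bin width to 3.49 times the sample standard deviation times n^(-1/3); the number of bins is then the data range divided by that width. Which quantity the paper bins is not stated in the abstract, so the one-dimensional input `values` (for example, vehicle positions along a road) and the function name `scott_bin_count` are assumptions for illustration only.

```python
import math
import statistics


def scott_bin_count(values: list[float]) -> int:
    """Number of histogram bins under Scott's rule, used here as a K estimate."""
    n = len(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    if sigma == 0.0:
        return 1  # all values identical: a single group
    width = 3.49 * sigma * n ** (-1.0 / 3.0)
    return max(1, math.ceil((max(values) - min(values)) / width))


# Hypothetical positions (in metres) of 1000 evenly spread vehicles.
positions = [float(i) for i in range(1000)]
print(scott_bin_count(positions))
```

Because the estimate depends only on the spread and size of the sample, it is cheap to compute, which fits the low-complexity requirement described in the abstract.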
This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise. The challenge of designing a suitable filter in the presence of an unknown covariance matrix is addressed by focusing on the output data set of the system. Considering that data generated from a Gaussian distribution exhibit ellipsoidal scattering, we first propose the weighted sum of norms (SON) clustering method that prioritizes nearby points, reduces distant point influence, and lowers computational cost. Then, by introducing the weighted maximum likelihood, we propose a semi-definite program (SDP) to detect outliers and reduce their impacts on each cluster. Detecting these weights paves the way to obtain an appropriate covariance of the output noise. Next, two filtering approaches are presented: a cluster-based robust linear filter using maximum a posteriori (MAP) estimation, and a cluster-based robust nonlinear filter assuming that the output noise distribution stems from several Gaussian noise sources according to the ellipsoidal clusters. Finally, simulation results demonstrate the effectiveness of our proposed filtering approaches.
Dear Editor, This letter focuses on the fixed-time (FXT) cluster optimization problem of first-order multi-agent systems (FOMASs) in an undirected network, in which the optimization objective is the sum of the objective functions of all clusters. A novel piecewise power-law control protocol with cooperative-competition relations is proposed. Furthermore, a sufficient condition is obtained to ensure that the FOMASs achieve the cluster consensus within an FXT.
Stochastic unit commitment is one of the most powerful methods to address uncertainty. However, the existing scenario clustering technique for stochastic unit commitment cannot accurately select representative scenarios, which threatens the robustness of stochastic unit commitment and hinders its application. This paper provides a stochastic unit commitment with dynamic scenario clustering based on multi-parametric programming and Benders decomposition. The stochastic unit commitment is solved via the Benders decomposition, which decouples the primal problem into the master problem and two types of subproblems. In the master problem, the committed generator is determined, while the feasibility and optimality of generator output are checked in these two subproblems. Scenarios are dynamically clustered during the subproblem solution process through the multi-parametric programming with respect to the solution of the master problem. In other words, multiple scenarios are clustered into several representative scenarios after the subproblem is solved, and the Benders cut obtained by the representative scenario is generated for the master problem. Different from the conventional stochastic unit commitment, the proposed approach integrates scenario clustering into the Benders decomposition solution process. Such a clustering approach could accurately cluster representative scenarios that have impacts on the unit commitment. The proposed method is tested on a 6-bus system and the modified IEEE 118-bus system. Numerical results illustrate the effectiveness of the proposed method in clustering scenarios. Compared with the conventional clustering method, the proposed method can accurately select representative scenarios while mitigating computational burden, thus guaranteeing the robustness of unit commitment.
To solve the problems of scarce optical fibre line fault samples and the inefficiency of manual communication optical fibre fault diagnosis, this paper proposes a communication optical fibre fault diagnosis model based on variational modal decomposition (VMD), fuzzy entropy (FE) and fuzzy clustering (FC). Firstly, based on the OTDR curve data collected in the field, VMD is used to extract the different modal components (IMF) of the original signal and calculate the fuzzy entropy (FE) values of different components to characterize the subtle differences between them. The fuzzy entropy of each curve is used as the feature vector, which in turn constructs the communication optical fibre feature vector matrix, and the fuzzy clustering algorithm is used to achieve fault diagnosis of faulty optical fibre. The VMD-FE combination can extract subtle differences in features, and the fuzzy clustering algorithm does not require sample training. The experimental results show that the model in this paper has high accuracy and is relevant to the maintenance of communication optical fibre when compared with existing feature extraction models and traditional machine learning models.
Wireless sensor networks (WSNs) gather information and sense information samples in a certain region and communicate these readings to a base station (BS). Energy efficiency is considered a major design issue in WSNs, and can be addressed using clustering and routing techniques. Information is sent from the source to the BS via routing procedures. However, these routing protocols must ensure that packets are delivered securely, guaranteeing that neither adversaries nor unauthentic individuals have access to the sent information. Secure data transfer is intended to protect the data from illegal access, damage, or disruption. Thus, in the proposed model, secure data transmission is developed in an energy-effective manner. A low-energy adaptive clustering hierarchy (LEACH) is developed to efficiently transfer the data. For the intrusion detection systems (IDS), fuzzy logic and artificial neural networks (ANNs) are proposed. Initially, the nodes were randomly placed in the network and initialized to gather information. To ensure fair energy dissipation between the nodes, LEACH randomly chooses cluster heads (CHs) and allocates this role to the various nodes based on a round-robin management mechanism. The intrusion-detection procedure was then utilized to determine whether intruders were present in the network. Within the WSN, a fuzzy inference rule was utilized to distinguish the malicious nodes from legal nodes. Subsequently, an ANN was employed to distinguish the harmful nodes from suspicious nodes. The effectiveness of the proposed approach was validated using metrics that attained 97% accuracy, 97% specificity, and 97% sensitivity of 95%. Thus, it was proved that the LEACH and fuzzy-based IDS approaches are the best choices for securing data transmission in an energy-efficient manner.
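The randomized, rotating choice of cluster heads described in this abstract can be sketched with the threshold rule from the standard LEACH protocol: in round r, a node that has not yet served in the current cycle becomes a cluster head if a uniform random draw falls below T = p / (1 - p * (r mod 1/p)), where p is the desired fraction of cluster heads. Whether the paper uses exactly this rule is not stated, so the sketch below is an assumption, and the function names and the `eligible` set are hypothetical.

```python
import random


def leach_threshold(p: float, r: int) -> float:
    """Standard LEACH threshold for round r with cluster-head fraction p."""
    period = round(1 / p)  # rounds per rotation cycle
    return p / (1 - p * (r % period))


def elect_cluster_heads(nodes, eligible, p, r, rng):
    """Pick heads among nodes that have not served in the current cycle."""
    t = leach_threshold(p, r)
    return [n for n in nodes if n in eligible and rng.random() < t]


nodes = list(range(20))
eligible = set(range(10))  # hypothetical: nodes 10..19 already served
heads = elect_cluster_heads(nodes, eligible, p=0.1, r=9, rng=random.Random(0))
print(heads)
```

The threshold grows as the cycle progresses and reaches 1 in its last round, so every node that has not yet served is forced to take the role, which is what spreads energy dissipation across nodes.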
In industrial production and engineering operations, the health state of complex systems is critical, and predicting it can ensure normal operation. Complex systems have many monitoring indicators, complex coupling structures, and non-linear and time-varying characteristics, so it is a challenge to establish a reliable prediction model. The belief rule base (BRB) can fuse observed data and expert knowledge to establish a nonlinear relationship between input and output and has good modeling capabilities. Since each indicator of the complex system can reflect the health state to some extent, the BRB is built based on the causal relationship between system indicators and the health state to achieve the prediction. A health state prediction model based on BRB and long short-term memory (LSTM) for complex systems is proposed in this paper. Firstly, the LSTM is introduced to predict the trend of the indicators in the system. Secondly, the Density Peak Clustering (DPC) algorithm is used to determine referential values of indicators for BRB, which effectively offsets the lack of expert knowledge. Then, the predicted values and expert knowledge are fused to construct BRB to predict the health state of the systems by inference. Finally, the effectiveness of the model is verified by a case study of a certain vehicle hydraulic pump.
In this editorial, we comment on the article by Wang et al. This manuscript explores the potential synergistic effects of combining zanubrutinib, a novel oral inhibitor of Bruton’s tyrosine kinase, with high-dose methotrexate (HD-MTX) as a therapeutic intervention for primary central nervous system lymphoma (PCNSL). The study involves a retrospective analysis of 19 PCNSL patients, highlighting clinicopathological characteristics, treatment outcomes, and genomic biomarkers. The results indicate the combination’s good tolerance and strong antitumor activity, with an 84.2% overall response rate. The authors emphasize the potential of zanubrutinib to modulate key genomic features of PCNSL, particularly mutations in myeloid differentiation primary response 88 and cluster of differentiation 79B. Furthermore, the study investigates the role of circulating tumor DNA in cerebrospinal fluid for disease surveillance and treatment response monitoring. In essence, the study provides valuable insights into the potential of combining zanubrutinib with HD-MTX as a frontline therapeutic regimen for PCNSL. The findings underscore the importance of exploring alternative treatment modalities and monitoring genomic and liquid biopsy markers to optimize patient outcomes. While the findings suggest promise, the study’s limitations should be considered, and further research is needed to establish the clinical relevance of this therapeutic approach for PCNSL.
A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, there is a requirement for the location to have been previously monitored in some way to have this type of information recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Despite the current ease of obtaining location information (GPS, drone images, etc.), the timing of the event remains challenging to ascertain for a considerable portion of landslide data. Concerning rainfall monitoring, there are multiple ways to consider it, for instance, examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h), as well as in the calculation of effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators’ experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized and rapidly scalable methodology on a large scale. In this work, we propose a Landslides Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using Cluster Analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one utilized by Cemaden, Brazil’s National Center for Monitoring and Early Warning of Natural Disasters.
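The rainfall half-life concept can be illustrated with a common formulation of effective rainfall, in which each hourly reading contributes with a weight that halves every `half_life_h` hours. The abstract does not give the formula used by the authors, so the recursion below, the function name `effective_rainfall`, and the sample readings are assumptions for illustration.

```python
def effective_rainfall(hourly_mm: list[float], half_life_h: float) -> list[float]:
    """Running effective rainfall where past rain decays with a half-life."""
    decay = 0.5 ** (1.0 / half_life_h)  # per-hour retention factor
    acc = 0.0
    series = []
    for rain in hourly_mm:
        # Decay what has accumulated so far, then add the new reading.
        acc = acc * decay + rain
        series.append(acc)
    return series


# Hypothetical gauge readings: 10 mm in the first hour, then dry.
print(effective_rainfall([10.0, 0.0, 0.0], half_life_h=2.0))
```

A single half-life parameter replaces the choice among fixed accumulation windows (1 h, 6 h, 24 h, 72 h), which is one way to reduce the manual tuning the abstract describes.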
Clustering a social network is a process of grouping social actors into clusters where intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, e.g., k-medoids or hierarchical clustering, use a distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases, the triangle inequality might be violated, impacting the quality of the resulting clusters. With experiments, this paper explains how TI is violated when applying traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) to social networks, and how the violation of TI affects the quality of the resulting clusters.
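A triangle inequality violation is easy to check directly on a pairwise distance matrix: a triple (i, j, k) violates TI when d(i, k) > d(i, j) + d(j, k). The brute-force sketch below assumes a symmetric matrix; the function name `ti_violations` and the toy matrix are illustrative and not taken from the paper.

```python
from itertools import permutations


def ti_violations(d: list[list[float]], tol: float = 1e-12):
    """Return triples (i, j, k) with d[i][k] > d[i][j] + d[j][k]."""
    n = len(d)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        # i < k skips the mirrored triple, since d is assumed symmetric.
        if i < k and d[i][k] > d[i][j] + d[j][k] + tol
    ]


# Toy dissimilarities: actors 0 and 2 are far apart although both are close to 1.
d = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
print(ti_violations(d))  # [(0, 1, 2)]
```

The check is cubic in the number of actors, so on large networks it would typically be run on a sample of triples.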
Multi-view Subspace Clustering (MVSC) emerges as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The significance of low-rank prior in MVSC is emphasized, highlighting its role in capturing the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. Addressing this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving the clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
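When an l1 sparse regularizer is handled with ADMM, the corresponding subproblem usually reduces to element-wise soft-thresholding, the proximal operator of the l1 norm. The abstract does not list the actual update steps of LMVSC-Sparse, so the sketch below only shows this generic building block; the function name `soft_threshold` and the sample values are assumptions.

```python
import math


def soft_threshold(values: list[float], lam: float) -> list[float]:
    """Proximal operator of lam * ||x||_1, applied element-wise."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


# Entries with magnitude below lam are set to zero; the rest shrink toward zero.
print(soft_threshold([3.0, -0.5, -2.0], lam=1.0))
```

Zeroing small entries is what lets a sparse term suppress weak, noisy contributions while larger entries are only shrunk by a constant amount.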
The k-means algorithm is a popular data clustering technique due to its speed and simplicity. However, it is susceptible to issues such as sensitivity to the chosen seeds, and inaccurate clusters due to poor initial seeds, particularly in complex datasets or datasets with non-spherical clusters. In this paper, a Comprehensive K-Means Clustering algorithm is presented, in which multiple trials of k-means are performed on a given dataset. The clustering results from each trial are transformed into a five-dimensional data point, containing the scope values of the x and y coordinates of the clusters along with the number of points within that cluster. A graph is then generated displaying the configuration of these points using Principal Component Analysis (PCA), from which we can observe and determine the common clustering patterns in the dataset. The robustness and strength of these patterns are then examined by observing the variance of the results of each trial, wherein a different subset of the data keeping a certain percentage of original data points is clustered. By aggregating information from multiple trials, we can distinguish clusters that consistently emerge across different runs from those that are more sensitive or unlikely, hence deriving more reliable conclusions about the underlying structure of complex datasets. Our experiments show that our algorithm is able to find the most common associations between different dimensions of data over multiple trials, often more accurately than other algorithms, as well as measure stability of these clusters, an ability that other k-means algorithms lack.
The network performance and the number of unmanned aerial vehicles (UAVs) are important objectives when UAVs are placed as communication relays to enhance multi-agent information exchange. The problem is a non-deterministic polynomial hard (NP-hard) multi-objective optimization problem. Instead of generating a Pareto solution, this work focuses on considering both objectives at the same level so as to achieve a balanced solution between them. Based on the property that agents connected to the same UAV form a cluster, two clustering-based algorithms, M-K-means (MKM) and modified fast search and find of density peaks (MFSFDP), are first proposed. Since the former algorithm requires too much computational time and the latter one requires too many relays, an algorithm for the balanced network performance and relay number (BPN) is proposed by discretizing the area to avoid missing the optimal relay positions and defining a new local density function to reflect the network performance metric. Simulation results demonstrate that the proposed algorithms are feasible and effective. Comparisons between these algorithms show that the BPN algorithm uses fewer relay UAVs than the MFSFDP and the classic set-covering based algorithm, and its computational time is far less than that of the MKM algorithm.
News feeds are an important source of information, giving updates on various topics across different domains. These updates need to be collected because users interested in a specific domain need important updates in their domains, with organized data from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are continuously collected for our experimentation. The major contributions of this work involve domain-corpus-based news collection, news content extraction, hierarchical clustering of the news, and summarization of news. Many existing news summarization systems fail to provide dynamic content with domain-wise representation. Our proposed system alleviates this by tagging the news feed with domain corpora and organizing the news streams in a hierarchical structure with topic-wise representation. Further, the news streams are summarized for users with a novel summarization algorithm. The proposed summarization system generates topic-wise summaries effectively for the user, and no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. Online news content editors benefit greatly from this system by instantly getting news summaries for their domains of interest.
A parametric study of the clustering transition of a vibration-driven granular gas system is performed by simulation. The parameters studied include the global volume fraction of the system, the size of the system, the friction coefficient, and the restitution coefficient between particles and between particles and walls. The periodic boundary and fixed boundary of sidewalls are also checked in the simulation. The simulation results provide the necessary "heating" time for the system to reach steady state, and the friction term that needs to be included in the "cooling" time. A gas-cluster phase diagram obtained through Kolmogorov-Smirnov (K-S) test analysis using similar experimental parameters is given. The influence of the parameters on the transition is then investigated in simulations. This simulation investigation helps us gain understanding that cannot be obtained by experiment alone, and makes suggestions on the choice of parameters in experiments.
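As a reminder of what a K-S test analysis measures, the two-sample K-S statistic is the largest gap between the empirical cumulative distribution functions of two samples. The abstract does not say which distributions are compared to separate the gas and cluster phases, so the inputs below are placeholder data and `ks_statistic` is a hypothetical helper name.

```python
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        # Empirical CDF value = fraction of the sample that is <= x.
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Placeholder samples: identical samples give 0, fully separated ones give 1.
print(ks_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A statistic near zero indicates that the two samples are consistent with one distribution, while a value near one indicates clearly different distributions.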
Abstract: Clustering high-dimensional data is challenging because increasing dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
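The time fading function mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the function, so the exponential form 2^(-λ·age), the class name `FadingMicroCluster`, and the parameter `decay_rate` are assumptions chosen for illustration rather than the method of any reviewed algorithm.

```python
def fading_weight(age: float, decay_rate: float = 0.25) -> float:
    """Assumed exponential fading: f(age) = 2 ** (-decay_rate * age)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return 2.0 ** (-decay_rate * age)


class FadingMicroCluster:
    """Hypothetical micro-cluster that tracks only a time-decayed weight."""

    def __init__(self, decay_rate: float = 0.25):
        self.decay_rate = decay_rate
        self.weight = 0.0
        self.last_update = 0.0

    def current_weight(self, now: float) -> float:
        # Decay the stored weight by the time elapsed since the last update.
        return self.weight * fading_weight(now - self.last_update, self.decay_rate)

    def insert(self, timestamp: float) -> None:
        # Fade the old weight first, then count the new point with weight 1.
        self.weight = self.current_weight(timestamp) + 1.0
        self.last_update = timestamp
```

Under this assumed form, a micro-cluster that stops receiving points loses half of its weight every 1/decay_rate time units, which is one way a cluster can fade out implicitly as the stream evolves.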
Funding: Sponsored by the National Natural Science Foundation of P.R. China (Nos. 62102194 and 62102196), the Six Talent Peaks Project of Jiangsu Province (No. RJFW-111), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (Nos. KYCX23_1087 and KYCX22_1027).
Abstract: The study examines the expanding role of network platforms in our daily lives, encompassing various mediums like blogs, forums, online chats, and prominent social media platforms such as Facebook, Twitter, and Instagram. While these platforms offer avenues for self-expression and community support, they also have negative impacts, fostering antisocial behaviors like phishing, impersonation, hate speech, cyberbullying, cyberstalking, cyberterrorism, fake news propagation, spamming, and fraud. Notably, individuals also use these platforms to connect with authorities and seek aid during disasters. The overarching objective of this research is to address the dual nature of network platforms by proposing methodologies aimed at enhancing their positive aspects and mitigating their negative repercussions. To achieve this, the study introduces a weight learning method grounded in multi-linear attribute ranking. This approach evaluates the significance of attribute combinations across all feature spaces. Additionally, a novel tensor-based clustering method is proposed to improve the quality of clustering while effectively distinguishing selected features. The methodology incorporates a weighted average similarity matrix and optionally integrates weighted Euclidean distance, contributing to a more nuanced understanding of attribute importance. The analysis of the proposed methods yields significant findings. The weight learning method proves instrumental in discerning the importance of attribute combinations, shedding light on key aspects within feature spaces. Simultaneously, the tensor-based clustering method exhibits improved clustering quality and feature distinction. This not only advances our understanding of attribute importance but also paves the way for more nuanced data analysis methodologies.
In conclusion, this research underscores the pivotal role of network platforms in contemporary society, emphasizing their potential for both positive contributions and adverse consequences. The proposed methodologies offer novel approaches to address these dualities, providing a foundation for future research and practical applications. Ultimately, this study contributes to the ongoing discourse on optimizing the utility of network platforms while minimizing their negative impacts.
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 72001190), by the Ministry of Education’s Humanities and Social Science Project via the China Ministry of Education (Grant No. 20YJC630173), and by Zhejiang A&F University (Grant No. 2022LFR062).
Abstract: Data stream clustering is integral to contemporary big data applications. However, handling the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Taking the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both its efficiency and accuracy. The original TEDA algorithm, grounded in the concept of “Typicality and Eccentricity Data Analytics”, is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever a new data point arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) integrated with the Scapegoat Tree. This ensures that, upon arrival, new data points interact solely with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy and that its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
Funding: Supported in part by an NUS startup grant and the National Natural Science Foundation of China (52076037).
Abstract: Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only hard and fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. The adoption of hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views, and maps multi-view objects to a 2-dimensional human-readable chart by calculating 2 newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. In addition, SRMVEC proves effective on benchmark datasets, outperforming some state-of-the-art methods.
Abstract: Implementing machine learning algorithms in the non-conducive environment of the vehicular network requires some adaptations due to the high computational complexity of these algorithms. K-clustering algorithms are simple, with fast performance and relative accuracy. However, their implementation depends on the initial selection of the number of clusters (K), the initial cluster centers, and the clustering metric. This paper investigated using Scott’s histogram formula to estimate the K number and the Link Expiration Time (LET) as a clustering metric. Realistic traffic flows were considered for three maps, namely Highway, Traffic Light junction, and Roundabout junction, to study the effect of road layout on estimating the K number. A fast version of the PAM algorithm was used for clustering, with a modification to reduce time complexity. The Affinity Propagation algorithm sets the baseline for the estimated K number, and the Medoid Silhouette method is used to quantify the clustering. OMNET++, Veins, and SUMO were used to simulate the traffic, while the related algorithms were implemented in Python. The Scott’s formula estimate of the K number only matched the baseline when the road layout was simple. Moreover, the clustering algorithm required one iteration on average to converge when used with LET.
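Scott’s histogram formula, on which this abstract relies, sets the bin width to h = 3.49·σ·n^(-1/3). The Python sketch below turns that width into a bin count. Treating the bin count as the K estimate is an assumption made for illustration, since the abstract does not say how the paper maps the formula to K, and the function name `scott_bin_count` is hypothetical.

```python
import math
import statistics


def scott_bin_count(values):
    """Number of histogram bins implied by Scott's rule, h = 3.49 * sigma * n**(-1/3)."""
    n = len(values)
    if n < 2:
        return 1
    sigma = statistics.stdev(values)           # sample standard deviation
    data_range = max(values) - min(values)
    if sigma == 0 or data_range == 0:
        return 1                               # all values identical: a single bin
    width = 3.49 * sigma * n ** (-1.0 / 3.0)   # Scott's bin width
    return max(1, math.ceil(data_range / width))
```

In a vehicular setting the input might be, for example, vehicle positions along a road segment, but that choice of feature is likewise an assumption.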
Abstract: This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise. The challenge of designing a suitable filter in the presence of an unknown covariance matrix is addressed by focusing on the output data set of the system. Considering that data generated from a Gaussian distribution exhibit ellipsoidal scattering, we first propose the weighted sum of norms (SON) clustering method, which prioritizes nearby points, reduces the influence of distant points, and lowers computational cost. Then, by introducing the weighted maximum likelihood, we propose a semi-definite program (SDP) to detect outliers and reduce their impact on each cluster. Detecting these weights paves the way to obtaining an appropriate covariance of the output noise. Next, two filtering approaches are presented: a cluster-based robust linear filter using maximum a posteriori (MAP) estimation, and a cluster-based robust nonlinear filter assuming that the output noise distribution stems from several Gaussian noise sources according to the ellipsoidal clusters. Finally, simulation results demonstrate the effectiveness of the proposed filtering approaches.
Funding: Supported in part by the National Natural Science Foundation of China (62373231, 61973201), the Fundamental Research Program of Shanxi Province (202203021211297), and the Shanxi Scholarship Council of China (2023-002).
Abstract: Dear Editor, This letter focuses on the fixed-time (FXT) cluster optimization problem of first-order multi-agent systems (FOMASs) in an undirected network, in which the optimization objective is the sum of the objective functions of all clusters. A novel piecewise power-law control protocol with cooperative-competition relations is proposed. Furthermore, a sufficient condition is obtained to ensure that the FOMASs achieve cluster consensus within an FXT.
Funding: Science and Technology Project of State Grid Corporation of China (Grant No. 5108-202304065A-1-1-ZN).
Abstract: Stochastic unit commitment is one of the most powerful methods to address uncertainty. However, the existing scenario clustering technique for stochastic unit commitment cannot accurately select representative scenarios, which threatens the robustness of stochastic unit commitment and hinders its application. This paper provides a stochastic unit commitment with dynamic scenario clustering based on multi-parametric programming and Benders decomposition. The stochastic unit commitment is solved via the Benders decomposition, which decouples the primal problem into the master problem and two types of subproblems. In the master problem, the committed generator is determined, while the feasibility and optimality of generator output are checked in these two subproblems. Scenarios are dynamically clustered during the subproblem solution process through multi-parametric programming with respect to the solution of the master problem. In other words, multiple scenarios are clustered into several representative scenarios after the subproblem is solved, and the Benders cut obtained from the representative scenario is generated for the master problem. Unlike conventional stochastic unit commitment, the proposed approach integrates scenario clustering into the Benders decomposition solution process. Such a clustering approach can accurately cluster representative scenarios that have impacts on the unit commitment. The proposed method is tested on a 6-bus system and the modified IEEE 118-bus system. Numerical results illustrate the effectiveness of the proposed method in clustering scenarios. Compared with the conventional clustering method, the proposed method can accurately select representative scenarios while mitigating the computational burden, thus guaranteeing the robustness of unit commitment.
Funding: This paper is supported by the State Grid Gansu Electric Power Company Science and Technology Project (20220515003).
Abstract: To address the scarcity of optical fibre line fault samples and the inefficiency of manual communication optical fibre fault diagnosis, this paper proposes a communication optical fibre fault diagnosis model based on variational modal decomposition (VMD), fuzzy entropy (FE), and fuzzy clustering (FC). Firstly, based on the OTDR curve data collected in the field, VMD is used to extract the different modal components (IMF) of the original signal, and the fuzzy entropy (FE) values of the different components are calculated to characterize the subtle differences between them. The fuzzy entropy of each curve is used as the feature vector, from which the communication optical fibre feature vector matrix is constructed, and the fuzzy clustering algorithm is used to diagnose faulty optical fibre. The VMD-FE combination can extract subtle differences in features, and the fuzzy clustering algorithm does not require sample training. The experimental results show that, compared with existing feature extraction models and traditional machine learning models, the proposed model has high accuracy and is relevant to the maintenance of communication optical fibre.
Abstract: Wireless sensor networks (WSNs) sense and gather information samples in a certain region and communicate these readings to a base station (BS). Energy efficiency is considered a major design issue in WSNs and can be addressed using clustering and routing techniques. Information is sent from the source to the BS via routing procedures. However, these routing protocols must ensure that packets are delivered securely, guaranteeing that neither adversaries nor unauthenticated individuals have access to the sent information. Secure data transfer is intended to protect the data from illegal access, damage, or disruption. Thus, in the proposed model, secure data transmission is developed in an energy-effective manner. A low-energy adaptive clustering hierarchy (LEACH) is developed to transfer the data efficiently. For the intrusion detection system (IDS), fuzzy logic and artificial neural networks (ANNs) are proposed. Initially, the nodes were randomly placed in the network and initialized to gather information. To ensure fair energy dissipation between the nodes, LEACH randomly chooses cluster heads (CHs) and allocates this role to the various nodes based on a round-robin management mechanism. The intrusion-detection procedure was then used to determine whether intruders were present in the network. Within the WSN, a fuzzy inference rule was used to distinguish the malicious nodes from legitimate nodes. Subsequently, an ANN was employed to distinguish the harmful nodes from suspicious nodes. The effectiveness of the proposed approach was validated using metrics that attained 97% accuracy, 97% specificity, and 97% sensitivity of 95%. Thus, it was concluded that the LEACH and fuzzy-based IDS approaches are the best choices for securing data transmission in an energy-efficient manner.
Funding: Supported by the Natural Science Foundation of China under Grants 61833016 and 61873293, the Shaanxi Outstanding Youth Science Foundation under Grant 2020JC-34, and the Shaanxi Science and Technology Innovation Team under Grant 2022TD-24.
Abstract: In industrial production and engineering operations, the health state of complex systems is critical, and predicting it can ensure normal operation. Complex systems have many monitoring indicators, complex coupling structures, and non-linear and time-varying characteristics, so it is a challenge to establish a reliable prediction model. The belief rule base (BRB) can fuse observed data and expert knowledge to establish a nonlinear relationship between input and output, and has good modeling capabilities. Since each indicator of the complex system can reflect the health state to some extent, the BRB is built based on the causal relationship between system indicators and the health state to achieve the prediction. A health state prediction model based on BRB and long short-term memory (LSTM) for complex systems is proposed in this paper. Firstly, the LSTM is introduced to predict the trend of the indicators in the system. Secondly, the Density Peak Clustering (DPC) algorithm is used to determine referential values of indicators for the BRB, which effectively offsets the lack of expert knowledge. Then, the predicted values and expert knowledge are fused to construct the BRB, which predicts the health state of the system by inference. Finally, the effectiveness of the model is verified by a case study of a vehicle hydraulic pump.
Abstract: In this editorial, we comment on the article by Wang et al. This manuscript explores the potential synergistic effects of combining zanubrutinib, a novel oral inhibitor of Bruton’s tyrosine kinase, with high-dose methotrexate (HD-MTX) as a therapeutic intervention for primary central nervous system lymphoma (PCNSL). The study involves a retrospective analysis of 19 PCNSL patients, highlighting clinicopathological characteristics, treatment outcomes, and genomic biomarkers. The results indicate that the combination is well tolerated and has strong antitumor activity, with an 84.2% overall response rate. The authors emphasize the potential of zanubrutinib to modulate key genomic features of PCNSL, particularly mutations in myeloid differentiation primary response 88 and cluster of differentiation 79B. Furthermore, the study investigates the role of circulating tumor DNA in cerebrospinal fluid for disease surveillance and treatment response monitoring. In essence, the study provides valuable insights into the potential of combining zanubrutinib with HD-MTX as a frontline therapeutic regimen for PCNSL. The findings underscore the importance of exploring alternative treatment modalities and monitoring genomic and liquid biopsy markers to optimize patient outcomes. While the findings suggest promise, the study’s limitations should be considered, and further research is needed to establish the clinical relevance of this therapeutic approach for PCNSL.
Abstract: A significant portion of Landslide Early Warning Systems (LEWS) relies on the definition of operational thresholds and the monitoring of cumulative rainfall for alert issuance. These thresholds can be obtained in various ways, but most often they are based on previous landslide data. This approach introduces several limitations. For instance, the location must have been monitored previously in some way for this type of information to have been recorded. Another significant limitation is the need for information regarding the location and timing of incidents. Although location information is now easy to obtain (GPS, drone images, etc.), the timing of the event remains difficult to ascertain for a considerable portion of landslide data. Rainfall monitoring can be approached in multiple ways, for instance by examining accumulations over various intervals (1 h, 6 h, 24 h, 72 h), or by calculating effective rainfall, which represents the precipitation that actually infiltrates the soil. However, in the vast majority of cases, both the thresholds and the rain monitoring approach are defined manually and subjectively, relying on the operators’ experience. This makes the process labor-intensive and time-consuming, hindering the establishment of a truly standardized and rapidly scalable methodology. In this work, we propose a Landslides Early Warning System (LEWS) based on the concept of rainfall half-life and the determination of thresholds using Cluster Analysis and data inversion. The system is designed to be applied in extensive monitoring networks, such as the one utilized by Cemaden, Brazil’s National Center for Monitoring and Early Warning of Natural Disasters.
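One common way to express a rainfall half-life is to weight each past rainfall reading by 0.5^(age / half-life), so that older rain contributes less to the current total. The Python sketch below follows that idea. The abstract gives no formula, so the decay form, the function name `effective_rainfall`, and the hourly sampling are assumptions for illustration and not the system’s actual definition.

```python
def effective_rainfall(hourly_rain, half_life_hours):
    """Decayed rainfall sum; the last element of hourly_rain is the current hour.

    Assumed weighting: a reading that is `age` hours old counts
    0.5 ** (age / half_life_hours) of its original amount.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    latest = len(hourly_rain) - 1
    total = 0.0
    for i, rain_mm in enumerate(hourly_rain):
        age = latest - i                       # hours since this reading
        total += rain_mm * 0.5 ** (age / half_life_hours)
    return total
```

For example, with a 2-hour half-life, 8 mm that fell two hours ago counts as 4 mm now. An alert threshold would then be compared against this decayed total rather than a fixed-window accumulation.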
Abstract: Clustering a social network is a process of grouping social actors into clusters where intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, e.g., k-medoids or hierarchical clustering, use a distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases, the triangle inequality might be violated, impacting the quality of the resulting clusters. Through experiments, this paper explains how the TI is violated when applying traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) to social networks, and how the violation of the TI affects the quality of the resulting clusters.
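The triangle inequality requires d(i, k) ≤ d(i, j) + d(j, k) for every triple of actors. A minimal Python sketch for finding violating triples in a symmetric distance matrix follows. The function name `triangle_violations` and the nested-list matrix representation are assumptions for illustration, not the procedure used in the paper.

```python
def triangle_violations(dist, tol=1e-12):
    """Return triples (i, j, k) where dist[i][k] > dist[i][j] + dist[j][k].

    Assumes dist is a square, symmetric matrix given as nested lists, so each
    unordered pair (i, k) is checked once against every intermediate j.
    """
    n = len(dist)
    violations = []
    for i in range(n):
        for k in range(i + 1, n):
            for j in range(n):
                if j == i or j == k:
                    continue
                if dist[i][k] > dist[i][j] + dist[j][k] + tol:
                    violations.append((i, j, k))
    return violations
```

The check is cubic in the number of actors, so for large networks it would typically be run on a sample of triples.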
Abstract: Multi-view Subspace Clustering (MVSC) is an advanced clustering method designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The low-rank prior plays a significant role in MVSC by capturing the global data structure across views for improved performance. However, MVSC is sensitive to outliers because it relies on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods that rely solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
Abstract: The k-means algorithm is a popular data clustering technique due to its speed and simplicity. However, it is sensitive to the chosen seeds and can produce inaccurate clusters when the initial seeds are poor, particularly in complex datasets or datasets with non-spherical clusters. In this paper, a Comprehensive K-Means Clustering algorithm is presented, in which multiple trials of k-means are performed on a given dataset. The clustering results from each trial are transformed into a five-dimensional data point, containing the scope values of the x and y coordinates of the clusters along with the number of points within that cluster. A graph is then generated displaying the configuration of these points using Principal Component Analysis (PCA), from which we can observe and determine the common clustering patterns in the dataset. The robustness and strength of these patterns are then examined by observing the variance of the results of each trial, wherein a different subset of the data keeping a certain percentage of original data points is clustered. By aggregating information from multiple trials, we can distinguish clusters that consistently emerge across different runs from those that are more sensitive or unlikely, hence deriving more reliable conclusions about the underlying structure of complex datasets. Our experiments show that our algorithm is able to find the most common associations between different dimensions of data over multiple trials, often more accurately than other algorithms, and to measure the stability of these clusters, an ability that other k-means algorithms lack.
Funding: National Natural Science Foundation of China (61573285).
Abstract: Network performance and the number of unmanned aerial vehicles (UAVs) are important objectives when UAVs are placed as communication relays to enhance multi-agent information exchange. The problem is a non-deterministic polynomial hard (NP-hard) multi-objective optimization problem. Instead of generating a Pareto solution, this work focuses on considering both objectives at the same level so as to achieve a balanced solution between them. Based on the property that agents connected to the same UAV form a cluster, two clustering-based algorithms, the M-K-means (MKM) and modified fast search and find density of peaks (MFSFDP) methods, are first proposed. Since the former algorithm requires too much computational time and the latter requires too many relays, an algorithm for the balanced network performance and relay number (BPN) is proposed by discretizing the area to avoid missing the optimal relay positions and defining a new local density function to reflect the network performance metric. Simulation results demonstrate that the proposed algorithms are feasible and effective. Comparisons between these algorithms show that the BPN algorithm uses fewer relay UAVs than the MFSFDP and the classic set-covering based algorithm, and that its computational time is far less than that of the MKM algorithm.
Abstract: News feeds are one of the potential information sources that give updates on various topics across different domains. These updates need to be collected, since domain-specific users need the important updates in their domains as organized data from various sources. In this paper, a news summarization system is proposed for news data streams from RSS feeds and Google News. Since news stream analysis requires live content, the news data are continuously collected for our experimentation. The major contributions of this work involve domain corpus based news collection, news content extraction, hierarchical clustering of the news, and summarization of the news. Many existing news summarization systems fall short in providing dynamic content with domain-wise representation. This is alleviated in our proposed system by tagging the news feed with domain corpuses and organizing the news streams in a hierarchical structure with topic-wise representation. Further, the news streams are summarized for the users with a novel summarization algorithm. The proposed summarization system generates topic-wise summaries effectively for the user, and no system in the literature has handled news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. Online news content editors benefit greatly from this system, as they can instantly get news summaries for their domain of interest.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. U1738120, 11474326, and 11705256), the Young Scholar of CAS “Light of West China” Program for Guanghui Yang (Grant No. 2018-98), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA21010202), and the International Cooperation Project of China Manned Space Program.
Abstract: A parametric study of the clustering transition of a vibration-driven granular gas system is performed by simulation. The parameters studied include the global volume fraction of the system, the size of the system, the friction coefficient, and the restitution coefficient among particles and between particles and walls. Periodic and fixed sidewall boundaries are also checked in the simulation. The simulation results provide the “heating” time necessary for the system to reach steady state, and the friction term that needs to be included in the “cooling” time. A gas-cluster phase diagram obtained through Kolmogorov-Smirnov (K-S) test analysis using similar experimental parameters is given. The influence of the parameters on the transition is then investigated in simulations. This simulation investigation provides understanding that cannot be obtained by experiment alone, and makes suggestions on the choice of parameters in experiments.