The k-means algorithm is a popular data clustering technique because of its speed and simplicity. However, it is sensitive to the chosen seeds, and poor initial seeds can produce inaccurate clusters, particularly in complex datasets or datasets with non-spherical clusters. In this paper, a Comprehensive K-Means Clustering algorithm is presented, in which multiple trials of k-means are performed on a given dataset. The clustering results from each trial are transformed into five-dimensional data points, each containing the scope (minimum and maximum) of the x and y coordinates of a cluster along with the number of points in that cluster. Principal Component Analysis (PCA) is then used to generate a graph of the configuration of these points, from which the common clustering patterns in the dataset can be observed and determined. The robustness and strength of these patterns are then examined by observing the variance of the results across trials, where each trial clusters a different subset of the data that keeps a certain percentage of the original data points. By aggregating information from multiple trials, we can distinguish clusters that consistently emerge across different runs from those that are more sensitive or unlikely, and hence derive more reliable conclusions about the underlying structure of complex datasets. Our experiments show that the algorithm finds the most common associations between different dimensions of the data over multiple trials, often more accurately than other algorithms, and also measures the stability of these clusters, an ability that other k-means algorithms lack.
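The multi-trial procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 2-D input, interprets the "scope" of a cluster as the minimum and maximum of its x and y coordinates, uses plain Lloyd's k-means with random seeds, and projects the standardized 5-D descriptors onto two principal components with an SVD. The function names (`comprehensive_trials`, `cluster_descriptors`) and the `keep` fraction are hypothetical.

```python
import numpy as np

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means with seeds drawn at random from the data."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep empty clusters in place.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_descriptors(points, labels, k):
    """One 5-D row per non-empty cluster: x-min, x-max, y-min, y-max, size."""
    rows = []
    for j in range(k):
        members = points[labels == j]
        if len(members) == 0:
            continue
        rows.append([members[:, 0].min(), members[:, 0].max(),
                     members[:, 1].min(), members[:, 1].max(), len(members)])
    return np.array(rows, dtype=float)

def comprehensive_trials(points, k, trials=20, keep=0.8, seed=0):
    """Run k-means on random subsets and return descriptors plus a 2-D PCA projection."""
    rng = np.random.default_rng(seed)
    all_rows = []
    for _ in range(trials):
        # Each trial clusters a random subset keeping a fraction `keep` of the data.
        idx = rng.choice(len(points), size=int(keep * len(points)), replace=False)
        subset = points[idx]
        labels = kmeans(subset, k, rng)
        all_rows.append(cluster_descriptors(subset, labels, k))
    descriptors = np.vstack(all_rows)
    # PCA via SVD of the standardized descriptor matrix; keep two components.
    centered = descriptors - descriptors.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0
    z = centered / scale
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return descriptors, z @ vt[:2].T
```

Plotting the two-column projection would show whether descriptors from different trials fall into tight groups (stable clusters) or scatter (seed-sensitive clusters).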
As a mainstream research direction in image segmentation, medical image segmentation plays a key role in lesion quantification, three-dimensional reconstruction, region-of-interest extraction, and related tasks. Compared with natural images, medical images come in a variety of modalities, and the information emphasized by images of different modalities differs considerably. Because manual segmentation of medical images by professional, experienced doctors is time-consuming and inefficient, a large number of automated medical image segmentation methods have been developed. However, no universal method for all types of medical image segmentation has yet been developed. This paper reviews the literature on segmentation techniques that have produced major breakthroughs in recent years. Among the many medical image segmentation methods, it mainly discusses two categories: improved strategies based on traditional clustering methods, and improved segmentation network architectures based on U-Net. The reviewed work shows that deep learning-based methods perform significantly better than traditional methods. The paper discusses the advantages and disadvantages of the different algorithms, details how these methods can be used to segment lesions and other organs and tissues, and outlines possible technical trends for future work.
Various types of plasma events emerge in specific parameter ranges and exhibit similar characteristics in diagnostic signals, which can be used to identify these events. The k-means clustering algorithm, used within a semi-supervised machine learning approach, is applied to investigate and identify plasma events in the J-TEXT plasma. The method clusters diverse plasma events with homogeneous features, after which the events can be identified from a few manually labeled examples based on physical understanding. A survey of the clustered events reveals that k-means gathers plasma events (rotating tearing modes, sawtooth oscillations, and locked modes) into groups in a Euclidean space composed of multi-dimensional diagnostic data, such as soft x-ray emission intensity, edge toroidal rotation velocity, and Mirnov signal amplitude. Based on the cluster analysis results, an approximate analytical model is proposed to rapidly identify plasma events in the J-TEXT plasma. The cluster analysis method facilitates the labeling of massive diagnostic datasets.
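The cluster-then-label idea can be illustrated with a small sketch, assuming each shot is summarized by a fixed-length feature vector (for example normalized soft x-ray intensity and Mirnov amplitude) and that a handful of shots carry a manually assigned event name. Farthest-first seeding, majority voting, and all function names here are illustrative choices, not details taken from the study.

```python
import math
from collections import Counter

def farthest_first_seeds(points, k):
    """Deterministic seeding: start at the first point, then repeatedly take
    the point farthest from all seeds chosen so far."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        far = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(list(far))
    return seeds

def kmeans(points, k, iters=100):
    centers = farthest_first_seeds(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def name_clusters(labels, labeled):
    """Map cluster id -> event name by majority vote over the few labeled shots."""
    votes = {}
    for idx, name in labeled.items():
        votes.setdefault(labels[idx], Counter())[name] += 1
    return {cluster: counter.most_common(1)[0][0] for cluster, counter in votes.items()}

def identify(x, centers, names):
    """Assign a new sample to its nearest center; None if that cluster has no label."""
    nearest = min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
    return names.get(nearest)
```

Only two labeled shots are needed in the usage below to name both clusters, which mirrors the "few manually labeled examples" setting.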
Computed tomography (CT) in medical imaging provides pictures of the inside of the human body in digital form. The higher the image quality, the more information we obtain. Medical images are normally reconstructed from projection data acquired from several perspectives. In this paper, we describe a numerical method for refining the image of a region of interest (ROI) by constructing a support within a standard CT image. The quality of a tomographic slice is affected by artifacts. CT reconstruction using filtering and K-means clustering provides a way to reconstruct an ROI with minimal artifacts and improved spatial resolution. Experimental results on the reconstructed images show that the approach enhances the overall resolution and contrast of ROI images. The method offers several advantages: robustness to noise in the projection data, and support construction without any additional acquisition setup.
Reservoir classification is a key step in reservoir evaluation. However, traditional manual methods are inefficient and subjective, and their classification standards are not uniform. Therefore, taking the Mishrif Formation in western Iraq as an example, a new reservoir classification and discrimination method is established using K-means clustering and Bayesian discrimination. The methods are applied to non-cored wells to calculate the discrimination accuracy for each reservoir type, which clarifies the main reasons for low discrimination accuracy. The results show that the discrimination accuracy of reservoir types based on K-means clustering and Bayesian stepwise discrimination is strongly related to the accuracy of the core data. Using the combination of K-means clustering and Bayesian theory on logging data, the discrimination accuracy for Type Ⅰ, Type Ⅱ, and Type Ⅴ reservoirs is significantly higher than for Type Ⅲ and Type Ⅳ reservoirs. Although the recognition accuracy for Type Ⅳ reservoirs is low, the new method achieves an average accuracy of more than 82% over the entire study area, which lays a good foundation for rapid and accurate discrimination of reservoir types and fine reservoir evaluation.
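A minimal sketch of the second stage follows, assuming the reservoir types have already been assigned to cored samples (for example by K-means) and that each sample is a vector of logging-derived features. It swaps in a Gaussian naive Bayes discriminant (independent features, class priors from type frequencies) as a simple stand-in for the Bayesian stepwise discrimination used in the study; the feature values in the usage are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Per-type prior, feature means, and feature variances (naive Bayes assumption)."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    model = {}
    n = len(samples)
    for y, rows in groups.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
                     for col, m in zip(zip(*rows), means)]
        model[y] = (len(rows) / n, means, variances)
    return model

def predict(model, x):
    """Return the reservoir type with the highest log posterior for sample x."""
    def log_posterior(params):
        prior, means, variances = params
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp -= 0.5 * math.log(2 * math.pi * s2) + (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=lambda y: log_posterior(model[y]))
```

Applied to non-cored wells, `predict` would be called on each logging-derived sample; comparing its output with core-based types where both exist gives the per-type discrimination accuracy.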
Many pests feed on leaves, stems, bases, or entire plants, causing plant diseases. It is therefore vital to identify and eliminate a disease before it damages the plants. Detecting and treating plant diseases manually is challenging because it requires much effort and a long processing time, so image processing is employed to detect plant diseases. The main goal of this study is to identify the diseases affecting plants by creating an image processing system that can recognize and classify four plant diseases: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. The work uses a support vector machine (SVM) classifier to detect and classify plant diseases through image acquisition, pre-processing, segmentation, feature extraction, and classification. Gray-level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique detects and diagnoses plant diseases with 97.2% accuracy.
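The GLCM features mentioned above can be computed as follows. This sketch assumes the leaf image has already been quantized to `levels` integer gray levels and uses a single pixel offset; it computes three standard texture statistics (contrast, angular second moment, homogeneity). The LBP features and the SVM stage are omitted, and the function names are hypothetical.

```python
def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions to make the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else counts

def glcm_features(p):
    """Contrast, angular second moment (ASM), and homogeneity of a normalized GLCM."""
    n = len(p)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in pairs)
    asm = sum(p[i][j] ** 2 for i, j in pairs)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs)
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}
```

A uniform patch gives zero contrast and maximal ASM and homogeneity, while a high-frequency pattern (such as lesion texture) raises contrast.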
In recent years, soft subspace clustering algorithms have shown good results on high-dimensional data: they assign a different set of weights to each cluster and use these weights to measure the contribution of each feature dimension. The enhanced soft subspace clustering algorithm combines inter-class separation and intra-class compactness information and performs well in image segmentation. However, it is susceptible to noisy data, depends on the initial cluster centers, and can fall into a local optimum, so its clustering results are poor for brain MR images with unclear boundaries and noise. To address these problems, a soft subspace clustering algorithm for brain MR images based on genetic algorithm optimization is proposed. It incorporates the generalized noise technique, relaxes the equality constraint on the weights in the objective function to a boundary constraint, and uses a genetic algorithm to optimize the initial cluster centers. The genetic algorithm finds the best cluster centers and reduces the algorithm's dependence on initialization. Experiments verify the robustness and noise immunity of the algorithm in several ways, and it shows good results on a common dataset and on brain MR images provided by Changshu First People's Hospital, with high accuracy relevant to clinical medicine.
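The per-cluster feature weighting that soft subspace clustering relies on can be sketched with the entropy-weighted k-means update, in which each cluster's weight on a feature decays exponentially with that cluster's dispersion along the feature. This well-known soft subspace scheme is used here only as an illustration; it is not the enhanced, noise-clustering, or genetic-algorithm variant proposed in the paper, and `gamma` (the entropy regularization strength) is an assumed parameter.

```python
import math

def subspace_weights(clusters, centers, gamma=1.0):
    """Entropy-weighted feature weights: one weight vector (summing to 1) per cluster."""
    weights = []
    for members, center in zip(clusters, centers):
        # Dispersion of the cluster along each feature.
        dispersion = [sum((x[j] - center[j]) ** 2 for x in members)
                      for j in range(len(center))]
        # Subtract the minimum before exponentiating for numerical stability.
        d_min = min(dispersion)
        exps = [math.exp(-(d - d_min) / gamma) for d in dispersion]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def weighted_distance(x, center, w):
    """Feature-weighted squared distance used in the assignment step."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, center, w))
```

A larger `gamma` spreads weight more evenly across features; a smaller one concentrates it on the features along which the cluster is most compact.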
The COVID-19 pandemic has caused an unprecedented spike in confirmed cases in 230 countries worldwide. In this work, a COVID-19 outbreak dataset is analyzed with two well-known unsupervised learning techniques: K-means clustering and correlation analysis. COVID-19 has infected many nations, and K-means automatically searches for undiscovered clusters among those infections. To examine the spread of COVID-19 before a vaccine becomes widely available, this work uses unsupervised approaches to identify the crucial country-level variables: confirmed cases, deaths, recoveries, total_cases_per_million, and total_deaths_per_million. Using this feature subspace, we grouped countries into meaningful clusters to support more in-depth disease analysis. We then used a clustering technique to examine trends in COVID-19 incidence and mortality across nations. The technique takes the key components of each trajectory and incorporates them into a K-means clustering process: the trend lines are separated into measures that characterize different features of a trend, these measures are reduced in dimension, and the result is clustered with K-means. The method was applied separately to incidence and death rates, and the results were compared.
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm that addresses the challenges of deep subspace clustering on high-dimensional real-world data, particularly in medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, have a limited ability to exploit the prior knowledge inherent in medical images. MAS-DSC incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, enhancing the discriminative power of the feature representations. In addition, a multi-scale feature extraction mechanism adapts to the complexity of medical imaging data, yielding more accurate clustering. To address the difficulty of hyperparameter selection in deep subspace clustering, a Bayesian optimization algorithm adaptively tunes the hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of MAS-DSC: with its multi-scale network structure and Bayesian hyperparameter optimization, it achieves excellent clustering results on these datasets. Tests on a brain tumor dataset further demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and improved clustering within a semi-supervised learning framework.
The analysis of microstates in EEG signals is a crucial technique for understanding the spatiotemporal dynamics of brain electrical activity. Traditional methods such as Atomize and Agglomerate Hierarchical Clustering (AAHC), K-means clustering, Principal Component Analysis (PCA), and Independent Component Analysis (ICA) are limited by a fixed number of microstate maps and an insufficient capability for cross-task feature extraction. To tackle these limitations, this study introduces a Global Map Dissimilarity (GMD)-driven density canopy K-means clustering algorithm, which autonomously determines the optimal number of EEG microstate topographies and uses Gaussian kernel density estimation together with the GMD index to model EEG data dynamically. Using this algorithm, the study analyzes the Motor Imagery (MI) dataset from the GigaScience database, GigaDB. The findings reveal six distinct microstates during actual right-hand movement and five microstates in the other task conditions, with microstate C showing superior performance in all task states; during imagined movement, microstate A was significantly enhanced. Comparison with existing algorithms indicates that the refined method significantly improves clustering performance, with an average Calinski-Harabasz Index (CHI) of 35517.29 and an average Davies-Bouldin Index (DBI) of 2.57. Furthermore, an information-theoretic analysis of the microstate sequences suggests that imagined movement exhibits higher complexity and disorder than actual movement. Using the extracted microstate sequence parameters as features, the improved algorithm achieved a classification accuracy of 98.41% for motor imagery EEG signals, and an accuracy of 78.183% on a four-class motor imagery task on the BCI-IV-2a dataset. These results demonstrate the potential of the algorithm for microstate analysis, offering a more effective tool for understanding the spatiotemporal features of EEG signals.
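The Global Map Dissimilarity that drives the clustering can be computed directly from two topographies. The sketch below uses the standard definition from the microstate literature: each map is mean-centred and divided by its global field power (the spatial standard deviation), and GMD is the root mean square of the difference, ranging from 0 (identical maps) to 2 (inverted maps). The polarity-invariant variant reflects the common convention of ignoring map polarity in spontaneous EEG; the density canopy step of the proposed algorithm is not shown, and the function names are hypothetical.

```python
import math

def gfp(v):
    """Global field power: spatial standard deviation across electrodes."""
    mean = sum(v) / len(v)
    return math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))

def gmd(u, v):
    """Global Map Dissimilarity between two equally sized topographies."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    gu, gv = gfp(u), gfp(v)
    return math.sqrt(sum(((a - mu) / gu - (b - mv) / gv) ** 2
                         for a, b in zip(u, v)) / len(u))

def gmd_polarity_invariant(u, v):
    """Treat a map and its sign-inverted copy as the same microstate."""
    return min(gmd(u, v), gmd(u, [-x for x in v]))
```

Because both maps are normalized, GMD ignores overall amplitude and offset: `gmd(u, [2 * x + 3 for x in u])` is zero up to rounding.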
BACKGROUND Vessels encapsulating tumor clusters (VETC) are a recently discovered vascular pattern associated with novel metastasis mechanisms in hepatocellular carcinoma (HCC). However, no study appears to have focused on predicting VETC status in small HCC (sHCC). This study aimed to develop a new nomogram for predicting VETC positivity from preoperative clinical data and image features in patients with sHCC (≤3 cm). AIM To construct a nomogram that combines preoperative clinical parameters and image features to predict VETC patterns and evaluate the prognosis of sHCC patients. METHODS A total of 309 patients with sHCC who underwent segmental resection and had confirmed VETC status were included. They were recruited from three hospitals: Hospital 1 contributed 177 patients (training set), Hospital 2 provided 78 patients (test set), and Hospital 3 provided 54 patients (validation set). Independent predictors of VETC were identified through univariate and multivariate logistic regression analyses and were used to construct a VETC prediction model for sHCC. Model performance was evaluated using the area under the curve (AUC), calibration curves, and clinical decision curves. Kaplan-Meier survival analysis was performed to confirm whether the VETC status predicted by the model is associated with early recurrence, as the actual VETC status is. RESULTS Alpha-fetoprotein_lg10, carbohydrate antigen 199, irregular shape, non-smooth margin, and arterial peritumoral enhancement were identified as independent predictors of VETC. The model incorporating these predictors demonstrated strong predictive performance, with AUCs of 0.811, 0.800, and 0.791 for the training, test, and validation sets, respectively. The calibration curves indicated that the predicted probabilities were consistent with the actual VETC status in all three sets, and decision curve analysis demonstrated the clinical benefit of the model for patients with sHCC. Finally, early recurrence was more likely in the VETC-positive group than in the VETC-negative group, whether actual or predicted VETC status was considered. CONCLUSION The novel prediction model performs well in predicting VETC positivity in patients with sHCC (≤3 cm) and may help predict early recurrence. It provides clinicians with valuable information for making treatment decisions.
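The AUC values reported above can be computed without any library as the probability that a randomly chosen VETC-positive case receives a higher predicted score than a randomly chosen VETC-negative case (the Mann-Whitney formulation, with ties counted as one half). This sketch assumes binary labels coded 1/0 and model scores such as nomogram probabilities; it does not reproduce the study's model.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for cohorts of a few hundred patients; a rank-based formula gives the same value in O(n log n).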
Because of limitations and hesitation in one's knowledge, the membership degree of an element to a given set may take several different values, a case that conventional fuzzy sets cannot handle. Hesitant fuzzy sets are a powerful tool for treating this case. This paper investigates a clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm, which takes the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of the algorithm.
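Distances between hesitant fuzzy elements, which a K-means variant for hesitant fuzzy sets needs in its assignment step, are commonly computed by sorting both elements, padding the shorter one with its largest value (optimistic) or smallest value (pessimistic) until the lengths match, and averaging the absolute differences. The sketch below follows that convention as an assumption; the paper's exact distance, its cluster-center update, and the hierarchical initialization are not reproduced, and the function names are hypothetical.

```python
def extend(h, length, optimistic=True):
    """Sort a hesitant fuzzy element ascending and pad it to `length`,
    appending its max (optimistic) or prepending its min (pessimistic)."""
    values = sorted(h)
    pad = length - len(values)
    return values + [values[-1]] * pad if optimistic else [values[0]] * pad + values

def hfe_distance(h1, h2, optimistic=True):
    """Normalized Hamming-style distance between two hesitant fuzzy elements."""
    length = max(len(h1), len(h2))
    a, b = extend(h1, length, optimistic), extend(h2, length, optimistic)
    return sum(abs(x - y) for x, y in zip(a, b)) / length

def hfs_distance(A, B, optimistic=True):
    """Average element-wise distance between two hesitant fuzzy sets
    defined over the same attributes."""
    return sum(hfe_distance(h1, h2, optimistic) for h1, h2 in zip(A, B)) / len(A)
```

In a K-means loop, each object (a list of hesitant fuzzy elements) would be assigned to the center with the smallest `hfs_distance`.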
A fast and effective fuzzy clustering algorithm is proposed. The algorithm splits an image into n × n blocks and uses the block variance to judge whether each block region is homogeneous. For each homogeneous block, the mean and the center pixel are extracted as features. Each inhomogeneous block is split into separate pixels, and for each pixel the pixel value and the mean of the neighboring pixels within a window around it are extracted as features. Homogeneous blocks and the separate pixels from inhomogeneous blocks are then clustered separately, using different membership functions. In the fuzzy clustering stage, the initial cluster centers and the number of clusters are calculated from the histogram of the mean feature, and different membership functions are computed according to the result of the block-variance comparison. Finally, a modified fuzzy c-means with spatial information is used to complete the image segmentation. Experimental results show that the proposed method achieves better segmentation results and a shorter execution time than many well-known methods.
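The block-splitting and feature-extraction steps can be sketched as follows, assuming a grayscale image stored as a list of rows, a user-chosen variance threshold, and a 3 × 3 window for the pixel-level neighborhood mean. Partial blocks at the image border are skipped for simplicity, and the histogram-based initialization and modified fuzzy c-means stage are omitted; all names are hypothetical.

```python
def split_blocks(image, n, var_threshold):
    """Classify each complete n x n block as homogeneous (variance <= threshold)
    or inhomogeneous; returns two lists of block top-left corners (row, col)."""
    homogeneous, inhomogeneous = [], []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - n + 1, n):
        for c in range(0, cols - n + 1, n):
            pixels = [image[r + i][c + j] for i in range(n) for j in range(n)]
            mean = sum(pixels) / len(pixels)
            variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            (homogeneous if variance <= var_threshold else inhomogeneous).append((r, c))
    return homogeneous, inhomogeneous

def homogeneous_feature(image, r, c, n):
    """Feature of a homogeneous block: (block mean, center pixel)."""
    pixels = [image[r + i][c + j] for i in range(n) for j in range(n)]
    return (sum(pixels) / len(pixels), image[r + n // 2][c + n // 2])

def pixel_feature(image, r, c, half=1):
    """Feature of a pixel from an inhomogeneous block: (pixel value,
    mean of the (2*half+1)^2 window, clipped at the image border)."""
    rows, cols = len(image), len(image[0])
    window = [image[i][j]
              for i in range(max(0, r - half), min(rows, r + half + 1))
              for j in range(max(0, c - half), min(cols, c + half + 1))]
    return (image[r][c], sum(window) / len(window))
```

Homogeneous blocks contribute one feature vector each, so flat regions are clustered at block cost rather than pixel cost, which is where the speed-up comes from.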
The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance cannot fully reveal complex data distributions, and spectral clustering results are very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on it are proposed. Simulated annealing based on a genetic algorithm (SAGA), characterized by rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm not only reflects local and global consistency better, but also reduces the sensitivity of spectral clustering to the kernel parameter, which improves its clustering performance. To apply the algorithm efficiently to image segmentation, the Nyström method is used to reduce the computational complexity. Experimental results show that, compared with traditional clustering algorithms and popular spectral clustering algorithms, the proposed algorithm achieves better clustering performance on several synthetic datasets, texture images, and real images.
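For reference, the baseline that the paper improves on, spectral clustering with a Euclidean Gaussian-kernel similarity, can be sketched in its common normalized-affinity form. The sketch assumes NumPy, an explicitly chosen scaling parameter `sigma` (the parameter to which the abstract notes the method is sensitive), and plain k-means with farthest-first seeding on the row-normalized eigenvectors; the manifold distance, SAGA, and Nyström steps are not implemented.

```python
import numpy as np

def spectral_embedding(points, k, sigma=1.0):
    """Rows of the top-k eigenvectors of D^-1/2 W D^-1/2, normalized to unit length."""
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-sq_dists / (2.0 * sigma ** 2))  # Gaussian-kernel similarity
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    u = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return u / np.linalg.norm(u, axis=1, keepdims=True)

def kmeans_rows(x, k, iters=100):
    """Lloyd's k-means on the embedded rows with farthest-first seeding."""
    centers = [x[0]]
    while len(centers) < k:
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(points, k, sigma=1.0):
    return kmeans_rows(spectral_embedding(np.asarray(points, float), k, sigma), k)
```

Rerunning the usage with a very different `sigma` is a quick way to observe the sensitivity that motivates the paper's manifold distance.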
Blind separation of sparse sources (BSSS) is discussed. The BSSS method based on conventional K-means clustering is very fast and easy to implement, but its accuracy is generally unsatisfactory. It is proved theoretically that the contributions of vectors x(t) with different moduli are unequal, and on these grounds a weighted K-means clustering method is proposed. Numerical experiments demonstrate that the proposed algorithm is as fast as the conventional K-means clustering method and achieves considerably more accurate results.
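A minimal sketch of weighted K-means for two-channel sparse mixtures follows. It assumes that, because the sources are sparse, most mixture vectors x(t) lie near lines through the origin along the mixing-matrix columns; each x(t) is mapped to a sign-folded unit direction and weighted by its modulus ||x(t)||. Using the modulus itself as the weight is an assumption for illustration, not the weighting derived in the paper, and the function names are hypothetical.

```python
import math

def fold(v):
    """Unit vector for a 2-D v, with +v and -v mapped to the same direction."""
    norm = math.hypot(v[0], v[1])
    ux, uy = v[0] / norm, v[1] / norm
    if ux < 0 or (ux == 0 and uy < 0):
        ux, uy = -ux, -uy
    return (ux, uy)

def weighted_kmeans_directions(samples, k, iters=50):
    """Estimate k mixing directions; each sample's direction is weighted by its modulus."""
    data = [(fold(x), math.hypot(x[0], x[1]))
            for x in samples if math.hypot(x[0], x[1]) > 1e-12]
    # Farthest-first seeding keeps the sketch deterministic.
    centers = [data[0][0]]
    while len(centers) < k:
        far = max(data, key=lambda d: min(math.dist(d[0], c) for c in centers))
        centers.append(far[0])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for u, w in data:
            nearest = min(range(k), key=lambda j: math.dist(u, centers[j]))
            groups[nearest].append((u, w))
        new_centers = []
        for j, group in enumerate(groups):
            if not group:
                new_centers.append(centers[j])
                continue
            # Weighted mean direction, renormalized to unit length.
            sx = sum(w * u[0] for u, w in group)
            sy = sum(w * u[1] for u, w in group)
            n = math.hypot(sx, sy)
            new_centers.append((sx / n, sy / n))
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

The returned unit vectors estimate the mixing-matrix columns up to sign and scale; large-modulus samples, which are least affected by noise, pull the estimates hardest.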
The goal of this study was to optimize the constitutive parameters of foundation soils using a k-means algorithm with clustering analysis. A database was compiled from unconfined compression tests, Proctor tests, and grain-size distribution tests on soils taken from three types of foundation pits: raft foundations, partial raft foundations, and strip foundations. The k-means algorithm with clustering analysis was applied to determine the most appropriate foundation type given the unconfined compression strengths and other parameters of the different soils.
The Tarq 1:100,000 geochemical sheet is located in Isfahan province and was investigated by Iran's Geological and Explorations Organization using stream sediment analyses. The area contains Precambrian to Quaternary rocks and lies in the Central Iran zone. Because there are signs of gold mineralization in this area, its important mineral zones need to be identified. This requires information about the relationships among gold, arsenic, and antimony, and monitoring of these elements relative to each other, in order to determine the extent of geochemical halos and to estimate grades. Therefore, the well-known K-means method is used in this study to monitor the elements; it is a clustering method based on minimizing the total Euclidean distance of each sample from the center of the class to which it is assigned. The clustering quality function and the utility of each sample within its cluster, S(i), were used to determine the optimum number of clusters. Finally, based on the cluster centers and results, equations were derived to predict the gold content from four parameters: the arsenic and antimony grades, and the length and width of the sampling points.
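The silhouette value S(i) used to choose the number of clusters can be computed as s(i) = (b - a) / max(a, b), where a is the mean distance from sample i to the other members of its cluster and b is the smallest mean distance from i to the members of any other cluster. The sketch assumes Euclidean distances on the element grades (for example Au, As, and Sb, possibly log-transformed) and at least two clusters; running it for several candidate k values and keeping the k with the highest mean silhouette is one common way to pick the optimum. Function names are hypothetical.

```python
import math

def silhouette_values(points, labels):
    """Silhouette s(i) for every sample; singleton clusters get s(i) = 0."""
    values = []
    clusters = set(labels)
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            values.append(0.0)
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values

def mean_silhouette(points, labels):
    values = silhouette_values(points, labels)
    return sum(values) / len(values)
```

Values near 1 indicate samples that sit firmly inside their cluster; negative values flag samples that would fit better in a neighboring cluster.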
Classifying the activity paths of the Northeast China Cold Vortex (NCCV) is an important way to analyze its characteristics in detail. Based on daily precipitation data for northeastern China (NEC) and six-hourly ERA-Interim atmospheric circulation and temperature fields, NCCV processes in early summer (June) from 1979 to 2018 were objectively identified. The NCCV processes were then classified with a machine learning method (k-means) according to characteristic parameters of their activity paths. The rationality of the classification was verified from two aspects: (1) the atmospheric circulation configuration of the NCCV on the various paths; and (2) its influence on climate conditions in NEC. The results show that the NCCV activity paths can be divided into four types according to characteristics such as the origin, movement direction, and movement velocity of the NCCV: the generation-eastward movement type in the east of the Mongolian Plateau (eastward movement type, or type A); the generation-southeast long-distance movement type in the upper reaches of the Lena River (southeast long-distance movement type, or type B); the generation-eastward less-movement type near Lake Baikal (eastward less-movement type, or type C); and the generation-southward less-movement type in eastern Siberia (southward less-movement type, or type D). The atmospheric circulation configuration and climate impact of the NCCV differed clearly among the four path types, indicating that the classification results are reasonable.
Funding (medical image segmentation review): supported partly by the Open Project of State Key Laboratory of Millimeter Wave under Grant K202218, and partly by the Innovation and Entrepreneurship Training Program of College Students under Grants 202210700006Y and 202210700005Z.
Funding (J-TEXT plasma event identification study): supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2018YFE0301104 and 2018YFE0301100) and the National Natural Science Foundation of China (Nos. 12075096 and 51821005).
Abstract: Computed tomography (CT) in medical imaging provides digital images of the interior of the human body, and higher-quality images carry more useful information. Medical images are normally reconstructed from projection data acquired from several perspectives. In this paper, we describe a numerical method for refining the image of a region of interest (ROI) by constructing a support within a standard CT image. The quality of tomographic slices is degraded by artifacts; combining filtering with K-means clustering in CT provides a way to reconstruct an ROI with minimal artifacts and improved spatial resolution. Experimental results on reconstructed images show that the approach enhances the overall resolution and contrast of ROI images. The method offers several advantages: robustness to noise in the projection data, and support construction without any additional acquisition setup.
Funding: Funded by the National Key Research and Development Program (Grant No. 2018YFC0807804-2).
Abstract: Reservoir classification is a key link in reservoir evaluation. However, traditional manual methods are inefficient and subjective, and their classification standards are not uniform. Therefore, taking the Mishrif Formation in western Iraq as an example, a new reservoir classification and discrimination method is established using K-means clustering and Bayesian discrimination. These methods are applied to non-cored wells to calculate the discrimination accuracy of reservoir types, and the main reasons for low discrimination accuracy are clarified. The results show that the discrimination accuracy of reservoir types based on K-means clustering and Bayesian stepwise discrimination is strongly related to the accuracy of the core data. When K-means clustering and Bayesian theory are combined on logging data, the discrimination accuracy for Type Ⅰ, Type Ⅱ, and Type Ⅴ reservoirs is significantly higher than that for Type Ⅲ and Type Ⅳ reservoirs. Although the new method's recognition accuracy for Type Ⅳ reservoirs is low, its average accuracy exceeds 82% across the entire study area, which lays a good foundation for rapid and accurate discrimination of reservoir types and for fine reservoir evaluation.
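A minimal sketch of the two-stage idea, assuming log responses from cored wells are first grouped into reservoir types with k-means, after which a Bayesian discriminant assigns types to samples from non-cored wells. The discriminant here uses Gaussian class densities with a pooled covariance (the linear-discriminant special case); the stepwise variable selection of Bayesian stepwise discrimination is omitted, and all names and data are hypothetical.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def fit_bayes_discriminant(X, y):
    """Gaussian class densities sharing one pooled covariance matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]  # deviations from own class mean
    cov = resid.T @ resid / (len(X) - len(classes))
    return classes, means, priors, np.linalg.inv(cov)


def predict_type(model, X):
    """Assign each sample to the class with the largest posterior score."""
    classes, means, priors, icov = model
    X = np.asarray(X, dtype=float)
    # log posterior up to a constant: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + ln(prior_k)
    scores = X @ icov @ means.T - 0.5 * np.sum((means @ icov) * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]
```

Usage: `types, _ = kmeans(cored_logs, k)` labels the cored samples, `model = fit_bayes_discriminant(cored_logs, types)` learns the discriminant, and `predict_type(model, uncored_logs)` classifies the non-cored wells. As the abstract notes, the result can only be as reliable as the cluster labels derived from the core data.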
Funding: Supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Many pests feed on the leaves, stems, bases, or entire plant, causing plant diseases. It is therefore vital to identify and eliminate a disease before it damages the plants. Detecting and treating plant disease manually is challenging because it requires much effort and a long processing time, so image processing is employed to detect plant disease. The main goal of this study is to identify the diseases affecting plants by building an image processing system that can recognize and classify four plant diseases: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify plant disease through the steps of image acquisition, pre-processing, segmentation, feature extraction, and classification. Gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental results, the proposed system detects and diagnoses plant disease with 97.2% accuracy.
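The GLCM texture features mentioned above can be sketched as follows, assuming an image already quantized to integer gray levels in `[0, levels)`. The function counts co-occurring gray-level pairs at one pixel offset, symmetrizes and normalizes the matrix, and derives three common descriptors. The offset, the symmetric option, and the choice of descriptors are illustrative assumptions; the LBP features and the SVM stage are not shown.

```python
import numpy as np


def glcm(img, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray level co-occurrence matrix for the offset (dy, dx)."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    P = np.zeros((levels, levels))
    # only visit pixels whose neighbor at (r + dy, c + dx) lies inside the image
    for r in range(max(0, -dy), min(h, h - dy)):
        for c in range(max(0, -dx), min(w, w - dx)):
            P[img[r, c], img[r + dy, c + dx]] += 1
    if symmetric:
        P = P + P.T  # count each pair in both directions
    return P / P.sum()


def glcm_features(P):
    """Common Haralick-style descriptors of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),             # local intensity variation
        "asm": float(np.sum(P ** 2)),                              # angular second moment
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),  # closeness to the diagonal
    }
```

In a pipeline like the one described, such descriptors would be computed on the segmented lesion region and concatenated with LBP histograms before being passed to the classifier.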
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 62171203, in part by the Suzhou Key Supporting Subjects [Health Informatics (No. SZFCXK202147)], in part by the Changshu Science and Technology Program [Nos. CS202015 and CS202246], in part by the Changshu City Health and Health Committee Science and Technology Program [No. csws201913], and in part by the "333 High Level Personnel Training Project of Jiangsu Province".
Abstract: In recent years, soft subspace clustering algorithms have shown good results on high-dimensional data: they assign a different set of feature weights to each cluster and use these weights to measure the contribution of each dimension. The enhanced soft subspace clustering algorithm combines inter-class separation and intra-class compactness information and performs well in image segmentation. However, it is susceptible to noisy data, depends on the initialized cluster centers, and can fall into a local optimum; its clustering effect is poor for brain MR images with unclear boundaries and noise. To address these problems, a soft subspace clustering algorithm for brain MR images based on genetic algorithm optimization is proposed. It incorporates the generalized noise technique, relaxes the equality weight constraint in the objective function into a bound constraint, and uses a genetic algorithm to optimize the initial cluster centers. The genetic algorithm finds the best cluster centers and reduces the algorithm's dependence on initialization. Experiments verify the robustness and noise immunity of the algorithm in various ways and show good results on a common dataset and on brain MR images provided by the Changshu First People's Hospital, with high accuracy that is of value for clinical medicine.
Abstract: The COVID-19 pandemic has caused an unprecedented spike in confirmed cases in 230 countries globally. In this work, a dataset from the COVID-19 outbreak is analyzed with two well-known unsupervised techniques: K-means clustering and correlation analysis. The COVID-19 virus has infected many nations, and K-means automatically searches for previously undiscovered clusters among those infections. To examine the spread of COVID-19 before a vaccine became widely available, this work uses unsupervised approaches on key country-level variables: confirmed cases, death cases, recovered cases, total_cases_per_million, and total_deaths_per_million. Using this feature subspace, we grouped countries into meaningful clusters to support more in-depth disease analysis. We then used a clustering technique to examine trends in COVID-19 incidence and mortality across nations. The technique takes the key components of a trajectory and incorporates them into a K-means clustering process: the trend lines are separated into measures that characterize different features of a trend, these measures are first reduced in dimension, and the result is clustered with K-means. The method was used to calculate incidence and death rates separately and then compare them.
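A minimal sketch of the reduce-then-cluster step, assuming one row per country and columns of hypothetical trend measures (for example, values derived from total_cases_per_million and total_deaths_per_million). The abstract does not say how trajectories are summarized, so the feature table is an assumption. Columns are z-scored, projected onto their leading principal components via an SVD, and the projected scores are clustered with k-means; the number of components and clusters are parameters to tune.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project z-scored data onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant column
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Z-scoring matters here: raw case counts and per-million rates differ by orders of magnitude, and without it the largest-valued column would dominate both the principal components and the distances.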
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62171203, in part by the Jiangsu Province "333 Project" High-Level Talent Cultivation Subsidized Project, in part by the Suzhou Key Supporting Subjects for Health Informatics under Grant SZFCXK202147, in part by the Changshu Science and Technology Program under Grants CS202015 and CS202246, and in part by the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data under Grants CYZ202301 and CS202314.
Abstract: In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that, with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
Funding: Funded by the National Natural Science Foundation of China, the Yunnan Fundamental Research Projects, the Special Project of Guangdong Province in Key Fields of Ordinary Colleges and Universities, and the Chaozhou Science and Technology Plan Project (grant numbers 82060329, 202201AT070108, 2023ZDZX2038, and 202201GY01).
Abstract: Microstate analysis of EEG signals is a crucial technique for understanding the spatiotemporal dynamics of brain electrical activity. Traditional methods such as Atomize and Agglomerate Hierarchical Clustering (AAHC), K-means clustering, Principal Component Analysis (PCA), and Independent Component Analysis (ICA) are limited by a fixed number of microstate maps and insufficient capability for cross-task feature extraction. To tackle these limitations, this study introduces a Global Map Dissimilarity (GMD)-driven density canopy K-means clustering algorithm. The approach autonomously determines the optimal number of EEG microstate topographies and uses Gaussian kernel density estimation together with the GMD index to model EEG data dynamically. With this algorithm, the study analyzes the Motor Imagery (MI) dataset from the GigaScience database, GigaDB. The findings reveal six distinct microstates during actual right-hand movement and five microstates in the other task conditions, with microstate C showing superior performance in all task states. During imagined movement, microstate A was significantly enhanced. Compared with existing algorithms, the refined method significantly improves clustering performance, with an average Calinski-Harabasz Index (CHI) of 35517.29 and an average Davies-Bouldin Index (DBI) of 2.57. Furthermore, an information-theoretic analysis of the microstate sequences suggests that imagined movement exhibits higher complexity and disorder than actual movement. Using the extracted microstate sequence parameters as features, the improved algorithm achieved a classification accuracy of 98.41% in EEG signal categorization for motor imagery, and an accuracy of 78.183% in a four-class motor imagery task on the BCI-IV-2a dataset. These results demonstrate the potential of the algorithm for microstate analysis, offering a more effective tool for understanding the spatiotemporal features of EEG signals.
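The GMD index itself has a short closed form. The sketch below uses the standard definition for two topographic maps u and v over the same electrodes: each map is average-referenced and divided by its global field power (its spatial standard deviation), and GMD is the root-mean-square difference of the normalized maps. It ranges from 0 (identical topography) to 2 (inverted polarity). The optional polarity-invariant variant, common in microstate analysis, is an assumption about how it would be applied here; the density canopy clustering itself is not shown.

```python
import numpy as np


def gmd(u, v, ignore_polarity=False):
    """Global map dissimilarity between two topographic maps (one value per electrode)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u - u.mean()  # average reference
    v = v - v.mean()
    u = u / u.std()   # divide by global field power (spatial standard deviation)
    v = v / v.std()
    d = float(np.sqrt(np.mean((u - v) ** 2)))
    if ignore_polarity:
        # microstate maps are usually compared regardless of their sign
        d = min(d, float(np.sqrt(np.mean((u + v) ** 2))))
    return d
```

Because both maps are normalized, GMD² = 2(1 − r), where r is the spatial Pearson correlation between the maps, so clustering by GMD is equivalent to clustering by spatial correlation and is insensitive to overall map strength.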
Funding: Supported by the Project of Shanghai Municipal Commission of Health, No. 2022LJ024.
Abstract: BACKGROUND Vessels encapsulating tumor clusters (VETC) represent a recently discovered vascular pattern associated with novel metastasis mechanisms in hepatocellular carcinoma (HCC). However, no study appears to have focused on predicting VETC status in small HCC (sHCC). This study aimed to develop a new nomogram for predicting VETC positivity from preoperative clinical data and image features in patients with sHCC (≤3 cm). AIM To construct a nomogram that combines preoperative clinical parameters and image features to predict VETC patterns and evaluate the prognosis of sHCC patients. METHODS A total of 309 patients with sHCC who underwent segmental resection and had confirmed VETC status were included in the study. These patients were recruited from three hospitals: Hospital 1 contributed 177 patients for the training set, Hospital 2 provided 78 patients for the test set, and Hospital 3 provided 54 patients for the validation set. Independent predictors of VETC were identified through univariate and multivariate logistic analyses and then used to construct a VETC prediction model for sHCC. The model's performance was evaluated using the area under the curve (AUC), calibration curves, and decision curve analysis. Additionally, Kaplan-Meier survival analysis was performed to confirm whether the VETC status predicted by the model is associated with early recurrence, as the actual VETC status is. RESULTS Log10-transformed alpha-fetoprotein (Alpha-fetoprotein_lg10), carbohydrate antigen 19-9 (CA199), irregular shape, non-smooth margin, and arterial peritumoral enhancement were identified as independent predictors of VETC. The model incorporating these predictors demonstrated strong predictive performance, with an AUC of 0.811 for the training set, 0.800 for the test set, and 0.791 for the validation set. The calibration curves indicated that the predicted probability was consistent with the actual VETC status in all three sets. Furthermore, decision curve analysis demonstrated the clinical benefit of the model for patients with sHCC. Finally, early recurrence was more likely in the VETC-positive group than in the VETC-negative group, whether the actual or the predicted VETC status was considered. CONCLUSION The novel prediction model performs well in predicting VETC positivity in patients with sHCC (≤3 cm) and holds potential for predicting early recurrence. It provides clinicians with valuable information for making informed treatment decisions.
Funding: Supported by the National Natural Science Foundation of China (61273209).
Abstract: Because of the limitations of, and hesitation in, one's knowledge, the membership degree of an element to a given set often has several possible values, a situation that conventional fuzzy sets cannot handle. Hesitant fuzzy sets are a powerful tool for this case. This paper investigates a clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm, taking the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of the algorithm.
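The initialization strategy can be sketched on ordinary real-valued vectors, which is an assumption made for simplicity: an average-linkage agglomerative pass merges points until k clusters remain, and their centroids then seed k-means. A hesitant fuzzy version would replace the Euclidean distance and the arithmetic mean with a hesitant fuzzy distance measure and aggregation operator, which are not shown here.

```python
import numpy as np


def agglomerative(X, k):
    """Average-linkage agglomerative clustering; returns k lists of row indices."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def kmeans_from_hierarchy(X, k, iters=100):
    """K-means seeded with the centroids of the hierarchical clustering result."""
    X = np.asarray(X, dtype=float)
    centers = np.array([X[idx].mean(axis=0) for idx in agglomerative(X, k)])
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Seeding from a hierarchical result removes the randomness of k-means initialization, at the cost of the quadratic-or-worse agglomerative pass; that trade-off is reasonable for the small examples such papers typically use.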
Abstract: A fast and effective fuzzy clustering algorithm for image segmentation is proposed. The algorithm splits an image into n × n blocks and uses the block variance to judge whether each block region is homogeneous. For each homogeneous block, the block mean and the center pixel are extracted as features. Each inhomogeneous block is split into separate pixels, and for each pixel its value and the mean of the neighboring pixels in a window around it are extracted as features. Homogeneous blocks and the separate pixels from inhomogeneous blocks are then clustered separately, using different membership functions. In the fuzzy clustering stage, the initial cluster centers and the number of clusters are calculated from the histogram of the mean feature, and different membership functions are computed according to the result of the block-variance comparison. Finally, a modified fuzzy c-means algorithm with spatial information is used to complete the image segmentation. Experimental results show that the proposed method achieves better segmentation results and shorter execution time than many well-known methods.
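The block-splitting step can be sketched as follows, assuming a grayscale image stored as a 2-D array. The block size `n`, the variance threshold `var_thresh`, and the window size `win` are hypothetical parameters, and border pixels that do not fill a whole block are simply skipped here. The histogram-based initialization and the modified fuzzy c-means stage are not shown.

```python
import numpy as np


def block_features(img, n=4, var_thresh=25.0, win=3):
    """Split img into n x n blocks. Homogeneous blocks yield one (mean, center pixel)
    feature each; pixels of inhomogeneous blocks yield (pixel value, window mean)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    pad = win // 2  # win is assumed odd
    padded = np.pad(img, pad, mode="edge")
    block_feats, pixel_feats = [], []
    # leftover rows/columns that do not fill a whole block are ignored
    for r in range(0, h - h % n, n):
        for c in range(0, w - w % n, n):
            block = img[r:r + n, c:c + n]
            if block.var() <= var_thresh:
                # homogeneous region: summarize the whole block by one feature vector
                block_feats.append((block.mean(), block[n // 2, n // 2]))
            else:
                # inhomogeneous region: fall back to per-pixel features
                for i in range(r, r + n):
                    for j in range(c, c + n):
                        window = padded[i:i + win, j:j + win]  # centered on (i, j)
                        pixel_feats.append((img[i, j], window.mean()))
    # note: an empty feature list becomes an array of shape (0,)
    return np.array(block_feats), np.array(pixel_feats)
```

The speed-up comes from the first branch: a flat region of n² pixels contributes a single sample to the later clustering, so only textured or boundary blocks are processed pixel by pixel.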
Funding: Supported by the National Natural Science Foundation of China (61272119).
Abstract: The similarity measure is crucial to the performance of spectral clustering. A Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance cannot fully reveal complex data distributions, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on it are proposed. Simulated annealing based on a genetic algorithm (SAGA), characterized by rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm not only better reflects local and global consistency but also reduces the sensitivity of spectral clustering to the kernel parameter, which improves clustering performance. To apply the algorithm efficiently to image segmentation, the Nyström method is used to reduce the computational complexity. Experimental results show that, compared with traditional clustering algorithms and popular spectral clustering algorithms, the proposed algorithm achieves better clustering performance on several synthetic datasets, texture images, and real images.
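For reference, the conventional baseline this abstract improves on, Gaussian-kernel spectral clustering with Euclidean distance, can be sketched as below. It uses the normalized-affinity embedding of Ng, Jordan, and Weiss followed by k-means. The proposed manifold distance, the SAGA clustering step, and the Nyström approximation are not shown; `sigma` is the scaling parameter the abstract identifies as a source of sensitivity.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def spectral_clustering(X, k, sigma=1.0):
    """Gaussian-kernel spectral clustering (normalized affinity, row-normalized embedding)."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian similarity on Euclidean distance
    np.fill_diagonal(W, 0.0)
    # assumes every point has nonzero similarity to some other point
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    A = inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    U = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels, _ = kmeans(U, k)
    return labels
```

Changing `sigma` by an order of magnitude can merge or split clusters on less clean data, which is the sensitivity that motivates replacing the Euclidean-distance kernel with a manifold distance.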
Funding: Supported by the National Natural Science Foundation of China (60672061).
Abstract: Blind separation of sparse sources (BSSS) is discussed. The BSSS method based on conventional K-means clustering is very fast and easy to implement, but its accuracy is generally unsatisfactory. This paper proves theoretically that observation vectors x(t) with different moduli (norms) contribute unequally, and on these grounds proposes a weighted K-means clustering method. Numerical experiments demonstrate that the proposed algorithm is as fast as the conventional K-means method while achieving considerably more accurate results.
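A sketch of norm-weighted clustering of observation directions, under explicitly stated assumptions, since the abstract does not give the paper's weights. Each observation x(t) is mapped to a unit vector, with the sign convention that the first coordinate is nonnegative, because a single active sparse source scales its mixing column by either sign. Points are assigned by cosine similarity, and each center is the weighted mean of its members using the hypothetical weight ‖x(t)‖². The resulting centers estimate the mixing-matrix columns up to order and scale.

```python
import numpy as np


def weighted_direction_kmeans(X, k, iters=50):
    """Cluster observation directions; weights grow with the observation norm."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    keep = norms > 1e-12              # drop (near-)silent samples
    U = X[keep] / norms[keep, None]   # unit direction of each observation
    w = norms[keep] ** 2              # hypothetical weight: squared norm
    U[U[:, 0] < 0] *= -1              # fold +a and -a onto the same direction
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(U @ centers.T, axis=1)  # cosine similarity to each center
        new = []
        for j in range(k):
            members = labels == j
            if not np.any(members):
                new.append(centers[j])
                continue
            c = (w[members, None] * U[members]).sum(axis=0)  # weighted mean direction
            new.append(c / np.linalg.norm(c))
        new = np.array(new)
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```

The intuition behind the weighting is that large-norm observations are more likely to be dominated by a single source, so their directions are more trustworthy estimates of a mixing column than those of small, noise-dominated samples.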
Abstract: The goal of this study was to optimize the constitutive parameters of foundation soils using k-means cluster analysis. A database was collected from unconfined compression tests, Proctor tests, and grain-size distribution tests on soils taken from three types of foundation pits: raft foundations, partial raft foundations, and strip foundations. The k-means algorithm was applied to determine the most appropriate foundation type given the unconfined compression strengths and other parameters of the different soils.
Abstract: The Tarq 1:100,000 geochemical sheet, located in Isfahan Province, was investigated by Iran's Geological and Explorations Organization using stream sediment analyses. The area's stratigraphy spans Precambrian to Quaternary rocks, and the sheet lies in the Central Iran zone. Given the signs of gold mineralization in this area, it is necessary to identify its important mineralized zones. This requires information about the relationships among gold, arsenic, and antimony in the area, in order to determine the extent of the geochemical halos and to estimate grade. The present study therefore uses the well-known K-means method to examine these elements; K-means is a clustering method that minimizes the total squared Euclidean distance between each sample and the center of the cluster to which it is assigned. The clustering quality function and the value S(i), which measures how well each sample fits its assigned cluster, are used to determine the optimum number of clusters. Finally, based on the cluster centers and the results, equations were used to predict the gold content from four parameters: arsenic grade, antimony grade, and the length and width of the sampling points.
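Assuming S(i) is the standard silhouette value, s(i) = (b(i) - a(i)) / max{a(i), b(i)}, where a(i) is the mean distance from sample i to the other members of its cluster and b(i) is the smallest mean distance to any other cluster, the number of clusters can be chosen by maximizing the mean silhouette over candidate values of k. The sketch below uses plain Euclidean k-means with farthest-point initialization on hypothetical grade vectors; the clustering quality function mentioned in the abstract is not reproduced.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def silhouette_values(X, labels):
    """Per-sample silhouette s(i); needs at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)  # assumes not all points coincide (max > 0)
    return s


def choose_k(X, candidates):
    """Return the k with the highest mean silhouette, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = {}
    for k in candidates:
        labels, _ = kmeans(X, k)
        scores[k] = silhouette_values(X, labels).mean()
    return max(scores, key=scores.get), scores
```

Values of s(i) near 1 mean a sample sits well inside its cluster, and values near 0 or below flag samples that lie between clusters; for geochemical data these borderline samples are often the ones worth inspecting on a map.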
Funding: This research was jointly supported by the National Natural Science Foundation of China (Grant No. 42005037), the Liaoning Provincial Natural Science Foundation Project (PhD Start-up Research Fund 2019-BS-214), the Special Scientific Research Project for the Forecaster (Grant No. CMAYBY2018-018), a Key Technical Project of Liaoning Meteorological Bureau (Grant No. LNGJ201903), the National Key Research and Development Project (Grant No. 2018YFC1505601), and the Open Foundation Project of the Institute of Atmospheric Environment, China Meteorological Administration (Grant Nos. 2020SYIAE08 and 2020SYIAEZD5).
Abstract: Classifying the activity paths of the Northeast China Cold Vortex (NCCV) is an important way to analyze its characteristics in detail. Based on daily precipitation data for the northeastern China (NEC) region and six-hourly ERA-Interim atmospheric circulation and temperature fields, NCCV processes during the early summer (June) seasons from 1979 to 2018 were objectively identified. The NCCV processes were then classified with a machine learning method (k-means) according to the characteristic parameters of their activity paths. The rationality of the classification results was verified from two aspects: (1) the atmospheric circulation configuration of the NCCV on the various paths; and (2) its influence on climate conditions in the NEC. The results showed that the NCCV activity paths can be divided into four types according to characteristics such as the origin, movement direction, and movement velocity of the NCCV: the generation-eastward movement type in the east of the Mongolian Plateau (eastward movement type, or type A); the generation-southeast long-distance movement type in the upstream of the Lena River (southeast long-distance movement type, or type B); the generation-eastward less-movement type near Lake Baikal (eastward less-movement type, or type C); and the generation-southward less-movement type in eastern Siberia (southward less-movement type, or type D). There were obvious differences in the atmospheric circulation configuration and climate impact of the NCCV among the four path types, which indicates that the classification results are reasonable.