Currently, receiving traces in seismic geometry design are mostly selected under the horizontally layered medium assumption, which cannot meet survey requirements in complex areas. Building on a field test from previous research (Zhu et al., 2011), this paper estimates the optimal number of receiving traces in field geometry by numerical simulation. A mathematical model of total energy and average efficiency energy is established for a fixed trace spacing, and the optimal number of receiving traces is estimated from it. Seismic data acquired in a complex work area are used to verify the proposed method. The results of model data calculations and actual data processing agree, indicating that the proposed method is reasonable and sound and can serve as a new approach to seismic geometry design in complex geological regions.
An appropriate estimate of the optimal number of market segments (ONS) is essential for an enterprise to achieve successful market segmentation, yet this issue currently receives little attention in market segmentation research. In this study, an independent, adaptive ONS estimation method, BWCON-NSDK-means++, is proposed by integrating a new internal validity index (IVI), Between-Within-Connectivity (BWCON), with a new stable clustering algorithm, Natural-SDK-means++ (NSDK-means++). First, to complete the evaluation dimensions of existing IVIs, we designed a connectivity formula based on neighbor relationships and proposed BWCON by combining this connectivity with the two other commonly considered measures, compactness and separation. Then, considering stability, the number of parameters, and clustering performance, we proposed NSDK-means++, in which natural neighbors are used to improve the strategy for determining the initial cluster centers (ICCs) in SDK-means++. Finally, to ensure the objectivity of the estimated ONS, we designed a BWCON-based ONS estimation framework that requires no user-set parameters and integrated NSDK-means++ into it, forming a practical ONS estimation tool, BWCON-NSDK-means++. The experimental results show that BWCON and NSDK-means++ are significantly better suited than their respective existing counterparts for determining the ONS, and that BWCON-NSDK-means++ outperforms BWCON-KMA, BWCON-MBK, BWCON-KM++, BWCON-RKM++, BWCON-SDKM++, BWCON-Single linkage, BWCON-Complete linkage, BWCON-Average linkage, and BWCON-Ward linkage in ONS estimation. Moreover, as an independent market segmentation tool, BWCON-NSDK-means++ also outperforms existing models with respect to inter-market differentiation and sub-market size.
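The BWCON connectivity formula itself is not given in the abstract. As an illustration only, the sketch below implements the widely used connectivity measure of Handl et al. This measure likewise scores a partition by neighbor relationships. It is not the authors' formula, and the function name and the `n_neighbors` parameter are assumptions.

```python
import math


def connectivity(points, labels, n_neighbors=3):
    """Neighbor-based connectivity of a partition (0 is best, larger is worse).

    For every point, its n_neighbors nearest neighbors are ranked 1..L; the
    j-th neighbor adds 1/j to the score if it sits in a different cluster.
    """
    score = 0.0
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance from p.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for rank, (_, j) in enumerate(ranked[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                score += 1.0 / rank
    return score


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(connectivity(pts, [0, 0, 0, 0, 1, 1, 1, 1]))  # 0.0: every neighbor shares its cluster
```

In an index such as BWCON, a term like this would be combined with compactness and separation. The exact weighting is the paper's and is not reproduced here.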
The upper bound of the optimal number of clusters in clustering algorithms is studied in this paper, and a new method is proposed to address it. This method shows that the rule c_max ≤ √N, which is widely used in the current literature, is reasonable in a certain sense. This conclusion is tested and analyzed on typical examples from the literature, which demonstrate the validity of the new method.
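In practice, such a bound limits the range of cluster counts scanned with a validity index. The helper below is a hypothetical illustration: its name and the default `c_min=2` are assumptions. It lists the candidate cluster counts for N samples under the widely cited empirical rule c_max ≤ √N.

```python
from math import isqrt


def candidate_cluster_counts(n_samples, c_min=2):
    """Cluster counts to search under the empirical bound c_max <= sqrt(N)."""
    c_max = isqrt(n_samples)  # largest integer c with c * c <= N
    return list(range(c_min, c_max + 1))


print(candidate_cluster_counts(150))  # candidates 2 through 12
```

For a 150-sample data set such as Iris, this restricts the search to 2 through 12 clusters.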
Chemical process synthesis problems are non-convex and multimodal. Deterministic strategies often fail to find the global optimum within reasonable time, while stochastic methods generally approach the global solution only in a probabilistic sense. In view of the state of the art in the discipline, a new approach to the global optimization of processes based on sequential number-theoretic optimization (SNTO) is proposed. In this approach, subspaces and feasible points are derived from uniformly scattered points, and a parallel strategy helps the iterations move past local optima. Results from various case studies verify the efficiency of the proposed approach.
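The abstract does not spell out the SNTO procedure. The sketch below shows only the general idea, with several assumptions:

- It scatters uniformly distributed points over the current search box, keeps the best point, and contracts the box around it.
- It uses a Halton sequence as the uniformly scattered point set, instead of the good-lattice-point sets usually associated with SNTO.
- It omits the parallel strategy.
- The names `snto_minimize` and `shrink` are hypothetical.

```python
# Bases for the Halton sequence; this sketch supports up to six variables.
PRIMES = (2, 3, 5, 7, 11, 13)


def halton(index, base):
    """Radical inverse of index in the given base: one Halton coordinate in [0, 1)."""
    value, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * frac
        frac /= base
    return value


def snto_minimize(func, lower, upper, n_points=200, n_iter=30, shrink=0.5):
    """SNTO-style search: scatter low-discrepancy points over the current box,
    keep the best point found so far, then contract the box around it."""
    dim = len(lower)
    lo, hi = list(lower), list(upper)
    best_x, best_f = None, float("inf")
    for _ in range(n_iter):
        for k in range(1, n_points + 1):
            x = [lo[d] + halton(k, PRIMES[d]) * (hi[d] - lo[d]) for d in range(dim)]
            fx = func(x)
            if fx < best_f:
                best_x, best_f = x, fx
        # Shrink each side of the box and recentre it on the incumbent,
        # without leaving the original bounds.
        for d in range(dim):
            half = shrink * (hi[d] - lo[d]) / 2.0
            lo[d] = max(lower[d], best_x[d] - half)
            hi[d] = min(upper[d], best_x[d] + half)
    return best_x, best_f


x_best, f_best = snto_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [-5, -5], [5, 5])
```

Contracting the box around the incumbent is what makes the search sequential. A multimodal objective would need more points per stage, or several boxes in parallel, to avoid locking onto a local optimum.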
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven: their objective functions are minimized on the available numeric data alone. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that reflects not only the relationships among the data but also compatibility with the knowledge hints. However, these algorithms cannot determine the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some cluster centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed, and the Density Knowledge Points Extraction (DKPE) method is established to filter high-density points from the dataset to form knowledge hints. These hints are then incorporated into the PCM objective function so that the high-density points guide the clustering algorithm toward the natural data structure. Finally, the initial number of clusters is set larger than the true number based on the number of knowledge hints, and HP-PCM automatically determines the final number of clusters during clustering through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, including a higher clustering success rate, the ability to determine the optimal number of clusters, and faster convergence.
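The HP-PCM objective with high-density knowledge hints is not given in the abstract. The sketch below shows only the standard PCM building blocks it extends (Krishnapuram and Keller), in one dimension with a fixed bandwidth `eta`:

- the typicality t = 1 / (1 + (d²/η)^(1/(m−1)));
- the typicality-weighted prototype update.

Starting from deliberately too many prototypes and merging those that converge to the same place is an assumed stand-in for the paper's cluster elimination mechanism. The `merge_tol` parameter is hypothetical.

```python
def pcm_typicality(sq_dist, eta, m=2.0):
    """Possibilistic C-Means typicality of a point for one cluster:
    t = 1 / (1 + (d^2 / eta) ** (1 / (m - 1)))."""
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


def pcm_1d(data, prototypes, eta=1.0, m=2.0, n_iter=50, merge_tol=1e-2):
    """Plain 1-D PCM with deliberately too many prototypes; coincident
    prototypes are merged at the end, which fixes the number of clusters."""
    protos = list(prototypes)
    for _ in range(n_iter):
        updated = []
        for v in protos:
            # Prototype update: typicality^m-weighted mean of the data.
            w = [pcm_typicality((x - v) ** 2, eta, m) ** m for x in data]
            updated.append(sum(wi * x for wi, x in zip(w, data)) / sum(w))
        protos = updated
    merged = []
    for v in sorted(protos):
        if not merged or v - merged[-1] > merge_tol:
            merged.append(v)
    return merged


centres = pcm_1d([-0.1, 0.0, 0.1, 9.9, 10.0, 10.1], prototypes=[0.5, 1.0, 9.0, 9.5])
print(len(centres))  # 2
```

Because each PCM prototype is updated independently, surplus prototypes tend to collapse onto the same dense region. A count-reducing mechanism can exploit this behavior.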
Refined 3D modeling of mine slopes is pivotal for precise prediction of geological hazards. Existing single-source modeling methods cannot comprehensively represent both the overall and the local characteristics of mine slopes. This study therefore introduces a new method that fuses model data from unmanned aerial vehicle (UAV) tilt photogrammetry and 3D laser scanning through a control-point-based data alignment algorithm. First, the mini-batch K-medoids algorithm is used to cluster the point cloud data from ground-based 3D laser scanning. Then, the elbow rule is applied to determine the optimal number of clusters (K0), and feature points are extracted. Next, a nearest-neighbor point algorithm matches these feature points with those obtained from UAV tilt photogrammetry, and the internal point coordinates are adjusted by a distance-weighted average to construct the 3D model. Finally, in an engineering case study, K0 is determined to be 8, and the matching accuracy between the two model datasets ranges from 0.0669 to 1.0373 mm. Compared with a modeling method based on the K-medoids clustering algorithm, the new method significantly improves three things: computational efficiency, the accuracy of selecting the optimal number of feature points in 3D laser scanning, and the precision of the 3D model derived from UAV tilt photogrammetry. This method provides a research foundation for constructing mine slope models.
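The abstract does not state how the elbow is located. One common heuristic, sketched here as an assumption, works in three steps:

1. Scale both axes to [0, 1].
2. Draw the chord joining the first and last points of a decreasing SSE-versus-k curve.
3. Pick the k lying farthest below that chord.

The SSE values in the usage lines are made up for illustration. In the described workflow, they would come from running mini-batch K-medoids once per candidate k.

```python
def elbow_k(ks, sse):
    """Elbow of a decreasing SSE-vs-k curve: after scaling both axes to [0, 1],
    return the k lying farthest below the chord from the first to the last point."""
    k_lo, k_hi = min(ks), max(ks)
    s_lo, s_hi = min(sse), max(sse)
    best_k, best_gap = ks[0], float("-inf")
    for k, s in zip(ks, sse):
        x = (k - k_lo) / (k_hi - k_lo)
        y = (s - s_lo) / (s_hi - s_lo)
        gap = 1.0 - x - y  # proportional to the distance below the line x + y = 1
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Hypothetical SSE values, one per candidate k.
ks = list(range(1, 9))
sse = [1000, 520, 300, 150, 130, 115, 105, 100]
print(elbow_k(ks, sse))  # 4
```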
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When clustering is applied to fish abundance data, many technical details must be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, based on fish community surveys in the coastal waters of Shandong, China: hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM). We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to disagree on the optimal number of clusters. EM performed relatively better at avoiding unbalanced classifications, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log transformation, had substantial influences on the clustering results, especially for KM. Transformation also influenced clustering stability, with scaling tending to provide a stable solution at the same number of clusters. The APN values indicated that stability improved with increasing data size. The effect generally leveled off beyond 70 samples, and most quickly for EM. We conclude that the best clustering method depends on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
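APN compares each observation's cluster in the full-data clustering with its cluster after one variable (column) is removed, averaged over observations and removed columns. Lower values mean more stable clustering. The definition follows the common form APN = (1/MN) Σᵢ Σₗ [1 − n(Cⁱ'ˡ ∩ Cⁱ'⁰) / n(Cⁱ'⁰)]. The function name and the input layout (one label list per removed column) are assumptions for this sketch.

```python
def apn(full_labels, reduced_labelings):
    """Average proportion of non-overlap.

    full_labels: cluster label of each observation using all variables.
    reduced_labelings: one label list per removed variable (column).
    """
    n = len(full_labels)
    total = 0.0
    for labels_l in reduced_labelings:
        for i in range(n):
            # Cluster containing i with all columns, and with column l removed.
            c0 = {j for j in range(n) if full_labels[j] == full_labels[i]}
            cl = {j for j in range(n) if labels_l[j] == labels_l[i]}
            total += 1.0 - len(c0 & cl) / len(c0)
    return total / (n * len(reduced_labelings))


print(apn([0, 0, 1, 1], [[0, 0, 1, 1], [0, 1, 1, 1]]))  # 0.125
```

Labels are only compared within the same labeling, so a clustering that merely renames its clusters after a column is dropped still scores 0.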
We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation comprises automatic clustering of the color space and foreground image generation. Two methods are proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. The essential foreground images obtained from the individual color clusters are then combined into the final foreground images. Further performance analysis reveals the advantages of the proposed methods.
Network intrusion poses a severe threat to the Internet. However, existing intrusion detection models cannot effectively distinguish different intrusions whose features overlap to a high degree. In addition, efficient real-time detection remains an urgent problem. To address these two problems, we propose a Latent Dirichlet Allocation topic model-based framework for real-time network Intrusion Detection (LDA-ID), consisting of static and online LDA-ID. The feature overlap problem is transformed into topic number optimization and topic selection in static LDA-ID, so that detection is based on latent topic features. To achieve efficient real-time detection, we design an online computing mode for static LDA-ID. In this mode, a momentum-based parameter iteration method balances the contributions of prior knowledge and new information. Furthermore, we design two matching mechanisms to accommodate static and online LDA-ID, respectively. Experimental results on the public NSL-KDD and UNSW-NB15 datasets show that our framework achieves higher accuracy than the compared methods.
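The abstract does not give the momentum-based iteration. For orientation only, the sketch below shows the standard online variational LDA update of Hoffman et al. (2010):

- λ ← (1 − ρ_t)·λ + ρ_t·λ̂
- ρ_t = (τ₀ + t)^(−κ)

Here the decaying step size ρ_t balances the accumulated (prior) topic parameters against the estimate from the newest data. The values of `tau0` and `kappa` and the toy 2×2 parameter matrices are illustrative assumptions, not LDA-ID's settings.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Weight on the newest mini-batch: rho_t = (tau0 + t) ** -kappa, kappa in (0.5, 1]."""
    return (tau0 + t) ** (-kappa)


def blend(prior, estimate, rho):
    """lambda <- (1 - rho) * lambda_prior + rho * lambda_hat, element-wise over a
    topic-by-word parameter matrix stored as nested lists."""
    return [
        [(1.0 - rho) * p + rho * e for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(prior, estimate)
    ]


lam = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical 2-topic x 2-word parameters
batches = [[[3.0, 1.0], [1.0, 3.0]], [[5.0, 1.0], [1.0, 5.0]]]
for t, batch_estimate in enumerate(batches):
    lam = blend(lam, batch_estimate, step_size(t))
```

Early batches dominate while ρ_t is large. Later batches move the parameters less, which is the prior-versus-new-information trade-off the abstract describes.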
Clustering data with varying densities and complicated structures is important, but many existing clustering algorithms struggle with it. Varying densities and complicated structures make a single algorithm perform poorly on different parts of the data. Assuming that denser regions probably carry more information, we propose an algorithm that clusters from the highest-density part first. It starts from a tiny distance to find the highest density-connected partition and form the corresponding super cores. The distance is then increased iteratively by a global heuristic method to cluster parts with different densities. The mean silhouette coefficient indicates clustering performance, and a denoising function eliminates the influence of noise and outliers. Many challenging experiments indicate that the algorithm performs well on data with widely varying densities and extremely complex structures. It determines the optimal number of clusters automatically, requires no background knowledge, is easy to tune, and is robust against noise and outliers.
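A heavily simplified sketch of the "grow the distance, score with mean silhouette" idea follows. It makes these assumptions:

- Clusters are taken as connected components of the graph linking points within a radius.
- A user-supplied list of radii replaces the global heuristic.
- Super cores and the denoising step are omitted.
- A one-cluster partition is scored −1.
- All function names are hypothetical.

```python
import math


def components(points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton points score 0, one cluster scores -1."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def grow_radius(points, radii):
    """Scan increasing radii and keep the partition with the best mean silhouette."""
    best_score, best_eps, best_labels = float("-inf"), None, None
    for eps in sorted(radii):
        labels = components(points, eps)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_score, best_eps, best_labels = score, eps, labels
    return best_score, best_eps, best_labels
```

The number of clusters then falls out of the best-scoring radius rather than being set in advance.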
A bilevel (double-level) programming technique is proposed for the synthesis strategy of heat exchanger networks (HENs). The structural configuration of the network (the master problem) is optimized at the upper level, while the remaining operating conditions (the slave problem) are optimized at the lower level. With this uniqueness in mind, an HEN synthesis expert system can address both the logical constraints and the optimization of global operating parameters using an enhanced sequential number-theoretic optimization method. Case studies demonstrate that the proposed strategy effectively simplifies both problem solving and the synthesis process, and a comparison of the case-study results confirms its validity.
To address the high computational complexity of the optimal brain surgeon (OBS) process, a pruning algorithm based on a penalty OBS process is presented. Compared with sensitivity-based and regularization methods, the penalty OBS algorithm avoids the time cost and low pruning efficiency of the OBS process. It also maintains higher generalization and pruning accuracy than the Levenberg-Marquardt method.
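The penalty variant is not described in enough detail to reproduce. For reference, the sketch below shows one step of standard OBS (Hassibi and Stork):

- Compute each weight's saliency L_q = w_q² / (2[H⁻¹]_qq).
- Remove the least salient weight.
- Adjust the others by δw = −(w_q / [H⁻¹]_qq)·H⁻¹e_q.

The inverse Hessian is assumed to be already available as a nested list. Computing it is the expensive part that motivates cheaper variants.

```python
def obs_prune_step(weights, h_inv):
    """One Optimal Brain Surgeon step.

    weights: list of floats; h_inv: inverse Hessian as a list of rows.
    Returns (index pruned, updated weights, saliency of the pruned weight).
    """
    saliencies = [w * w / (2.0 * h_inv[q][q]) for q, w in enumerate(weights)]
    q = min(range(len(weights)), key=saliencies.__getitem__)
    scale = weights[q] / h_inv[q][q]
    # delta_w = -(w_q / [H^-1]_qq) * (column q of H^-1)
    new_w = [w - scale * h_inv[i][q] for i, w in enumerate(weights)]
    new_w[q] = 0.0  # the update zeroes w_q up to rounding; make it exact
    return q, new_w, saliencies[q]


h_inv = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]]
q, w_new, s = obs_prune_step([0.5, -0.1, 2.0], h_inv)
print(q)  # 1
```

Because the update uses the whole column of H⁻¹, correlated weights (here weight 0) absorb part of the pruned weight's role.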
In this paper, interval number optimization and model predictive control are proposed to handle uncertain-but-bounded parameters in electric water heater load scheduling. First, interval numbers are used to describe the uncertain parameters, including hot water demand, ambient temperature, and the real-time price of electricity. The traditional thermal dynamic model of the electric water heater is then transformed into an interval number model. On this model, the day-ahead load scheduling problem with uncertain parameters is formulated and solved by interval number optimization. Different tolerance degrees for constraint violation and different temperature preferences are also discussed to give consumers more choices. Furthermore, model predictive control, which incorporates both forecasts and newly updated information, is used to make and execute electric water heater load schedules on a rolling basis throughout the day. Simulation results demonstrate that interval number optimization is robust to uncertainty in hot water demand, ambient temperature, and the real-time price of electricity. This holds in both the day-ahead and the model predictive control formats, enabling customers to adjust their electric water heater control strategy flexibly.
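The thermal model itself is not given in the abstract. The sketch below illustrates how an interval number model propagates uncertainty, using a hypothetical first-order tank model T[k+1] = a·T[k] + (1−a)·T_amb[k] + b·P[k] − c·D[k]. The coefficients a, b, c and their values are assumptions. Ambient temperature T_amb and hot water demand D are intervals, while the heating power P is the crisp decision variable. Interval subtraction pairs the lower bound of one operand with the upper bound of the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def scale(self, k):
        a, b = k * self.lo, k * self.hi
        return Interval(min(a, b), max(a, b))  # min/max also handles k < 0


def next_temperature(temp, ambient, demand, power, a=0.9, b=0.5, c=2.0):
    """One step of a hypothetical first-order tank model with interval inputs:
    T[k+1] = a*T[k] + (1-a)*T_amb[k] + b*P[k] - c*D[k]."""
    heating = Interval(b * power, b * power)  # the control input is a crisp number
    return temp.scale(a) + ambient.scale(1 - a) + heating - demand.scale(c)


T1 = next_temperature(Interval(60.0, 60.0), Interval(15.0, 25.0), Interval(1.0, 3.0), power=4.0)
```

A scheduler would then require the whole interval, or a tolerated fraction of it, to stay within the comfort band. The tolerated fraction corresponds to the abstract's tolerance degrees for constraint violation.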
In this paper, dynamic control approaches for spectrum sensing are proposed, based on the principle from computational learning theory that prediction is synonymous with data compression. First, a spectrum sensing sequence prediction scheme is proposed to reduce spectrum sensing time and improve the throughput of secondary users. The prediction scheme is built on the Ziv-Lempel data compression algorithm and uses the spectrum band usage history. In addition, an iterative algorithm is proposed to find the optimal number of spectrum bands allowed to be sensed, with the aim of maximizing the expected net reward of each secondary user in each time slot. Finally, extensive simulation results demonstrate the effectiveness of the proposed dynamic spectrum sensing control approaches.
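The sketch below illustrates compression-based prediction with an LZ78 (Ziv-Lempel) parse tree. Each context node counts the symbols that followed it, and the most frequent continuation of the current context is predicted. Several details are assumptions, not the paper's scheme:

- Channel states are encoded as 1 = busy and 0 = idle.
- The predictor falls back to root-level counts when the current context has no history.
- The class name is hypothetical.

```python
from collections import defaultdict


class LZ78Predictor:
    """Next-symbol predictor built on an LZ78 parse tree."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> next symbol -> count
        self.context = ()  # phrase currently being parsed

    def update(self, symbol):
        self.counts[self.context][symbol] += 1
        phrase = self.context + (symbol,)
        if phrase in self.counts:
            self.context = phrase  # known phrase: keep extending it
        else:
            self.counts[phrase] = defaultdict(int)  # new dictionary phrase (tree node)
            self.context = ()  # restart parsing at the root

    def predict(self):
        options = self.counts.get(self.context) or self.counts.get(())
        if not options:
            return None
        return max(options, key=options.get)


p = LZ78Predictor()
for state in [1, 0, 1, 0, 1, 0, 1, 0]:
    p.update(state)
print(p.predict())  # 1
```

Running one such predictor per band would let a secondary user sense the bands most likely to be idle first.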
Purpose: The most commonly used approaches to cluster validation are based on indices, but most existing cluster validity indices do not work well on data sets of differing complexity. The purpose of this paper is to propose a new cluster validity index (the ARSD index) that works well on all types of data sets. Design/methodology/approach: The authors introduce a new compactness measure that reflects the typical behavior of a cluster, in which more points lie near the center and fewer toward the outer edge. A novel penalty function is proposed for measuring the distinctness of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed index. The values of the six indices are computed for every number of clusters nc from nc_min to nc_max to obtain the optimal number of clusters in a data set. The data sets used in the experiments include shaped, Gaussian-like, and real data sets. Findings: An extensive experimental study shows that the proposed index indicates the correct number of clusters more consistently and reliably than the other validity indices. It achieved better results on the 11 data sets tested. Originality/value: The originality of this paper lies in a novel cluster validity index for determining the optimal number of clusters in data sets of different complexities.
In a declining market, we optimize net business profit when inventory management allows the selling price to change n times over the time horizon. For a deteriorating product, we compute the optimal number of price changes, the corresponding optimal prices, and the optimal profit in each cycle. This paper proves theoretically that for any business setup there exists an optimal number of price settings that maximizes profit. Numerical examples for different setups (data sets) support the theoretical results, and in every setup the dynamic pricing policy outperforms the static pricing policy. The model accounts for product deterioration, and the deteriorated units are determined by a recurrence method. We also study the effect of different parameters on the optimal policy through simulation. For managerial use, we provide “suggested intervals” for choosing parameters depending on initial demand, which help predict the best prices and customer arrivals (demand).
Changes in maize moisture content across growth stages are an important indicator of maize growth status. In particular, the moisture content during the grain-filling stage reflects grain quality and maturity, and it can also serve as an important indicator for breeding and seed selection. At present, the drying method is usually used to calculate the moisture content and the dehydration rate at the grain-filling stage; however, it requires a large sample size and a long test time. To monitor changes in moisture content at the maize grain-filling stage with small sample sets, a moisture content monitoring model called Bootstrap-SPXY-PLS was used with near-infrared spectroscopy for small sample sizes of 10, 20, and 50. The model combines Bootstrap resampling, sample set partitioning based on joint x-y distances (SPXY), and partial least squares (PLS). To improve prediction accuracy, the optimal number of model factors was determined. A comprehensive evaluation threshold, RVP, was proposed for sub-model screening; it combines the coefficient of determination (R²), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP). The model performed well in predicting the moisture content of maize grain at the filling stage with small sample sets. For sample sizes of 20 and 50, the R² values exceeded 0.99. The average deviations between the predicted and reference values were 0.1078%, 0.057%, and 0.0918%, respectively. The model is therefore effective for monitoring moisture content at the grain-filling stage with a small sample size. The method is also suitable for quantitative near-infrared analysis of different concentrations with small sample sizes.
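SPXY (Galvão et al., 2005) extends Kennard-Stone sample selection by measuring distance jointly in spectral space (x) and reference-value space (y), each normalized by its maximum. The selection runs as follows:

- Start from the two samples that are farthest apart.
- Repeatedly add the sample whose nearest already-selected sample is farthest away.

The function name and the toy data below are assumptions. In a Bootstrap-SPXY-PLS pipeline, this selection would be applied to each resampled set before fitting PLS.

```python
import math


def spxy_select(X, y, n_select):
    """Pick calibration samples with SPXY (Kennard-Stone on joint x-y distances)."""
    n = len(X)
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    mx = max(max(row) for row in dx) or 1.0
    my = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / mx + dy[i][j] / my for j in range(n)] for i in range(n)]
    # Start from the two samples that are farthest apart in the joint metric.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda pair: d[pair[0]][pair[1]])
    selected = [i0, j0]
    while len(selected) < n_select:
        remaining = [k for k in range(n) if k not in selected]
        # Add the sample whose nearest selected neighbor is farthest away.
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
    return selected


print(spxy_select([[0.0], [1.0], [2.0], [10.0]], [0.0, 1.0, 2.0, 10.0], 3))  # [0, 3, 2]
```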
Funding (optimal receiving traces in seismic geometry design): Supported by the National Natural Science Foundation of China (No. 41304115), the National Key S&T Special Projects (No. 2016ZX05024001-003), the Open Fund for Sichuan Province Key Laboratory of Natural Gas Geology (No. 2015trqdz02), the Research Project of CNPC (No. 2016A-33), and the "Young and Middle-aged Key Teachers" Training Program of Southwest Petroleum University.
Funding (optimal number of market segments): Supported by the earmarked fund for CARS-29 and the open funds of the Key Laboratory of Viticulture and Enology, Ministry of Agriculture, China.
Funding (upper bound of the optimal number of clusters): This work was supported by the National Natural Science Foundation of China (Grant Nos. 69872003 and 40035010).
Funding (HP-PCM clustering): Supported by the National Key Research and Development Program of China (No. 2022YFB3304400), the National Natural Science Foundation of China (Nos. 6230311, 62303111, 62076060, 61932007, and 62176083), and the Key Research and Development Program of Jiangsu Province of China (No. BE2022157).
Funding (refined 3D modeling of mine slopes): Funded by the National Natural Science Foundation of China (Grant Nos. 42272333 and 42277147).
Funding (clustering of fish community data): Provided by the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (No. 2018SDKJ0501-2).
文摘Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. Considering their applications to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of the clustering methods have rarely been studied in the contexts of fisheries. This study presents an intensive evaluation of three common clustering methods, including hierarchical clustering(HC), K-means(KM), and expectation-maximization(EM) methods, based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performances of these three methods considering different numbers of clusters, data size, and data transformation approaches, focusing on the consistency validation using the index of average proportion of non-overlap(APN). The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM showed relatively better performances to avoid unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformation including scaling, square-root, and log-transformation had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, wherein scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect leveled off over 70 samples in general and most quickly in EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM is relatively robust in our tests. We also provide recommendations for future application of clustering analyses. This study is helpful to ensure the credibility of the application and interpretation of clustering methods.
Abstract: We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation comprises automatic clustering of the color space and foreground image generation. Two methods are proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. Essential foreground images obtained from any of the color clusters are combined into the foreground images. Further performance analysis reveals the advantages of the proposed methods.
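The fuzzy competitive clustering method itself is not specified in the abstract. As a rough illustration of the underlying competitive learning idea only, the sketch below uses plain hard (winner-take-all) competitive learning in RGB space: for each presented color, only the nearest prototype moves toward it. The fuzzy membership weighting is not shown, the number of clusters `k` is fixed rather than determined automatically, and the names `competitive_learning` and `assign` as well as the learning-rate schedule are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def competitive_learning(pixels: Sequence[Color], k: int, epochs: int = 20,
                         lr: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Hard (winner-take-all) competitive learning of k color prototypes."""
    rng = random.Random(seed)
    distinct = sorted(set(pixels))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct colors")
    # Initialise prototypes from k distinct pixel colors.
    centers = [list(c) for c in rng.sample(distinct, k)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        order = list(pixels)
        rng.shuffle(order)
        for p in order:
            # The winner is the prototype nearest in Euclidean RGB distance.
            win = min(range(k), key=lambda c: math.dist(p, centers[c]))
            # Only the winner moves towards the presented color.
            centers[win] = [w + rate * (x - w) for w, x in zip(centers[win], p)]
    return centers


def assign(pixels: Sequence[Color], centers: List[List[float]]) -> List[int]:
    """Label each pixel with the index of its nearest prototype."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in pixels]
```

Each resulting cluster of pixels could then be turned into a binary candidate foreground image; how the essential ones are chosen and combined is not described in the abstract.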
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U1636208, 61862008, and 61902013) and the Beihang Youth Top Talent Support Program (Grant No. YWF-21-BJJ-1039).
Abstract: Network intrusion poses a severe threat to the Internet. However, existing intrusion detection models cannot effectively distinguish different intrusions whose features overlap heavily. In addition, efficient real-time detection remains an urgent problem. To address these two problems, we propose a Latent Dirichlet Allocation topic model-based framework for real-time network Intrusion Detection (LDA-ID), consisting of static and online LDA-ID. The problem of feature overlap is transformed into topic-number optimization and topic selection in static LDA-ID, so that detection is based on latent topic features. To achieve efficient real-time detection, we design an online computing mode for static LDA-ID, in which a momentum-based parameter iteration method is proposed to balance the contributions of prior knowledge and new information. Furthermore, we design two matching mechanisms to accommodate static and online LDA-ID, respectively. Experimental results on the public NSL-KDD and UNSW-NB15 datasets show that our framework achieves higher accuracy than the compared methods.
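The abstract does not give the momentum-based iteration rule. A minimal sketch, assuming it takes the form of a convex blend similar to the step used in online variational LDA: the topic-word parameters learned so far (prior knowledge) are mixed with the estimate from the newest batch of traffic, weighted by a momentum coefficient `mu`. The function names and `mu` are hypothetical and not taken from LDA-ID.

```python
from typing import List

Matrix = List[List[float]]


def momentum_update(prior: Matrix, batch: Matrix, mu: float) -> Matrix:
    """Convex blend of prior topic-word parameters with a new-batch estimate.

    mu near 1 keeps more prior knowledge; mu near 0 trusts the new batch.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if len(prior) != len(batch):
        raise ValueError("prior and batch must have the same number of topics")
    return [[mu * p + (1.0 - mu) * b for p, b in zip(p_row, b_row)]
            for p_row, b_row in zip(prior, batch)]


def normalise(rows: Matrix) -> Matrix:
    """Turn each topic's non-negative weights into a word distribution."""
    return [[v / sum(row) for v in row] for row in rows]
```

For example, `momentum_update([[2.0, 0.0]], [[0.0, 2.0]], 0.75)` gives `[[1.5, 0.5]]`, which normalises to `[[0.75, 0.25]]`: the prior dominates, but the new batch shifts some weight to the second feature.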
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0201305), the National Science and Technology Major Project (No. 2013ZX0102-8001-001-001), and the National Natural Science Foundation of China (Nos. 91430218, 31327901, 61472395, 61272134, 61432018).
Abstract: Clustering data with varying densities and complicated structures is important, yet many existing clustering algorithms struggle with this problem, because varying densities and complicated structures make a single algorithm perform poorly on different parts of the data. Assuming that denser parts of the data probably carry more information, an algorithm that clusters starting from the high-density part is proposed. It begins with a tiny distance to find the highest density-connected partition and form the corresponding super cores; the distance is then iteratively increased by a global heuristic method to cluster parts with different densities. The mean silhouette coefficient indicates clustering performance, and a denoising function is implemented to eliminate the influence of noise and outliers. Many challenging experiments indicate that the algorithm performs well on data with widely varying densities and extremely complex structures. It determines the optimal number of clusters automatically, requires no background knowledge, has easily tuned parameters, and is robust against noise and outliers.
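The mean silhouette coefficient used above as the performance indicator has a standard definition: for each point, a is its mean distance to the other members of its own cluster, b is its mean distance to the nearest other cluster, and s = (b - a) / max(a, b). A minimal pure-Python sketch with Euclidean distance follows; the function name is an assumption, and singleton clusters score 0 by the usual convention.

```python
import math
from typing import Hashable, Sequence

Point = Sequence[float]


def mean_silhouette(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient (Rousseeuw) with Euclidean distance."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d)  # mean distance to the nearest other cluster
                for c, d in by_cluster.items() if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

In an iterative scheme like the one described, this score could be computed for the partition obtained at each distance threshold and the best-scoring partition kept, which is one way the number of clusters can emerge automatically; whether the paper does exactly this is not stated.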
Abstract: It is proposed that a double-level programming technique may be adopted in the synthesis strategy. Optimization of the heat exchanger network (HEN) structural configuration (the master problem) may be solved at the upper level, leaving the remaining operating conditions (the slave problem) to be optimized at the lower level. With this uniqueness in mind, an HEN synthesis expert system may be employed to handle both the logical constraints and the optimization of the global operating parameters using enhanced sequential number optimization theory. Case studies demonstrate that the proposed synthesis strategy can effectively simplify both problem solving and the synthesis process, and a comparison of the case-study results supports the validity of the recommended strategy.
Abstract: To address the high computational complexity of the optimal brain surgeon (OBS) process, a pruning algorithm with a penalty OBS process is presented. Compared with sensitivity-based and regularization methods, the penalty OBS algorithm not only avoids the time consumption and low pruning efficiency of the OBS process but also maintains higher generalization and pruning accuracy than the Levenberg-Marquardt method.
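For reference, the classic OBS step (Hassibi and Stork) that the penalty variant builds on can be written as follows, where H is the Hessian of the training error with respect to the weight vector w and e_q is the q-th unit vector. The weight with the smallest saliency L_q is removed and the remaining weights are corrected by δw, which sets that weight exactly to zero. The abstract does not give the penalty term, so it is not shown; the need for the inverse Hessian in every step is the computational burden the abstract refers to.

```latex
\begin{aligned}
L_q &= \frac{w_q^{2}}{2\,[\mathbf{H}^{-1}]_{qq}}, \qquad q^{\ast} = \arg\min_q L_q,\\
\delta\mathbf{w} &= -\frac{w_{q^{\ast}}}{[\mathbf{H}^{-1}]_{q^{\ast}q^{\ast}}}\,\mathbf{H}^{-1}\mathbf{e}_{q^{\ast}}.
\end{aligned}
```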
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 51477111) and the National Key Research and Development Program of China (Grant No. 2016 YFB-0901102).
Abstract: In this paper, interval number optimization and model predictive control are proposed to handle uncertain-but-bounded parameters in electric water heater load scheduling. First, interval numbers are used to describe the uncertain parameters, including hot water demand, ambient temperature, and the real-time price of electricity. The traditional thermal dynamic model of the electric water heater is then transformed into an interval number model, on the basis of which the day-ahead load scheduling problem with uncertain parameters is formulated and solved by interval number optimization. Different tolerance degrees for constraint violation and different temperature preferences are also discussed to give consumers more choices. Furthermore, model predictive control, which incorporates both forecasts and newly updated information, is used to make and execute electric water heater load schedules on a rolling basis throughout the day. Simulation results demonstrate that interval number optimization, in either the day-ahead or the model predictive control format, is robust to uncertain hot water demand, ambient temperature, and real-time electricity prices, enabling customers to flexibly adjust their electric water heater control strategy.
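To show how an interval number model propagates uncertain-but-bounded inputs, here is a minimal sketch of interval arithmetic applied to one step of a tank temperature model. The first-order model and its coefficients `a`, `gain`, and `draw_coeff` are hypothetical placeholders, not the paper's thermal model, and the interval optimization and tolerance-degree handling are not shown.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] for an uncertain-but-bounded quantity."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __add__(self, other: Union["Interval", Number]) -> "Interval":
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __mul__(self, other: Union["Interval", Number]) -> "Interval":
        o = as_interval(other)
        # The product's bounds are the extremes of the four endpoint products.
        ends = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(ends), max(ends))

    __rmul__ = __mul__


def as_interval(x: Union[Interval, Number]) -> Interval:
    """Promote a crisp number to a degenerate interval."""
    return x if isinstance(x, Interval) else Interval(float(x), float(x))


def next_temperature(temp, ambient, draw, heating_on: int,
                     a: float = 0.95, gain: float = 2.0,
                     draw_coeff: float = 0.5) -> Interval:
    """One step of a HYPOTHETICAL first-order tank model:
    T' = a*T + (1 - a)*T_amb + gain*u - draw_coeff*D.
    """
    return (a * as_interval(temp) + (1 - a) * as_interval(ambient)
            + gain * heating_on + (-draw_coeff) * as_interval(draw))
```

With a tank at 60, ambient temperature in [15, 25], a hot water draw in [0, 4], and the heater on, the next temperature lies in roughly [57.75, 60.25]; a scheduler could then check comfort constraints against both bounds.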
Funding: Sponsored by the National Natural Science Foundation of China (60832009), the Beijing National Sciences Foundation (4102044), the Hi-Tech Research and Development Program of China (2009AA01Z211, 2009AA01Z246), and the Fundamental Research Funds for the Central Universities (BUPT2009RC0119).
Abstract: In this paper, dynamic control approaches for spectrum sensing are proposed, based on the theory that prediction is synonymous with data compression in computational learning. First, a spectrum sensing sequence prediction scheme is proposed to reduce spectrum sensing time and improve the throughput of secondary users. We use the Ziv-Lempel data compression algorithm to design the prediction scheme, which utilizes the spectrum band usage history. In addition, an iterative algorithm is proposed to find the optimal number of spectrum bands to sense, with the aim of maximizing the expected net reward of each secondary user in each time slot. Finally, extensive simulation results demonstrate the effectiveness of the proposed dynamic spectrum sensing control approaches.
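One common way to turn Ziv-Lempel (LZ78) compression into a predictor, in the spirit of Vitter and Krishnan's parse-tree approach, is to grow the LZ78 phrase tree over the symbol history and, at the current position in the tree, predict the child that has been followed most often. The sketch below assumes the usage history is encoded as hashable symbols (for example, the index of the band found idle in each slot); the class names and the fallback to root statistics are assumptions, not the paper's exact scheme.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional


class _Node:
    """Node of the LZ78 parse tree with per-child visit counts."""

    def __init__(self) -> None:
        self.children: Dict[Hashable, "_Node"] = {}
        self.counts: Dict[Hashable, int] = defaultdict(int)


class LZ78Predictor:
    """Predict the next symbol from the LZ78 parse tree of the history."""

    def __init__(self) -> None:
        self.root = _Node()
        self.current = self.root

    def update(self, symbol: Hashable) -> None:
        node = self.current
        node.counts[symbol] += 1
        if symbol in node.children:
            self.current = node.children[symbol]  # extend the current phrase
        else:
            node.children[symbol] = _Node()  # a new phrase ends here
            self.current = self.root  # start the next phrase at the root

    def predict(self) -> Optional[Hashable]:
        counts = self.current.counts or self.root.counts  # fall back to root
        if not counts:
            return None
        return max(counts, key=counts.get)
```

After feeding the alternating history 0, 1, 0, 1, ..., 0, 1, the predictor returns 0, so a secondary user would sense that band first.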
Abstract: Purpose - The most commonly used approaches for cluster validation are based on indices, but most existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets. Design/methodology/approach - The authors introduce a new compactness measure that depicts the typical behaviour of a cluster, in which more points are located around the centre and fewer points towards the outer edge of the cluster. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed validity index. The values of the six indices are computed for every number of clusters nc from nc_min to nc_max to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like and real data sets. Findings - Through an extensive experimental study, the proposed validity index is found to be more consistent and reliable in indicating the correct number of clusters than the other validity indices. This is demonstrated on 11 data sets, on which the proposed index achieved better results. Originality/value - The originality of this paper lies in proposing a novel cluster validity index that is used to determine the optimal number of clusters present in data sets of different complexities.
Abstract: In a declining market for goods, we optimize the net profit of a business whose inventory management allows the selling price to be changed n times over the time horizon. We compute the optimal number of price changes, the respective optimal prices, and the optimal profit in each cycle for a deteriorating product. This paper theoretically proves that for any business setup there exists an optimal number of price settings that maximizes profit. The theoretical results are supported by numerical examples for different setups (data sets), and for every setup the dynamic pricing policy is found to outperform the static pricing policy. The deterioration factor is taken into consideration in our model, and the deteriorated units are determined by a recurrence method. We also study the effect of different parameters on the optimal policy through simulation. For managerial purposes, we provide some "suggested intervals" for choosing parameters depending on the initial demand, which help predict the best prices and the arrival of customers (demand).
Funding: This work was financially supported by the grant from the International Cooperation and Exchange of the National Natural Science Foundation of China (No. 31811540396), China; the National Natural Science Foundation of China (No. 31701318), China; and the Training Project of Heilongjiang Bayi Agricultural University, China (No. XZR2016-09).
Abstract: The change in maize moisture content across growth stages is an important indicator for evaluating the growth status of maize. In particular, the moisture content during the grain-filling stage reflects grain quality and maturity, and it can also be used as an important indicator for breeding and seed selection. At present, the drying method is usually used to calculate the moisture content and the dehydration rate at the grain-filling stage; however, it requires a large sample size and a long test time. To monitor the change in moisture content at the maize grain-filling stage using a small sample set, a moisture content monitoring model combining a Bootstrap re-sampling strategy, sample set partitioning based on joint x-y distances (SPXY), and partial least squares (Bootstrap-SPXY-PLS) was used together with near-infrared spectroscopy for small sample sizes of 10, 20, and 50. To improve the prediction accuracy of the model, the optimal number of model factors was determined, and comprehensive evaluation thresholds (RVP), based on the coefficient of determination (R²), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP), were proposed for sub-model screening. The model performed well in predicting the moisture content of maize grain at the filling stage for small sample sets. For sample sizes of 20 and 50, the R² values were greater than 0.99. The average deviations between the predicted and reference values were 0.1078%, 0.057%, and 0.0918% for sample sizes of 10, 20, and 50, respectively. Therefore, the model was effective for monitoring the moisture content at the grain-filling stage with a small sample size. The method is also suitable for the quantitative analysis of different concentrations using near-infrared spectroscopy with small sample sizes.
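The SPXY step can be sketched as Kennard-Stone selection on a joint distance in which the spectral (x) and reference-value (y) distances are each divided by their maximum and then summed. The minimal version below, plus an RMSE helper of the kind used for RMSEP, is an illustration under those assumptions; the function names are hypothetical, and the Bootstrap re-sampling, PLS fitting, and RVP screening are not shown.

```python
import math
from typing import List, Sequence, Tuple


def spxy_split(X: Sequence[Sequence[float]], y: Sequence[float],
               n_train: int) -> Tuple[List[int], List[int]]:
    """SPXY partition: Kennard-Stone selection on joint x-y distances.

    Returns (calibration indices in selection order, remaining indices).
    """
    n = len(X)
    if len(y) != n:
        raise ValueError("X and y must have the same number of samples")
    if not 2 <= n_train <= n:
        raise ValueError("n_train must lie between 2 and the number of samples")
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    # Scale each distance by its maximum so spectra and reference values
    # contribute on comparable scales (guard against all-zero distances).
    mx = max(max(row) for row in dx) or 1.0
    my = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / mx + dy[i][j] / my for j in range(n)] for i in range(n)]
    # Seed with the two most distant samples.
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda ij: d[ij[0]][ij[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the sample whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error, e.g. RMSEP on the prediction set."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))
```

In a Bootstrap-style workflow, a split like this could be repeated on re-sampled data and each resulting sub-model screened by its R², RMSECV, and RMSEP.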