Funding: Supported by the National Natural Science Foundation of China (61304079, 61125306, 61034002), the Open Research Project from SKLMCCS (20120106), the Fundamental Research Funds for the Central Universities (FRF-TP-13-018A), and the China Postdoctoral Science Foundation (2013M530527).
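Abstract: A novel optimal tracking control method is proposed for a class of discrete-time systems with actuator saturation and unknown dynamics. The scheme is based on the iterative adaptive dynamic programming (ADP) algorithm. To implement the control scheme, a data-based identifier is first constructed for the unknown system dynamics. By introducing the M network, an explicit formula for the steady control is obtained. To eliminate the effect of actuator saturation, a nonquadratic performance function is introduced, and an iterative ADP algorithm is then established, with convergence analysis, to obtain the optimal tracking control solution. To implement the optimal control method, neural networks are used to build the data-based identifier, compute the performance index function, approximate the optimal control policy, and solve for the steady control, respectively. Simulation examples are provided to verify the effectiveness of the presented optimal tracking control scheme.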
Abstract: Search-based software engineering has mainly dealt with automated test data generation by metaheuristic search techniques. Similarly, we use such a technique to generate test data (i.e., problem instances) that exhibit the worst case of algorithms. In this paper, from the viewpoint of non-functional testing, we redefine the worst case for each of several algorithms. Using genetic algorithms (GAs), we illustrate the strategies corresponding to each type of instance. We adopt three example problems: the sorting problem, the 0/1 knapsack problem (0/1KP), and the travelling salesperson problem (TSP). For some algorithms that solve these problems, we successfully found worst-case instances; success is judged by a statistical approach and by comparison with the results of random testing. These examples provide informative guidelines for using genetic algorithms to generate worst-case instances, where the worst case is defined in terms of algorithm performance.
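The idea of searching for worst-case instances can be sketched for the sorting problem. The following is a minimal illustration, not the authors' method: the choice of insertion sort, the comparison count as fitness, the order crossover, the truncation selection, and every parameter value are assumptions made only for this example.

```python
import random


def insertion_sort_comparisons(values):
    """Sort a copy with insertion sort and return the number of key comparisons."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return comparisons


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the other slots with p2's genes in order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    middle = p1[lo:hi]
    taken = set(middle)
    rest = [g for g in p2 if g not in taken]
    return rest[:lo] + middle + rest[lo:]


def swap_mutation(perm, rng):
    """Return a copy of perm with two positions exchanged."""
    i, j = rng.sample(range(len(perm)), 2)
    child = list(perm)
    child[i], child[j] = child[j], child[i]
    return child


def search_worst_case(n=12, pop_size=40, generations=150, seed=0):
    """Evolve permutations of 0..n-1 that maximise the comparison count (assumed fitness)."""
    rng = random.Random(seed)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=insertion_sort_comparisons, reverse=True)
        elite = ranked[: pop_size // 4]          # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.3:               # hypothetical mutation rate
                child = swap_mutation(child, rng)
            children.append(child)
        population = elite + children
    best = max(population, key=insertion_sort_comparisons)
    return best, insertion_sort_comparisons(best)
```

For insertion sort the true worst case is known (a reversed list, with n(n-1)/2 comparisons), which makes it a convenient sanity check for the search before applying the same pattern to algorithms whose worst case is not known in closed form.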
Funding: Financial support for this work was provided by the Beijing Natural Science Foundation (Grant 2232066) and the Open Project Foundation of the State Key Laboratory of Solid Lubrication (Grant LSL-2212).
Abstract: The composition of base oils affects the performance of lubricants made from them. This paper proposes a hybrid model based on the gradient-boosted decision tree (GBDT) to analyze how different ratios of the KN4010, PAO40, and PriEco3000 components in a composite base oil system affect lubricant performance. The study was conducted with a small set of laboratory samples, and a data expansion method using the Gaussian copula function was proposed to improve the prediction ability of the hybrid model. The study also compared four optimization algorithms, namely the slime mould algorithm (SMA), genetic algorithm (GA), whale optimization algorithm (WOA), and seagull optimization algorithm (SOA), for predicting the kinematic viscosity at 40 °C, the kinematic viscosity at 100 °C, the viscosity index, and the oxidation induction time of the lubricant. The results showed that the Gaussian copula data expansion method improved the prediction ability of the hybrid model in the small-sample case. The SOA-GBDT hybrid model converged fastest on the samples and predicted best, with coefficients of determination (R^2) for the four lubricant indicators reaching 0.98, 0.99, 0.96, and 0.96, respectively. The model can therefore significantly reduce prediction error and has good prediction ability.
Funding: The leaders of the State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences, are acknowledged for their project support.
Abstract: To address the problem that the horizontal directivity index of a vector hydrophone vertical array is no higher than that of a single vector hydrophone, a high-resolution azimuth estimation algorithm based on data fusion is presented. The proposed algorithm first employs the MUSIC algorithm to estimate the azimuth of each divided sub-band signal; the azimuths estimated by the multiple hydrophones are then processed with the data fusion technique. The final high-resolution estimate is obtained with the weighted histogram statistics method. Simulation and sea-trial results indicate that the proposed algorithm estimates azimuth better than both the MUSIC algorithm applied to a single vector hydrophone and the data fusion technique based on the acoustic energy flux method. The improvement shows in the estimation precision, the probability of correct estimation, the ability to distinguish multiple targets, and the suppression of noise sub-bands.
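The final fusion step can be pictured as a weighted histogram over the per-sub-band, per-hydrophone azimuth estimates. The sketch below is only an assumed reading of "weighted histogram statistics": the abstract does not say how the weights are chosen (sub-band energy is one plausible option), and the function name and bin width are hypothetical.

```python
def weighted_histogram_azimuth(estimates, weights, bin_width=2.0):
    """Return the centre (degrees) of the azimuth bin with the largest total weight."""
    n_bins = int(round(360.0 / bin_width))
    hist = [0.0] * n_bins
    for azimuth, weight in zip(estimates, weights):
        # Wrap into [0, 360) before binning so that 361 and 1 fall in the same bin.
        idx = int((azimuth % 360.0) // bin_width) % n_bins
        hist[idx] += weight
    peak = max(range(n_bins), key=hist.__getitem__)
    return (peak + 0.5) * bin_width


# Three consistent estimates near 30 degrees outweigh two stray ones.
fused = weighted_histogram_azimuth(
    [30.5, 31.0, 30.2, 120.0, 250.0],
    [1.0, 1.0, 1.0, 0.5, 0.2],
)
```

Down-weighting estimates from low-energy sub-bands is one way such a histogram could suppress noise sub-bands, which matches the behaviour the abstract reports.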
Abstract: The fuzzy c-means (FCM) clustering algorithm is sensitive to noise points and outliers. The possibilistic fuzzy c-means (PFCM) clustering algorithm largely overcomes this problem, but it has drawbacks of its own: it is still sensitive to the initial cluster centers, and its results are poor when the noisy datasets under test contain clusters of very unequal size. This paper proposes an improved kernel possibilistic fuzzy c-means algorithm based on invasive weed optimization (IWO-KPFCM). The algorithm first uses invasive weed optimization (IWO) to search for an optimal solution that serves as the initial cluster centers, and introduces a kernel method that maps the input data from the sample space into a high-dimensional feature space. The sample variance is then introduced into the objective function to measure the compactness of the data. Finally, the improved algorithm is used to cluster the data. Simulation results on University of California, Irvine (UCI) datasets and artificial datasets show that the proposed algorithm resists noise better, clusters more accurately, and converges faster than the PFCM algorithm.
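The noise sensitivity of plain FCM follows from its membership update, in which each point's memberships must sum to 1. The sketch below shows only that standard FCM update on 1-D data; it does not implement PFCM, the kernel mapping, or IWO, and the function name and sample values are chosen here purely for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update for 1-D points; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append(
                [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            )
    return rows


# Two centers at 0 and 10; the last point is a far outlier.
rows = fcm_memberships([1.0, 9.0, 1000.0], [0.0, 10.0])
```

The outlier at 1000 receives a membership of roughly 0.5 in each cluster even though it is typical of neither, so it pulls both centers when they are updated. Possibilistic variants relax the sum-to-one constraint to reduce this effect.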
Funding: Supported by the National Science of China (60075015) and the Key Project of the Scientific and Technological Department in Anhui.
Abstract: This paper first puts forward a case-based system framework built on data mining techniques. It then examines the possibility of using neural networks as the retrieval method in such a case-based system. Within this system, we propose data mining algorithms to discover case knowledge, along with other algorithms.
Funding: Supported by the National Natural Science Foundation of China (60661003) and the Research Project of the Department of Education of Jiangxi Province (GJJ10566).
Abstract: In this paper, we explore a novel ensemble method for spectral clustering. In contrast to traditional clustering ensemble methods, which combine all the clustering results obtained, we propose an adaptive spectral clustering ensemble method that achieves a better clustering solution. The method can adaptively determine the number of component members, a capability that many other algorithms lack. The component clusterings of the ensemble are generated by spectral clustering (SC), whose properties are well suited to producing diverse committee members. The selection process evaluates the generated component spectral clusterings using a resampling technique and the population-based incremental learning (PBIL) algorithm. Experimental results on UCI datasets demonstrate that the proposed algorithm achieves better results than traditional clustering ensemble methods, especially when the number of component clusterings is large.
Funding: Project supported by the Fundamental Research Funds for the Central Universities, China (No. 2412015KJ005), the Twelfth Five-Year Plan of the Education Department of Jilin Province, China (No. 557), and the Thirteenth Five-Year Plan for Scientific Research of the Education Department of Jilin Province, China (No. JJKH20191197KJ).
Abstract: Frequent itemset mining is the main method of association rule mining. Given limits on computing space and performance, finding associations among frequent items in large-scale data mining takes extensive time and effort, particularly as datasets grow. For association mining in a big data environment, the MapReduce programming model is typically used for task partitioning and parallel processing, which can improve the execution efficiency of the algorithm. However, to ensure that association rules are not destroyed during task partitioning and parallel processing, the inner-relationship data must be stored. Because inner-relationship data are redundant, storing them significantly increases space usage compared with the original dataset. In this study, we find that the frequent pattern (FP) mining algorithm depends mainly on the conditional pattern bases. In the parallel frequent pattern (PFP) algorithm, the grouping model divides frequent items into several groups according to their frequencies. We propose a non-group PFP (NG-PFP) mining algorithm that removes the grouping model and reduces the data redundancy between sub-tasks. We also present the NG-PFP algorithm for task partitioning and parallel processing, and analyze and discuss its performance in a Hadoop cluster environment. Experimental results indicate that the non-group model clearly improves computational efficiency and the space utilization rate.
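Since the abstract singles out conditional pattern bases as the core of FP mining, a small single-machine sketch of what a conditional pattern base contains may help. This is not NG-PFP or its MapReduce partitioning: it derives the prefix paths directly from frequency-ordered transactions instead of building an FP-tree, and the function name, tie-breaking rule, and sample data are assumptions.

```python
from collections import Counter


def conditional_pattern_base(transactions, item, min_support=2):
    """Map each prefix path that precedes `item` to the number of transactions sharing it."""
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in counts.items() if c >= min_support}
    if item not in frequent:
        return {}
    base = Counter()
    for t in transactions:
        # Order frequent items by descending support; ties broken alphabetically (assumed).
        ordered = sorted(set(t) & frequent, key=lambda i: (-counts[i], i))
        if item in ordered:
            prefix = tuple(ordered[: ordered.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
base_c = conditional_pattern_base(transactions, "c")
```

Each item's conditional pattern base can be mined independently of the others, which is what makes the per-item work a natural unit for parallel sub-tasks.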
Abstract: Establishing mathematical models of hydraulic systems from physical laws usually suffers from complex modelling processes and from low reliability and practicality caused by large uncertainties. As an alternative, a novel modelling method for the highly nonlinear system of a hydraulic excavator is presented. Based on data collected in experiments driving the excavator's arms, a data-based excavator dynamic model is established using Simplified Refined Instrumental Variable (SRIV) identification and estimation algorithms. The validity of the proposed data-based model is demonstrated indirectly by its performance in computer simulation and in motion control experiments on the real machine.
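Abstract: To address the problem that taxis stopping arbitrarily cause urban traffic congestion and even traffic accidents, this study uses taxi GPS (Global Positioning System) data from a real area of Chengdu together with crawled POI (Point of Interest) data. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm is applied to pick-up and drop-off points to obtain taxi passenger hotspots. The type of each hotspot area is assigned according to the POI types it contains, taxi travel demand at different times is analyzed, and fixed taxi stopping areas are then delineated. The results show that the placement of fixed taxi stopping areas is related to travellers' demand: placing fixed stopping areas where travel demand is high can meet travellers' different travel needs. Delineating fixed stopping areas by combining taxi passenger hotspots with crawled POI data is highly practical and offers theoretical and practical value for urban traffic safety.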
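The hotspot-extraction step can be illustrated with a compact, textbook-style DBSCAN. This is a sketch under stated assumptions: the points are treated as planar coordinates with Euclidean distance (real GPS data would need a projection or a great-circle distance), and the `eps` and `min_pts` values and sample points are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force range query; the point itself is included in its neighbourhood.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:    # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


pickups = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
labels = dbscan(pickups, eps=1.5, min_pts=3)
```

Points labelled -1 are isolated pick-ups or drop-offs; the remaining clusters are the candidate passenger hotspots that would then be typed using nearby POIs.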
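Abstract: The distribution of urban POIs objectively reflects how the various industries in a city are developing. Traditional surveying methods for acquiring POIs are costly, have long update cycles, and quickly become outdated, whereas the development of location-based social network (LBSN) platforms offers a new approach to sensing urban POIs. This paper proposes an urban POI sensing method based on cluster analysis of LBSN data. First, the LBSN data are preprocessed, including cleaning duplicate data, deleting invalid data, and pre-classifying the data, to improve data validity. Second, an improved DBSCAN algorithm is proposed to cluster the processed data, yielding a relatively accurate distribution of the various types of urban POIs. Experimental results show that, compared with the traditional DBSCAN algorithm and the K-means algorithm, the proposed algorithm clusters better, with a larger CH index and a smaller DBI index.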
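Assuming DBI here denotes the Davies-Bouldin index (and CH the Calinski-Harabasz index), the "smaller is better" criterion can be made concrete with a short sketch. It uses centroid-based scatter with Euclidean distance, ignores points labelled -1 as noise (an assumption, not something the abstract states), and requires at least two clusters with distinct centroids.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of labelled points; points labelled -1 are skipped."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    centroids = {
        lab: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for lab, pts in clusters.items()
    }
    # Scatter: mean distance from a cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    ids = list(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids if j != i
        )
        for i in ids
    ]
    return sum(worst) / len(ids)


compact = davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1])
spread = davies_bouldin([(0, 0), (0, 8), (10, 0), (10, 8)], [0, 0, 1, 1])
```

With the same centroid separation, the tighter clusters give the smaller value (0.2 versus 0.8), which is the sense in which a lower DBI indicates a better clustering.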