Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant such as 100%. The statistical linear model is the most commonly used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in many settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the missing data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques.
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distances and covariance.
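The E-step/M-step loop described above can be sketched for the simple case of missing responses in a univariate linear regression. The function names and the toy handling of missing values (`None` entries) are illustrative only, not taken from the study:

```python
# Hedged sketch: EM-style imputation for simple linear regression with
# missing responses. E-step: fill missing y with the current fit;
# M-step: re-estimate the OLS intercept/slope on the completed data.

def ols_fit(xs, ys):
    # Ordinary least squares for y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def em_regression(xs, ys, iters=50):
    # ys may contain None for missing observations.
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = ols_fit([x for x, _ in obs], [y for _, y in obs])  # init on observed
    for _ in range(iters):
        filled = [y if y is not None else a + b * x
                  for x, y in zip(xs, ys)]        # E-step: impute
        a, b = ols_fit(xs, filled)                # M-step: refit
    return a, b
```

On data generated from y = 1 + 2x with one missing response, the loop recovers the generating coefficients.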
Emotion represents the feeling of an individual in a given situation. There are various ways to express the emotions of an individual; they can be categorized into verbal expressions, written expressions, facial expressions and gestures. Among these, the written form is the most challenging from which to extract emotions, as the data is textual. Finding the different kinds of emotions is also a tedious task, as it requires substantial preparation of the textual data taken for the research. This research work analyses and extracts the emotions hidden in text data. The text data taken for the analysis come from a social media dataset. Using raw text data directly from social media does not serve the purpose; therefore, the text data have to be pre-processed and then utilised for further processing. Pre-processing makes the text data more usable and helps infer valuable insights into the emotions hidden in it; the pre-processing steps also help to organise the text data for identifying the emotions conveyed in the text. This work proposes to detect the emotions in social media text data by applying machine learning algorithms. Finally, the usefulness of the detected emotions is suggested for various stakeholders, to gauge the attitude of individuals at the moment the data is produced.
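A minimal sketch of the pre-processing stage described above, with a toy keyword lexicon standing in for a trained classifier; the stopword list and lexicon below are invented for illustration and are not from this work:

```python
import re

# Illustrative stopword list and emotion lexicon (assumptions, not the
# paper's resources; real work would use a trained model or NRC EmoLex).
STOPWORDS = {"i", "am", "the", "so", "a", "an", "is", "to", "and", "today"}
LEXICON = {"happy": "joy", "glad": "joy", "sad": "sadness",
           "angry": "anger", "scared": "fear"}

def preprocess(text):
    # Lowercase, strip punctuation, tokenize, drop stopwords.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def detect_emotion(text, default="neutral"):
    # Return the first lexicon emotion found in the cleaned tokens.
    for token in preprocess(text):
        if token in LEXICON:
            return LEXICON[token]
    return default
```

Even this toy pipeline shows why pre-processing matters: without lowercasing and punctuation stripping, "Happy!" would never match the lexicon entry "happy".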
This study concerns a Ka-band solid-state transmitter cloud radar, made in China, which can operate in three different work modes, with different pulse widths and coherent and incoherent integration numbers, to meet the requirements for cloud remote sensing over the Tibetan Plateau. Specifically, the design of the three operational modes of the radar (i.e., boundary mode M1, cirrus mode M2, and precipitation mode M3) is introduced, and a cloud radar data merging algorithm for the three modes is proposed. Using one month's continuous measurements during summertime at Naqu on the Tibetan Plateau, we analyzed the consistency between the cloud radar measurements of the three modes. The number of occurrences of radar detections of hydrometeors and the percentage contributions of the different modes' data to the merged data were estimated, and the performance of the merging algorithm was evaluated. The results indicated that the minimum detectable reflectivity for each mode was consistent with theoretical results. Merged data provided measurements with a minimum reflectivity of -35 dBZ at a height of 5 km, and obtained information above a height of 0.2 km. Measurements of radial velocity by the three operational modes agreed very well, and systematic errors in measurements of reflectivity were less than 2 dB. However, large discrepancies existed in the measurements of the linear depolarization ratio taken from the different operational modes. The percentage of radar detections of hydrometeors in mid- and high-level clouds increased by 60% through application of pulse compression techniques. In conclusion, the merged data are appropriate for cloud and precipitation studies over the Tibetan Plateau.
For a three-passive-sensor location system, a generalized 3-dimensional (3-D) assignment model is constructed based on property information, and a multi-target programming model is proposed based on direction-finding and property fusion information. The multi-target programming model is transformed into a single-target programming problem to solve, and its data association result is compared with the results obtained by using one kind of information only. Simulation experiments show the effectiveness of the multi-target programming algorithm, which achieves higher data association accuracy with less computation.
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed: one algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
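A compact sketch of the DBSCAN core, plus an "outside" sampling wrapper in the spirit of the second SDBSCAN variant. Parameter names are ours, and the paper's exact sampling strategy is not reproduced here:

```python
import math
import random

def dbscan(points, eps, min_pts):
    # Classic DBSCAN: labels[i] is a cluster id, or -1 for noise.
    labels = [None] * len(points)  # None = unvisited
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        labels[i] = cid
        seeds = list(nbrs)
        while seeds:                # expand the cluster
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid     # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(jn)
        cid += 1
    return labels

def sdbscan_outside(points, eps, min_pts, frac=0.5, seed=0):
    # "Sampling outside DBSCAN" sketch: cluster a random sample, then
    # attach every point to the label of its nearest sampled point.
    rng = random.Random(seed)
    idx = rng.sample(range(len(points)),
                     max(min_pts, int(frac * len(points))))
    sample = [points[i] for i in idx]
    slabels = dbscan(sample, eps, min_pts)
    return [slabels[min(range(len(sample)),
                        key=lambda t: math.dist(p, sample[t]))]
            for p in points]
```

The neighborhood query here is a brute-force O(n) scan; the scalability the paper targets would additionally require a spatial index such as an R*-tree.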
A specialized Hungarian algorithm was developed here for the maximum likelihood data association problem, with two implementation versions owing to the presence of false alarms and missed detections. The maximum likelihood data association problem is formulated as a bipartite weighted matching problem, and its duality and optimality conditions are given. The Hungarian algorithm, with its computational steps, data structure and computational complexity, is presented. The two implementation versions, the Hungarian forest (HF) algorithm and the Hungarian tree (HT) algorithm, and their combination with the naïve auction initialization are discussed. The computational results show that the HT algorithm is slightly faster than the HF algorithm, and both are superior to the classic Munkres algorithm.
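The bipartite weighted matching formulation can be illustrated with a brute-force solver for tiny problems; the Hungarian/Munkres family solves the same maximization in O(n³) rather than O(n!). The weight matrix below is an invented toy, not the paper's likelihood model:

```python
from itertools import permutations

def best_assignment(weights):
    # weights[i][j]: score (e.g., log-likelihood) of pairing track i
    # with measurement j. Exhaustively search all one-to-one pairings;
    # only feasible for small n, which is the point of the Hungarian
    # algorithm's polynomial-time alternative.
    n = len(weights)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

For a 2x2 score matrix favoring the diagonal, the optimal pairing is the identity permutation.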
Many business applications rely on their historical data to predict their business future. The marketing of products is one of the core processes for a business, and customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered products, and the development of education and health services depends on historical data. Furthermore, reducing the problems and crimes of online social media networks needs a significant source of information. Data analysts need an efficient classification algorithm to predict the future of such businesses; however, dealing with a huge quantity of data requires great processing time. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications, and classification is one of the most widely used techniques, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a careful reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain.
The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities, the Naïve Bayes algorithm is the most accurate for classifying agriculture datasets, OneR is the most accurate algorithm for classifying instances within the health domain, and the C4.5 Decision Tree algorithm is the most accurate for classifying students' records to predict degree completion time.
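Of the algorithms named above, OneR is simple enough to sketch in a few lines: it builds one rule per feature (each feature value maps to its majority class) and keeps the feature whose rule misclassifies the fewest training rows. The toy weather-style dataset in the test is invented for illustration:

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    # rows: list of feature tuples; labels: class per row.
    # Returns (best_feature_index, value -> predicted-class rule).
    best = None
    for f in range(len(rows[0])):
        by_val = defaultdict(list)
        for r, y in zip(rows, labels):
            by_val[r[f]].append(y)
        # Majority class for each observed value of feature f.
        rule = {v: Counter(ys).most_common(1)[0][0]
                for v, ys in by_val.items()}
        errors = sum(rule[r[f]] != y for r, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    return best[1], best[2]
```

Despite its simplicity, OneR is a standard accuracy baseline, which is consistent with it winning on some tabular health datasets.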
The HT-7 superconducting tokamak at the Institute of Plasma Physics of the Chinese Academy of Sciences is an experimental device for fusion research in China. The main task of the HT-7 data acquisition system is to acquire, store, analyze and index the data, whose volume reaches hundreds of millions of bytes. Beyond hardware and software support, providing sufficient capacity for data storage, processing and transfer is an even more important problem, and the key technology for dealing with it is the data compression algorithm. In this paper, the data format in HT-7 is introduced first, and then the data compression algorithm LZO, a portable lossless data compression library written in ANSI C, is analyzed. This compression algorithm, which fits well with data acquisition and distribution in nuclear fusion experiments, offers fairly fast compression and extremely fast decompression. Finally, a performance evaluation of the LZO application in HT-7 is given.
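LZO bindings are not part of the Python standard library, so the sketch below uses zlib to illustrate the same lossless compress/decompress round-trip on a repetitive acquisition-style block; the function names and the synthetic data are ours:

```python
import zlib

def compress_block(raw: bytes, level: int = 6) -> bytes:
    # Lossless compression of one acquisition block. zlib stands in for
    # LZO here: both guarantee exact byte-for-byte reconstruction, which
    # is what experimental diagnostic data requires.
    return zlib.compress(raw, level)

def decompress_block(blob: bytes) -> bytes:
    return zlib.decompress(blob)
```

Diagnostic signals with long repetitive stretches compress well under any dictionary-based scheme, which is why a fast lossless codec pays off for shot archiving; LZO's particular appeal, as the paper notes, is its very fast decompression.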
In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for multiyear (MY) ice, has had a significant effect on climate change. Accurate retrieval of MY ice concentration is very important and challenging for understanding the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated: they yielded similar total ice concentrations, while the retrieved MY sea ice concentrations differed from each other. The MY SIA derived from the NASA TEAM algorithm is relatively stable, whereas the other two algorithms produced seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we propose an ice concentration retrieval algorithm that extends the NASA TEAM algorithm by additionally using AMSR-E 6.9 GHz brightness temperature data and the sea ice concentration derived from 89.0 GHz data. Comparison with the reference MY SIA from reference MY ice indicates that the mean difference and root mean square (rms) difference of MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 during January to March, and -0.06×10^6 km^2 and 0.14×10^6 km^2 during September to December, respectively. Comparison with MY SIE obtained from weekly ice age data provided by the University of Colorado shows that the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm shows smaller differences from the reference MY ice and the MY SIE from ice age data than Wang's, Lomax's and the NASA TEAM algorithms.
With the increasing variety of application software in meteorological satellite ground systems, how to provision reasonable hardware resources and improve software efficiency is receiving more and more attention. In this paper, a software classification method based on software operating characteristics is proposed. The method uses run-time resource consumption to describe the running characteristics of the software. Firstly, principal component analysis (PCA) is used to reduce the dimensionality of the software running-feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the PCA results are combined to explain the operating characteristics of each software class, which serves as the basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
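The clustering step can be sketched with a plain K-means loop (a deterministic-initialization baseline for illustration, not the paper's modified variant; the 2-D points stand in for PCA-reduced resource-consumption features):

```python
import math

def kmeans(points, k, iters=20):
    # Baseline K-means. Illustrative deterministic init: the first k
    # points serve as the starting centroids.
    cents = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                       # assignment step
            j = min(range(k), key=lambda c: math.dist(p, cents[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):         # update step
            if g:
                cents[j] = [sum(v) / len(g) for v in zip(*g)]
    return cents
```

On two well-separated groups of "software profiles", the centroids settle on the group means within a couple of iterations.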
By improving traditional ant colony algorithms, a data routing model for remote data exchange on WANs is presented. In the model, random heuristic factors are introduced to realize multi-path search, and the pheromone updating model adjusts the pheromone concentration on the optimal path dynamically according to the path load, keeping the system load-balanced. The simulation results show that the improved model achieves better convergence and load balance.
The genetic algorithm is useful for solving inversions of complex nonlinear geophysical equations. The multi-point search of the genetic algorithm makes it easier to find a globally optimal solution and avoid falling into a local extremum. The search efficiency of the genetic algorithm is key to producing successful solutions in a huge multi-parameter model space. The encoding mechanism of the genetic algorithm affects the search process during evolution, and not all genetic operations perform well under either a binary or a decimal encoding system. As such, a standard genetic algorithm (SGA) is sometimes unable to resolve an optimization problem such as a simple geophysical inversion. With the binary encoding system, the crossover operation may produce more new individuals; the decimal encoding system, on the other hand, makes the mutation generate more new genes. This paper discusses approaches to exploiting the search potential of genetic operations under different encoding systems and presents a hybrid-encoding mechanism for the genetic algorithm, referred to as the hybrid-encoding genetic algorithm (HEGA). The method is based on a routine in which the mutation operation is executed in decimal code and the other operations in binary code. HEGA guarantees the birth of better genes by mutation with high probability, which is beneficial for resolving inversions of complicated problems. Synthetic and real-world examples demonstrate the advantages of using HEGA in the inversion of potential-field data.
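A hedged sketch of the hybrid-encoding idea: crossover operates on the binary genome, while mutation perturbs the decoded decimal value. Population size, mutation scale, selection scheme and all names below are illustrative assumptions, not the paper's settings:

```python
import random

def hega_minimize(f, lo, hi, pop_size=40, gens=80, bits=16, seed=1):
    # Minimize f over [lo, hi] with binary crossover + decimal mutation.
    rng = random.Random(seed)
    span = (1 << bits) - 1
    decode = lambda g: lo + (hi - lo) * g / span
    encode = lambda x: round((min(max(x, lo), hi) - lo) / (hi - lo) * span)
    pop = [rng.randrange(span + 1) for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=lambda g: f(decode(g)))
        parents = scored[:pop_size // 2]        # truncation selection
        children = [scored[0]]                  # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, bits)        # one-point binary crossover
            mask = (1 << cut) - 1
            child = (a & mask) | (b & ~mask & span)
            # Decimal-side mutation: Gaussian jitter on the decoded value.
            x = decode(child) + rng.gauss(0, (hi - lo) * 0.01)
            children.append(encode(x))
        pop = children
    return decode(min(pop, key=lambda g: f(decode(g))))
```

Mutating in decimal space sidesteps the Hamming cliffs of bit-flip mutation (where neighboring real values differ in many bits), which is one plausible reading of why the hybrid scheme helps.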
"Data Structure and Algorithm", an important major subject in computer science, suffers from many problems in teaching activity. This paper introduces and analyzes the situation and problems in the study of this course. A "programming factory" method is then put forward, which is a practice-oriented platform for the teaching-study process. Good results have been obtained with this creative method.
3D image reconstruction of weather radar data can not only help forecasters improve forecast efficiency and accuracy, but also help people understand weather conditions easily and quickly. The Marching Cubes (MC) algorithm, a surface-rendering method, is particularly applicable to 3D reconstruction from slice images; it can shorten the time needed to find and calculate the isosurface from raw volume data and reflect the shape structure more accurately. In this paper, we discuss a method to reconstruct 3D weather cloud images using the proposed Cube Weighting Interpolation (CWI) and the MC algorithm. Firstly, we detail the steps of CWI and apply it to project the raw radar data into cubes and obtain equally spaced cloud slice images; we then employ the MC algorithm to draw the isosurface. Experiments show that our method achieves good results with simple operation, and may provide an intuitive and effective reference for realizing 3D surface reconstruction and stereo visualization of meteorological images.
Vector quantization (VQ) is an important data compression method. The key step in VQ encoding is finding the closest vector among N vectors to a given feature vector. Many classical linear search algorithms take O(N) distance computations between two vectors. This paper presents a quantum VQ iteration and a corresponding quantum VQ encoding algorithm that take O(√N) steps. The unitary operation of distance computing can be performed on a number of vectors simultaneously because a quantum state exists in a superposition of states. The quantum VQ iteration comprises three oracles; by contrast, many quantum algorithms, such as Shor's factorization algorithm and Grover's algorithm, have only one oracle. An entangled state is generated and used, whereas the state in Grover's algorithm is not entangled. The quantum VQ iteration is a rotation over a subspace, whereas the Grover iteration is a rotation over the global space. The quantum VQ iteration thus extends the Grover iteration to more complex searches that require more oracles, and the method is universal.
The design and operation features of the automatic data acquisition system for low-latitude ionospheric tomography along the 120°E meridian are presented. The system automatically collects the differential Doppler phase data, and the GPS satellite beacon signal is simultaneously collected to achieve time synchronization of all receivers in the whole station chain. An improved reconstruction algorithm for computerized ionospheric tomography is also proposed, in which calculation of the integral phase constant and choice of the initial guess are integrated into the reconstruction procedure and evaluated through the reconstructed image. Both numerical simulation examples and results reconstructed from observed data show that the new algorithm works reasonably and effectively on ionospheric CT problems.
With the development of computerized business applications, the amount of data is increasing exponentially. Cloud computing provides high-performance computing resources and mass storage for massive data processing. In distributed cloud computing systems, data-intensive computing can lead to data scheduling between data centers. Reasonable data placement can effectively reduce such scheduling and improve users' data acquisition efficiency. In this paper, a mathematical model of data scheduling between data centers is built. By means of the global optimization ability of the genetic algorithm, generational evolution produces better approximate solutions and finally yields the best approximation of the data placement. The experimental results show that the genetic algorithm can effectively work out an approximately optimal data placement and minimize data scheduling between data centers.
A robust phase-only Direct Data Domain Least Squares (D3LS) algorithm based on generalized Rayleigh quotient optimization using a hybrid Genetic Algorithm (GA) is presented in this letter. The optimization efficiency and computational speed are improved via the hybrid GA, composed of the standard GA and the Nelder-Mead simplex algorithm. First, the objective function, in the form of a generalized Rayleigh quotient, is derived via the standard D3LS algorithm. It is then taken as a fitness function, and the unknown phases of all adaptive weights are taken as decision variables. The nonlinear optimization is then performed via the hybrid GA to obtain the optimized phase-only adaptive weights. As a phase-only adaptive algorithm, the proposed algorithm is simpler than conventional algorithms for hardware implementation. Moreover, it processes only a single snapshot of data rather than forming a sample covariance matrix and performing matrix inversion. Simulation results show that the proposed algorithm has good signal recovery and interference-nulling performance, superior to that of the phase-only D3LS algorithm based on the standard GA.
This paper addresses the problem of selecting a route for every pair of communicating nodes in a virtual circuit data network in order to minimize the average delay encountered by messages. The problem was previously modeled as a network of M/M/1 queues. A genetic algorithm to solve this problem is presented, and extensive computational results across a variety of networks are reported. These results indicate that the presented solution procedure outperforms the other methods in the literature and is effective for a wide range of traffic loads.
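Under the M/M/1 network model mentioned above, the average message delay takes the standard Kleinrock form T = (1/γ) Σᵢ fᵢ/(Cᵢ − fᵢ), which is the quantity a fitness evaluation in such a genetic algorithm would compute for each candidate routing; variable names below are ours:

```python
def avg_network_delay(link_flows, link_capacities, total_msg_rate):
    # Kleinrock's average delay for a network of independent M/M/1 links:
    #   T = (1/gamma) * sum_i f_i / (C_i - f_i)
    # f_i: traffic routed over link i, C_i: link capacity (messages/s),
    # gamma: total message arrival rate into the network.
    assert all(f < c for f, c in zip(link_flows, link_capacities)), \
        "every link must carry strictly less flow than its capacity"
    return sum(f / (c - f)
               for f, c in zip(link_flows, link_capacities)) / total_msg_rate
```

A routing chromosome changes which links each node pair's traffic traverses, hence the per-link flows fᵢ, so minimizing this T over routings is exactly the objective the abstract describes.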
Abstract: Many business applications rely on their historical data to predict their business future. The product marketing process is one of the core processes of a business, and customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered products, and the development of education and health services depends on historical data. Furthermore, reducing online social media network problems and crimes needs a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses; however, dealing with a huge quantity of data requires a great deal of processing time. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a careful reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities. The Naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 decision tree algorithm is the most accurate for classifying students' records to predict degree completion time.
Fund: The project was supported by the Mega-Science Engineering Project of the Chinese Academy of Sciences.
Abstract: The HT-7 superconducting tokamak at the Institute of Plasma Physics of the Chinese Academy of Sciences is an experimental device for fusion research in China. The main task of the HT-7 data acquisition system is to acquire, store, analyze and index data whose volume reaches hundreds of megabytes. Beyond hardware and software support, providing a large capacity for data storage, processing and transfer is an even more important problem, and the key technology for addressing it is the data compression algorithm. In this paper, the data format in HT-7 is introduced first, and then the data compression algorithm LZO, a portable lossless data compression algorithm written in ANSI C, is analyzed. This compression algorithm, which fits well with data acquisition and distribution in nuclear fusion experiments, offers fairly fast compression and extremely fast decompression. Finally, a performance evaluation of the LZO application in HT-7 is given.
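The essential contract of any lossless codec in such a pipeline is the compress/decompress round trip. Since LZO bindings are platform-specific, this sketch uses the standard library's zlib purely as a stand-in for the same pattern; the shot data here is mocked, not HT-7 diagnostic data.

```python
import zlib

def pack_shot(raw: bytes) -> bytes:
    # losslessly compress one acquisition record (level 6 = default tradeoff)
    return zlib.compress(raw, 6)

def unpack_shot(packed: bytes) -> bytes:
    return zlib.decompress(packed)

mock_shot = bytes(range(256)) * 64          # 16 KiB of mock diagnostic data
packed = pack_shot(mock_shot)
assert unpack_shot(packed) == mock_shot     # lossless round trip
```

LZO's selling point, as the abstract notes, is decompression speed rather than ratio, which suits replaying stored shots quickly.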
Fund: The National Natural Science Foundation of China under contract Nos 41330960, 41276193 and 41206184.
Abstract: In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for multiyear (MY) ice, has had a significant effect on climate change. Accurate retrieval of MY ice concentration is very important and challenging for understanding the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. A similar total ice concentration was yielded by these algorithms, while the retrieved MY sea ice concentrations differ from each other. The MY SIA derived from the NASA TEAM algorithm is relatively stable, whereas the other two algorithms produced seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we propose an ice concentration retrieval algorithm that develops the NASA TEAM algorithm by additionally using AMSR-E 6.9 GHz brightness temperature data and the sea ice concentration derived from 89.0 GHz data. Comparison with the reference MY SIA from reference MY ice indicates that the mean difference and root mean square (rms) difference of the MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 during January to March, and -0.06×10^6 km^2 and 0.14×10^6 km^2 during September to December, respectively. Comparison with the MY SIE obtained from weekly ice age data provided by the University of Colorado shows that the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm proposed in this study shows smaller differences from the reference MY ice and from the MY SIE based on ice age data than Wang's, Lomax's and the NASA TEAM algorithms.
Abstract: With the increasing variety of application software in meteorological satellite ground systems, how to provision reasonable hardware resources and improve software efficiency has received more and more attention. In this paper, a software classification method based on software operating characteristics is proposed. The method uses run-time resource consumption to describe the running characteristics of the software. First, principal component analysis (PCA) is used to reduce the dimension of the software running feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the results of the principal component analysis are used to explain the significance of the operating characteristics of each software class, which serves as the basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
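The K-means step can be sketched in a few lines. This toy version clusters a single run-time feature (say, mean CPU seconds per run, a hypothetical feature) and omits both the PCA stage and the paper's unspecified modifications.

```python
import random

def kmeans_1d(xs, k, iters=20, seed=0):
    # Lloyd's algorithm on scalar features: assign each value to the
    # nearest centroid, then move each centroid to the mean of its group
    rng = random.Random(seed)
    cents = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - cents[c]))].append(x)
        cents = [sum(g) / len(g) if g else cents[j]
                 for j, g in enumerate(groups)]
    return sorted(cents)
```

In the paper's setting each software package would be a vector of PCA-reduced resource features rather than a scalar, but the assign-then-update loop is identical.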
Fund: Sponsored by the National High Technology Research and Development Program of China (2006AA701306) and the National Innovation Foundation of Enterprises (05C26212200378)
Abstract: Improving on traditional ant colony algorithms, a data routing model for remote data exchange on WANs is presented. In the model, random heuristic factors are introduced to realize multi-path search. The pheromone updating model can dynamically adjust the pheromone concentration on the optimal path according to the path load, so that the system keeps its load balanced. The simulation results show that the improved model achieves higher performance in convergence and load balance.
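A load-aware pheromone update of the kind described above might look like the following sketch. The abstract does not give the exact rule, so the `1 - load` deposit scaling below is an assumption for illustration only.

```python
def update_pheromone(tau, used_path, rho=0.1, q=1.0, load=0.0):
    # evaporate on every edge, then deposit on the edges of the chosen
    # path, scaled down as that path's load rises (load in [0, 1]);
    # heavily loaded paths thus attract fewer ants in later iterations
    for edge in tau:
        tau[edge] *= (1.0 - rho)
    for edge in used_path:
        tau[edge] += q * (1.0 - load)
    return tau
```

Evaporation (`rho`) prevents early paths from locking in, while the load term steers traffic away from congested routes, which is the load-balancing behavior the model targets.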
Abstract: The genetic algorithm is useful for solving inversions of complex nonlinear geophysical equations. The multi-point search of the genetic algorithm makes it easier to find a globally optimal solution and avoid falling into a local extremum. The search efficiency of the genetic algorithm is key to producing successful solutions in a huge multi-parameter model space. The encoding mechanism of the genetic algorithm affects the search processes in the evolution. Not all genetic operations perform perfectly under either a binary or a decimal encoding system, so a standard genetic algorithm (SGA) is sometimes unable to resolve an optimization problem such as a simple geophysical inversion. With the binary encoding system, the crossover operation may produce more new individuals; the decimal encoding system, on the other hand, makes the mutation generate more new genes. This paper discusses approaches for exploiting the search potential of genetic operations under different encoding systems and presents a hybrid-encoding mechanism for the genetic algorithm, referred to as the hybrid-encoding genetic algorithm (HEGA). The method is based on a routine in which the mutation operation is executed in decimal code and all other operations in binary code. HEGA guarantees the birth of better genes by mutation processing with high probability, which is beneficial for resolving inversions of complicated problems. Synthetic and real-world examples demonstrate the advantages of using HEGA in the inversion of potential-field data.
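The core of such a hybrid encoding is keeping two views of each gene: a fixed-point binary string for crossover and a real (decimal) value for mutation. A minimal sketch under assumed parameter ranges; the operator details are illustrative, not HEGA's actual routine:

```python
import random

def encode(x, lo, hi, bits=16):
    # real parameter -> fixed-point binary string (the crossover view)
    return format(round((x - lo) / (hi - lo) * (2**bits - 1)), f"0{bits}b")

def decode(s, lo, hi):
    # binary string -> real parameter (the mutation view)
    return lo + int(s, 2) / (2**len(s) - 1) * (hi - lo)

def binary_crossover(a, b, rng):
    # single-point crossover performed in the binary view
    p = rng.randrange(1, len(a))
    return a[:p] + b[p:], b[:p] + a[p:]

def decimal_mutate(x, sigma, lo, hi, rng):
    # Gaussian mutation performed in the decimal view, clipped to bounds
    return min(hi, max(lo, x + rng.gauss(0.0, sigma)))
```

Crossover in binary shuffles many bits at once (more new individuals), while Gaussian mutation in decimal makes small, meaningful moves in parameter space (more useful new genes), which is exactly the division of labor the abstract motivates.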
Fund: Supported by NSF B55101680, NTIF B2090571, B2110140, SCUT x2rjD2116860, Y1080170, Y1090160, Y1100030, Y1100050, Y1110020, S1010561121 and G101056137
Abstract: "Data Structure and Algorithm", an important major subject in computer science, has many problems in its teaching activity. This paper introduces and analyzes the situation and problems in the study of this course. A "programming factory" method is then brought forward, which is a practice-oriented platform for the teaching-study process. Good results have been obtained with this creative method.
Abstract: 3D image reconstruction of weather radar data can not only help forecasters improve forecast efficiency and accuracy, but also help people understand weather conditions easily and quickly. The Marching Cubes (MC) algorithm in surface rendering has excellent applicability to 3D reconstruction from slice images; it can shorten the time to find and calculate the isosurface from raw volume data and reflect the shape structure more accurately. In this paper, we discuss a method to reconstruct 3D weather cloud images using the proposed Cube Weighting Interpolation (CWI) and the MC algorithm. First, we detail the steps of CWI and apply it to project the raw radar data into cubes and obtain equally spaced cloud slice images; then we employ the MC algorithm to draw the isosurface. Experiments show that our method is effective and simple to operate, which may provide an intuitive and effective reference for realizing 3D surface reconstruction and stereo visualization of meteorological images.
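One core step of MC, independent of its 256-case lookup table, is linearly interpolating where the isosurface crosses a cube edge; a minimal sketch of that vertex-placement step (CWI itself is paper-specific and not reproduced here):

```python
def edge_interp(iso, p1, p2, v1, v2):
    # position along the cube edge p1 -> p2 where the scalar field
    # crosses the isovalue, assuming v1 and v2 bracket `iso`
    t = (iso - v1) / (v2 - v1)
    return tuple(a + t * (b - a) for a, b in zip(p1, p2))
```

MC places one such vertex on every edge whose endpoint values straddle the isovalue, then connects them into triangles according to the cube's case index.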
Abstract: Vector quantization (VQ) is an important data compression method. The key to VQ encoding is finding the closest vector among N vectors for a given feature vector. Many classical linear search algorithms take O(N) steps of distance computation between two vectors. The quantum VQ iteration and the corresponding quantum VQ encoding algorithm, which takes O(√N) steps, are presented in this paper. The unitary operation of distance computation can be performed on a number of vectors simultaneously because a quantum state exists in a superposition of states. The quantum VQ iteration comprises three oracles; by contrast, many quantum algorithms, such as Shor's factorization algorithm and Grover's algorithm, have only one oracle. An entangled state is generated and used; by contrast, the state in Grover's algorithm is not entangled. The quantum VQ iteration is a rotation over a subspace; by contrast, the Grover iteration is a rotation over the global space. The quantum VQ iteration thus extends the Grover iteration to more complex searches that require more oracles. The method of the quantum VQ iteration is universal.
Abstract: The design and operation features of the automatic data acquisition system for low-latitude ionospheric tomography along the 120°E meridian are presented. The system automatically collects the differential Doppler phase data, and the GPS satellite beacon signal is simultaneously collected to achieve time synchronization of all receivers in the whole station chain. An improved reconstruction algorithm for computerized ionospheric tomography is also proposed, in which the calculation of the integral phase constant and the choice of the initial guess are integrated into the reconstruction procedure and evaluated via the reconstructed image. Both numerical simulation examples and results reconstructed from observed data show that the new algorithm works reasonably and effectively on ionospheric CT problems.
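Ionospheric CT is commonly solved with row-action methods such as ART. One relaxed Kaczmarz sweep over the ray equations Ax = b can be sketched as follows; this is a generic reconstruction kernel, not the paper's specific improved algorithm.

```python
def art_sweep(A, b, x, relax=0.5):
    # one ART/Kaczmarz sweep: project the current image x onto each
    # ray equation <a_i, x> = b_i in turn, under-relaxed by `relax`
    for a_i, b_i in zip(A, b):
        norm = sum(v * v for v in a_i)
        if norm == 0.0:
            continue                      # ray misses every pixel
        resid = (b_i - sum(v * xi for v, xi in zip(a_i, x))) / norm
        x = [xi + relax * resid * v for v, xi in zip(a_i, x)]
    return x
```

In the tomography setting, each row a_i holds the path lengths of one satellite-to-receiver ray through the pixel grid and b_i the corresponding TEC measurement; sweeps are repeated until the image stabilizes.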
Abstract: With the development of computerized business applications, the amount of data is increasing exponentially. Cloud computing provides high-performance computing resources and mass storage resources for massive data processing. In distributed cloud computing systems, data-intensive computing can lead to data scheduling between data centers. Reasonable data placement can effectively reduce data scheduling between data centers and improve the data acquisition efficiency of users. In this paper, a mathematical model of data scheduling between data centers is built. By means of the global optimization ability of the genetic algorithm, generational evolution produces better approximate solutions and finally obtains the best approximation of the data placement. The experimental results show that the genetic algorithm can effectively work out an approximately optimal data placement and minimize data scheduling between data centers.
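A GA for this kind of placement can be sketched compactly. The fitness below counts how many dependent dataset pairs land in different centers; that cost model and the operator choices are assumptions for illustration, not the paper's model.

```python
import random

def ga_placement(n_items, pairs, n_centers, pop=40, gens=80, seed=1):
    # chromosome: one center index per dataset; fitness: number of
    # dependent pairs split across two data centers (to be minimized)
    rng = random.Random(seed)
    def cost(ch):
        return sum(1 for a, b in pairs if ch[a] != ch[b])
    P = [[rng.randrange(n_centers) for _ in range(n_items)]
         for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=cost)
        elite = P[: pop // 2]                 # truncation selection
        kids = []
        while len(elite) + len(kids) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_items)   # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:            # point mutation
                child[rng.randrange(n_items)] = rng.randrange(n_centers)
            kids.append(child)
        P = elite + kids
    best = min(P, key=cost)
    return best, cost(best)
```

Every split pair stands in for one cross-center transfer, so driving the cost toward zero directly reduces inter-center scheduling.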
Fund: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004016).
Abstract: A robust phase-only Direct Data Domain Least Squares (D3LS) algorithm based on generalized Rayleigh quotient optimization using a hybrid Genetic Algorithm (GA) is presented in this letter. The optimization efficiency and computational speed are improved via the hybrid GA, composed of the standard GA and the Nelder-Mead simplex algorithm. First, the objective function, in the form of a generalized Rayleigh quotient, is derived via the standard D3LS algorithm. It is then taken as a fitness function, and the unknown phases of all adaptive weights are taken as decision variables. The nonlinear optimization is then performed via the hybrid GA to obtain the optimized solution of the phase-only adaptive weights. As a phase-only adaptive algorithm, the proposed algorithm is simpler than conventional algorithms when it comes to hardware implementation. Moreover, it processes only a single snapshot of data, as opposed to forming a sample covariance matrix and performing matrix inversion. Simulation results show that the proposed algorithm has good signal recovery and interference nulling performance, superior to that of the phase-only D3LS algorithm based on the standard GA.
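Phase-only weighting constrains every adaptive weight to unit modulus, w_n = e^{jφ_n}, so only the phases are decision variables. As a simple stand-in for the hybrid GA (not the letter's algorithm), this sketch maximizes array gain toward an assumed steering vector by cyclic coordinate search over the phases:

```python
import cmath
import math

def gain(phis, steering):
    # |w^H s| with unit-modulus (phase-only) weights w_n = exp(j*phi_n)
    return abs(sum(cmath.exp(1j * p) * s for p, s in zip(phis, steering)))

def phase_only_search(steering, sweeps=8, grid=64):
    # cyclic coordinate ascent: sweep the elements, trying `grid`
    # candidate phases for each while holding the others fixed
    phis = [0.0] * len(steering)
    for _ in range(sweeps):
        for n in range(len(phis)):
            cands = [2 * math.pi * g / grid for g in range(grid)]
            phis[n] = max(cands, key=lambda c: gain(
                phis[:n] + [c] + phis[n + 1:], steering))
    return phis
```

The real D3LS fitness is a generalized Rayleigh quotient that also nulls interference, but the unit-modulus constraint handled here is what makes the problem nonconvex and motivates global optimizers like the hybrid GA.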
Fund: Supported by the National Natural Science Foundation of China (61304079, 61125306, 61034002), the Open Research Project from SKLMCCS (20120106), the Fundamental Research Funds for the Central Universities (FRF-TP-13-018A), and the China Postdoctoral Science Foundation (2013M530527)
Abstract: A novel optimal tracking control method is proposed in this paper for a class of discrete-time systems with actuator saturation and unknown dynamics. The scheme is based on the iterative adaptive dynamic programming (ADP) algorithm. To implement the control scheme, a data-based identifier is first constructed for the unknown system dynamics. By introducing the M-network, an explicit formula for the steady-state control is obtained. To eliminate the effect of actuator saturation, a nonquadratic performance functional is introduced, and an iterative ADP algorithm is then established, with a convergence analysis, to obtain the optimal tracking control solution. To realize the optimal control method, neural networks are used to build the data-based identifier, compute the performance index function, approximate the optimal control policy and solve the steady-state control, respectively. Simulation examples are provided to verify the effectiveness of the presented optimal tracking control scheme.
Abstract: This paper addresses the problem of selecting a route for every pair of communicating nodes in a virtual circuit data network in order to minimize the average delay encountered by messages. The problem was previously modeled as a network of M/M/1 queues. A genetic algorithm to solve this problem is presented. Extensive computational results across a variety of networks are reported. These results indicate that the presented solution procedure outperforms other methods in the literature and is effective for a wide range of traffic loads.
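Under the M/M/1 model, the average delay the routing tries to minimize has the standard closed form T = (1/γ) Σ_l λ_l/(C_l − λ_l), where γ is the total message arrival rate, λ_l the flow routed onto link l and C_l its service capacity. A small helper (link rates here are illustrative, not from the paper):

```python
def avg_network_delay(links, total_rate):
    # links: (lambda_l, capacity_l) pairs in messages/s; valid only while
    # every link is a stable M/M/1 queue, i.e. lambda_l < capacity_l
    return sum(lam / (cap - lam) for lam, cap in links) / total_rate
```

A candidate routing fixes the per-link flows λ_l, so this expression serves directly as the GA's fitness function.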