The development of technologies such as big data and blockchain has brought convenience to life, but at the same time, privacy and security issues are becoming more and more prominent. The K-anonymity algorithm is an effective privacy-preserving algorithm with low computational complexity that can safeguard users' privacy by anonymizing big data. However, the algorithm currently focuses only on improving user privacy while ignoring data availability. In addition, ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. Based on this, we propose a new K-anonymity algorithm that solves the privacy-security problem in the context of big data while guaranteeing improved data usability. Specifically, we construct a new information loss function based on information quantity theory. Considering that different quasi-identifier attributes have different impacts on sensitive attributes, we set a weight for each quasi-identifier attribute when designing the information loss function. In addition, to reduce information loss, we improve K-anonymity in two ways. First, we make the information loss smaller than in the original table while guaranteeing privacy, based on common artificial intelligence algorithms, i.e., the greedy algorithm and the 2-means clustering algorithm. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centroids. We then design the K-anonymity algorithm of this scheme based on the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which reduces the information loss. Finally, we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
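A minimal sketch of two ingredients described above, the weighted information loss and the mean-center initialization for 2-means, might look as follows. The normalized-range form of the loss and the farthest-from-mean choice of the second centroid are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def weighted_information_loss(group, weights, ranges):
    """Information loss of one anonymized group: per quasi-identifier, the
    spread inside the group normalized by the attribute's full range, scaled
    by that attribute's weight (an assumed form of the loss function)."""
    spread = (group.max(axis=0) - group.min(axis=0)) / ranges
    return len(group) * float(np.dot(weights, spread))

def mean_center_init(X):
    """Mean-center initialization for 2-means: seed one centroid at the
    global mean and the other at the point farthest from it (an assumed
    reading of the paper's method)."""
    c1 = X.mean(axis=0)
    c2 = X[np.argmax(np.linalg.norm(X - c1, axis=1))]
    return np.vstack([c1, c2])

def two_means(X, n_iter=20):
    """Plain 2-means clustering started from the mean-center centroids."""
    centers = mean_center_init(X)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```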
Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data. Consequently, a method for cleaning wind power anomaly data by combining image processing with community detection algorithms (CWPAD-IPCDA) is proposed. To precisely identify and initially clean anomalous data, wind power curve (WPC) images are converted into graph structures, to which the Louvain community recognition algorithm and graph-theoretic methods are applied for community detection and segmentation. Furthermore, a mathematical morphology operation (MMO) determines the main part of the initially cleaned wind power curve images and maps it back to the normal wind power points to complete the final cleaning. The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines (WTs) in two wind farms in northwest China to validate its feasibility. A comparison was conducted with the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved isolation forest algorithm, and an image-based (IB) algorithm. The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms, achieving an approximately 7.23% higher average data cleaning rate. The mean value of the sum of squared errors (SSE) of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms. Moreover, the mean overall accuracy, as measured by the F1-score, exceeds that of the other methods by approximately 10.49%; this indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting.
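To make the image-to-graph pipeline concrete, here is a hedged sketch: the wind power curve is rasterized, occupied cells are linked by 8-neighbour adjacency, the dominant Louvain community is kept, and a morphological opening tidies it before mapping back to points. Grid size, connectivity, and thresholds are illustrative choices, not the paper's parameters.

```python
import numpy as np
import networkx as nx
from scipy.ndimage import binary_opening

def clean_wpc(speed, power, bins=64):
    """Rasterize the wind power curve, link occupied neighbouring cells,
    keep the dominant Louvain community, apply a morphological opening,
    and map surviving cells back to a keep-mask over the points."""
    H, xe, ye = np.histogram2d(speed, power, bins=bins)
    occupied = H > 0
    G = nx.Graph()
    cells = list(zip(*np.nonzero(occupied)))
    G.add_nodes_from(cells)
    for i, j in cells:  # 8-neighbour adjacency between occupied cells
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni < bins and 0 <= nj < bins and occupied[ni, nj]:
                    G.add_edge((i, j), (ni, nj))
    main = max(nx.community.louvain_communities(G, seed=0), key=len)
    mask = np.zeros_like(occupied)
    for i, j in main:
        mask[i, j] = True
    mask = binary_opening(mask)  # MMO: keep the main body, drop thin spurs
    xi = np.clip(np.digitize(speed, xe) - 1, 0, bins - 1)
    yi = np.clip(np.digitize(power, ye) - 1, 0, bins - 1)
    return mask[xi, yi]          # True = point retained as normal
```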
The Chang'e-3 (CE-3) mission is China's first exploration mission on the surface of the Moon that uses a lander and a rover. Eight instruments form the scientific payloads, with the following objectives: (1) investigate the morphological features and geological structures at the landing site; (2) perform integrated in-situ analysis of minerals and chemical compositions; (3) explore the structure of the lunar interior; (4) explore the lunar-terrestrial space environment and lunar surface environment, and acquire Moon-based ultraviolet astronomical observations. The Ground Research and Application System (GRAS) is in charge of data acquisition and pre-processing, management of the payload in orbit, and managing the data products and their applications. The Data Pre-processing Subsystem (DPS) is a part of GRAS. The task of the DPS is the pre-processing of raw data from the eight instruments on CE-3, including channel processing, unpacking, package sorting, calibration and correction, identification of geographical location, calculation of the probe azimuth angle, probe zenith angle, solar azimuth angle, and solar zenith angle, and quality checks. These processes produce Level 0, Level 1, and Level 2 data. The computing platform of this subsystem comprises a high-performance computing cluster, including a real-time subsystem used for processing Level 0 data and a post-time subsystem for generating Level 1 and Level 2 data. This paper describes the CE-3 data pre-processing method, the data pre-processing subsystem, data classification, data validity, and the data products that are used for scientific studies.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data that sum to a constant, like 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, under maximum likelihood or maximum a posteriori (MAP) estimation. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; finding the parameters that maximize this expected log-likelihood, as determined in the E step, is the job of the maximization (M) step. This study examined how well the EM algorithm works on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
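A compact sketch of the EM idea for missing responses in a linear model: the E-step fills the missing observations with the current model's predictions and the M-step refits by least squares. This is a generic illustration; the paper additionally uses a robust least squares variant and works with compositional data.

```python
import numpy as np

def em_regression(X, y, miss, n_iter=50):
    """EM-style fit of y = X @ b with missing responses flagged by `miss`:
    the E-step replaces missing y with current predictions, the M-step
    refits by ordinary least squares (the MLE under Gaussian noise)."""
    y = y.astype(float).copy()
    y[miss] = y[~miss].mean()                      # crude initial fill
    for _ in range(n_iter):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)  # M-step
        y[miss] = X[miss] @ b                      # E-step
    return b
```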
There are a number of dirty data in observation datasets derived from an integrated ocean observing network system. Thus, the data must be carefully and reasonably processed before they are used for forecasting or analysis. This paper proposes a data pre-processing model based on intelligent algorithms. First, we introduce the integrated network platform of ocean observation. Next, the pre-processing model of the data is presented, and an intelligent data cleaning model is proposed. Based on fuzzy clustering, the Kohonen clustering network is improved to fulfill the parallel calculation of fuzzy c-means clustering. The proposed dynamic algorithm can automatically find the new clustering center with the updated sample data. The rapid and dynamic performance of the model makes it suitable for real-time calculation, and the efficiency and accuracy of the model are proved by test results from observation data analysis.
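For reference, the batch fuzzy c-means updates that the improved Kohonen network is said to parallelize can be sketched generically as follows (this is the textbook algorithm, not the paper's network implementation):

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, eps=1e-6):
    """Standard fuzzy c-means: alternate weighted centroid updates and
    membership updates until the loop budget is spent."""
    rng = np.random.default_rng(0)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + eps
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```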
The solution of systems of linear equations can be applied to oil exploration, structural vibration analysis, computational fluid dynamics, and other fields. When we make an in-depth analysis of some large or very large complicated structures, we must use parallel algorithms with the aid of high-performance computers to solve complex problems. This paper introduces the implementation process of a parallel solver for sparse systems of linear equations.
The satellite laser ranging (SLR) data quality from COMPASS was analyzed, the difference between curve recognition in computer vision and the pre-processing of SLR data was discussed, and a new algorithm for SLR data based on curve recognition from point clouds is proposed. The results obtained by the new algorithm are 85% (or even more) consistent with those of the screen-displaying method; furthermore, the new method can process SLR data automatically, which makes it possible to use it in the development of the COMPASS navigation system.
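A generic stand-in for the curve-recognition screening step: fit a smooth trend through the residual point cloud and iteratively keep returns near it. The polynomial model and the 2.5-sigma gate are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def screen_slr(t, resid, degree=3, k=2.5, n_iter=5):
    """Iterative polynomial screening of SLR range residuals: fit a trend
    curve through the currently kept points, then keep only returns within
    k standard deviations of it."""
    keep = np.ones(len(t), dtype=bool)
    for _ in range(n_iter):
        p = np.polynomial.Polynomial.fit(t[keep], resid[keep], degree)
        r = resid - p(t)
        sigma = r[keep].std()
        keep = np.abs(r) < k * sigma
    return keep
```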
To improve the performance of traditional map matching algorithms in freeway traffic state monitoring systems using low-logging-frequency GPS (global positioning system) probe data, a map matching algorithm based on the Oracle spatial data model is proposed. The algorithm uses the Oracle road network data model to analyze the spatial relationships between massive GPS positioning points and freeway networks, builds an N-shortest-path algorithm to efficiently find reasonable candidate routes between GPS positioning points, and uses a fuzzy logic inference system to determine the final matched traveling route. In an implementation with field data from Los Angeles, the computation speed of the algorithm is about 135 GPS positioning points per second and the accuracy is 98.9%. The results demonstrate the effectiveness and accuracy of the proposed algorithm for mapping massive GPS positioning data onto freeway networks with complex geometric characteristics.
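The N-shortest-path search between consecutive GPS fixes can be sketched with networkx's shortest_simple_paths (Yen's algorithm); the Oracle network data model and the fuzzy inference stage are not reproduced here, and `length` is an assumed edge attribute name.

```python
import networkx as nx
from itertools import islice

def candidate_routes(G, u, v, n=3, weight="length"):
    """Return the n shortest candidate routes between two road-network
    nodes, the role played by the paper's N-shortest-path search."""
    return list(islice(nx.shortest_simple_paths(G, u, v, weight=weight), n))
```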
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
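A sketch of the "sampling outside DBSCAN" variant: cluster a random sample with scikit-learn's DBSCAN, then hand each remaining point the label of its nearest sampled neighbour. The sampling fraction and the nearest-neighbour hand-off are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def sampled_dbscan(X, frac=0.1, eps=0.5, min_samples=5, seed=0):
    """Cluster a random sample of X, then label every point with its
    nearest sampled neighbour's cluster (-1 propagates as noise)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    labels_s = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
    nn = NearestNeighbors(n_neighbors=1).fit(X[idx])
    _, j = nn.kneighbors(X)
    return labels_s[j.ravel()]
```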
A specialized Hungarian algorithm was developed here for the maximum likelihood data association problem, with two implementation versions owing to the presence of false alarms and missed detections. The maximum likelihood data association problem is formulated as a bipartite weighted matching problem. Its duality and the optimality conditions are given. The Hungarian algorithm, with its computational steps, data structure, and computational complexity, is presented. The two implementation versions, the Hungarian forest (HF) algorithm and the Hungarian tree (HT) algorithm, and their combination with the naïve auction initialization are discussed. The computational results show that the HT algorithm is slightly faster than the HF algorithm and that both are superior to the classic Munkres algorithm.
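In practice, the bipartite matching at the core of this formulation can be prototyped with scipy's linear_sum_assignment, which plays the role of the Hungarian algorithm; the paper's specialized handling of false alarms and missed detections (e.g., dummy rows and columns) is omitted from this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(nll):
    """Maximum likelihood association as bipartite matching: nll[i, j] is
    the negative log-likelihood of pairing track i with measurement j;
    minimizing the total gives the ML assignment."""
    rows, cols = linear_sum_assignment(np.asarray(nll))
    return list(zip(rows.tolist(), cols.tolist()))
```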
In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for multiyear (MY) ice, has had a significant effect on climate change. The accurate retrieval of MY ice concentration is very important and challenging for understanding the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. A similar total ice concentration was yielded by these algorithms, while the retrieved MY sea ice concentrations differ from each other. The MY SIA derived from the NASA TEAM algorithm is relatively stable. The other two algorithms produce seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we propose an ice concentration retrieval algorithm that extends the NASA TEAM algorithm by additionally using AMSR-E 6.9 GHz brightness temperature data and the sea ice concentration derived from 89.0 GHz data. Comparison with the reference MY SIA from reference MY ice indicates that the mean difference and root mean square (rms) difference of the MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 during January to March, and -0.06×10^6 km^2 and 0.14×10^6 km^2 during September to December, respectively. Comparison with the MY SIE obtained from weekly ice age data provided by the University of Colorado shows that the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm has smaller differences from the reference MY ice and the MY SIE from ice age data than Wang's, Lomax's, and the NASA TEAM algorithms.
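For orientation, the NASA TEAM family of algorithms is built on two standard brightness temperature ratios, sketched below; the tie-point coefficients that convert them into first-year and multiyear concentrations, and this paper's 6.9 GHz extension, are omitted.

```python
import numpy as np

def team_ratios(tb19v, tb19h, tb37v):
    """Polarization ratio (PR) and spectral gradient ratio (GR) used by
    NASA TEAM-type sea ice concentration retrievals."""
    pr = (tb19v - tb19h) / (tb19v + tb19h)
    gr = (tb37v - tb19v) / (tb37v + tb19v)
    return pr, gr
```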
Aiming at a three-passive-sensor location system, a generalized 3-dimensional (3-D) assignment model is constructed based on property information, and a multi-target programming model is proposed based on direction-finding and property fusion information. The multi-target programming model is transformed into a single-target programming problem to solve, and its data association result is compared with the results obtained by using one kind of information only. Simulation experiments show the effectiveness of the multi-target programming algorithm, with higher data association accuracy and less calculation.
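The reduction from a multi-target to a single-target programming problem is commonly done by weighted-sum scalarization; here is a sketch under that assumption (the paper does not state its exact transform here):

```python
import numpy as np

def scalarized_cost(costs, weights):
    """Weighted-sum scalarization: collapse a stack of objective tensors
    (e.g., direction-finding residuals and property-fusion scores per
    candidate measurement triple) into a single 3-D assignment cost.
    The weights are a modeling choice."""
    return np.tensordot(np.asarray(weights), np.asarray(costs), axes=1)

# Example: two objectives over a 4x4x4 grid of candidate triples.
rng = np.random.default_rng(0)
costs = rng.random((2, 4, 4, 4))
single = scalarized_cost(costs, [0.7, 0.3])   # shape (4, 4, 4)
```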
With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. In the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit (GPU) and preconditioning methods is very important in the data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Cholesky decomposition conjugate gradient (ICCG) algorithm. Since preparing the preconditioner requires extra time, a parallel implementation based on the GPU is proposed. The improved method is then applied in the inversion of noise-contaminated synthetic data to prove its adaptability in the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on an NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times that of a serial program using a 2.0 GHz Central Processing Unit (CPU). Real airborne gravity-gradiometry data from the Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method in fast inversion of 3D FTG data.
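A serial sketch of SSOR-preconditioned conjugate gradients conveys the core idea; the paper's method further combines it with incomplete Cholesky factorization and runs on a GPU, neither of which is reproduced here, and the relaxation parameter below is illustrative.

```python
import numpy as np
from scipy.sparse import diags, tril
from scipy.sparse.linalg import LinearOperator, cg, spsolve_triangular

def ssor_pcg(A, b, omega=1.2):
    """CG on a symmetric positive definite sparse matrix A, preconditioned
    with SSOR: applying M^-1 is a forward triangular solve, a diagonal
    scaling, and a backward triangular solve."""
    D = diags(A.diagonal())
    L = tril(A, k=-1, format="csr")
    lower = (D + omega * L).tocsr()          # D + omega*L
    upper = lower.T.tocsr()                  # D + omega*L^T

    def apply_minv(r):
        y = spsolve_triangular(lower, r, lower=True)
        return omega * (2 - omega) * spsolve_triangular(upper, D @ y, lower=False)

    M = LinearOperator(A.shape, matvec=apply_minv)
    x, info = cg(A, b, M=M)
    return x, info                           # info == 0 means converged
```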
Vector quantization (VQ) is an important data compression method. The key to VQ encoding is to find the closest vector among N vectors for a feature vector. Many classical linear search algorithms take O(N) steps of distance computation between two vectors. The quantum VQ iteration and the corresponding quantum VQ encoding algorithm, which takes O(√N) steps, are presented in this paper. The unitary operation of distance computation can be performed on a number of vectors simultaneously because a quantum state can exist in a superposition of states. The quantum VQ iteration comprises three oracles; by contrast, many quantum algorithms, such as Shor's factorization algorithm and Grover's algorithm, have only one oracle. An entangled state is generated and used; by contrast, the state in Grover's algorithm is not entangled. The quantum VQ iteration is a rotation over a subspace; by contrast, the Grover iteration is a rotation over the global space. The quantum VQ iteration thus extends the Grover iteration to more complex searches that require more oracles. The method of the quantum VQ iteration is universal.
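For reference, the classical encoding step that the quantum iteration accelerates is a plain linear scan, O(N) distance computations per feature vector:

```python
import numpy as np

def encode(codebook, x):
    """Classical VQ encoding: scan all N codebook vectors and return the
    index of the one closest to the feature vector x."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
```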
Viscoelastic parameters are becoming more important, and their inversion algorithms are studied by many researchers. Genetic algorithms are random, self-adaptive, robust, and heuristic, with global search and convergence abilities. Based on the direct VSP wave equation, a genetic algorithm (GA) is introduced to determine the viscoelastic parameters. First, the direct wave equation in the frequency domain is expressed as a function of complex velocity, and then the complex velocities are estimated by GA inversion. Since the phase velocity and Q-factor are both functions of complex velocity, their values can be computed easily. However, there are so many complex velocities that it is difficult to invert them directly. They can be rewritten as a function of C0 and C∞ to reduce the number of parameters during the inversion process. Finally, a theoretical model experiment proves that our algorithm is exact and effective.
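A bare-bones real-coded GA of the kind used for such inversions is sketched below; the selection, crossover, and mutation operators and their rates are illustrative, and `misfit` is assumed to compare modeled and observed VSP wavefields for a parameter vector of (C0, C∞) values.

```python
import numpy as np

def ga_invert(misfit, bounds, pop=50, gens=100, pm=0.1, seed=0):
    """Real-coded GA: tournament selection, uniform crossover, Gaussian
    mutation, and elitism, minimizing a user-supplied misfit function."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds                                  # arrays of per-parameter bounds
    P = rng.uniform(lo, hi, size=(pop, len(lo)))
    for _ in range(gens):
        f = np.array([misfit(p) for p in P])
        i, j = rng.integers(pop, size=(2, pop))      # tournament selection
        parents = P[np.where(f[i] < f[j], i, j)]
        mask = rng.random(P.shape) < 0.5             # uniform crossover
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        mut = rng.random(P.shape) < pm               # Gaussian mutation
        children = np.clip(children + mut * rng.normal(0, 0.05, P.shape) * (hi - lo), lo, hi)
        children[0] = P[np.argmin(f)]                # elitism: keep the best
        P = children
    f = np.array([misfit(p) for p in P])
    return P[np.argmin(f)]
```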
As cash register systems gradually prevailed in shopping malls, detecting the abnormal status of the cash register system has gradually become a hotspot issue. This paper analyzes the transaction data of a shopping mall. When calculating the degree of data difference, the coefficient of variation is used as the attribute weight; the weighted Euclidean distance is used to calculate the degree of difference; and k-means clustering is used to classify different time periods. The LOF algorithm is applied to detect the outlier degree of transaction data in each time period, an initial threshold is set to detect outliers, the outliers are deleted, and SAX detection is then performed on the data set. If the data do not pass the test, the outlying domain is gradually expanded and the above process repeated to optimize the outlier threshold, improving the sensitivity of the detection algorithm and reducing false positives.
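The first two stages, coefficient-of-variation weighting and weighted-distance outlier scoring, can be sketched with scikit-learn's LOF: scaling each attribute by the square root of its weight makes the plain Euclidean metric a weighted one. Threshold optimization and the SAX re-check are omitted, and attributes are assumed positive so the CV is meaningful.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def weighted_lof(X, n_neighbors=20):
    """CV-weighted LOF: weight each attribute by its coefficient of
    variation, then score outliers with LOF (-1 marks an outlier)."""
    cv = X.std(axis=0) / X.mean(axis=0)   # coefficient of variation per attribute
    w = cv / cv.sum()
    Xw = X * np.sqrt(w)                   # Euclidean on Xw == weighted Euclidean on X
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(Xw)
    return labels, -lof.negative_outlier_factor_
```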
To meet the requirements of multi-sensor data fusion in diagnosis for complex equipment systems, a novel fuzzy similarity-based data fusion algorithm is given. Based on fuzzy set theory, it calculates the fuzzy similarity between a certain sensor's measurement values and the multiple sensors' objective prediction values to determine the importance weight of each sensor, and realizes multi-sensor diagnosis parameter data fusion. According to this principle, its application software is also designed. The applied example proves that the algorithm gives priority to high-stability and high-reliability sensors and that it is concise, feasible, and efficient for real-time measurement and data processing in engine diagnosis.
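A minimal sketch of the fusion rule, assuming an exponential membership function for the fuzzy similarity (the abstract does not specify this form):

```python
import numpy as np

def fuse(measurements, predictions, c=1.0):
    """Fuzzy-similarity fusion: each sensor's weight grows with the
    similarity between its measurement and the prediction, and the fused
    value is the weighted mean of the measurements."""
    m = np.asarray(measurements, dtype=float)
    s = np.exp(-c * (m - np.asarray(predictions, dtype=float)) ** 2)  # fuzzy similarity
    w = s / s.sum()
    return float(w @ m), w
```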
This study concerns a Ka-band solid-state transmitter cloud radar, made in China, which can operate in three different work modes, with different pulse widths and coherent and incoherent integration numbers, to meet the requirements of cloud remote sensing over the Tibetan Plateau. Specifically, the design of the three operational modes of the radar (i.e., boundary mode M1, cirrus mode M2, and precipitation mode M3) is introduced, and a cloud radar data merging algorithm for the three modes is proposed. Using one month of continuous summertime measurements at Naqu on the Tibetan Plateau, we analyzed the consistency between the cloud radar measurements of the three modes. The number of occurrences of radar detections of hydrometeors and the percentage contributions of the different modes' data to the merged data were estimated. The performance of the merging algorithm was evaluated. The results indicated that the minimum detectable reflectivity for each mode was consistent with theoretical results. Merged data provided measurements with a minimum reflectivity of -35 dBZ at a height of 5 km and obtained information above a height of 0.2 km. Measurements of radial velocity by the three operational modes agreed very well, and systematic errors in measurements of reflectivity were less than 2 dB. However, large discrepancies existed in the measurements of the linear depolarization ratio taken from the different operational modes. The percentage of radar detections of hydrometeors in mid- and high-level clouds increased by 60% through application of pulse compression techniques. In conclusion, the merged data are appropriate for cloud and precipitation studies over the Tibetan Plateau.
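One plausible shape for the mode-merging step, heavily hedged: per range gate, take the valid measurement from a preferred mode and fall back to the others. The priority order below is illustrative only; the paper's merging logic is more involved.

```python
import numpy as np

def merge_reflectivity(z_m1, z_m2, z_m3):
    """Illustrative merge of the three modes' reflectivity profiles (NaN
    where a mode has no valid detection): prefer cirrus mode M2, fall back
    to boundary mode M1, then precipitation mode M3."""
    return np.where(np.isfinite(z_m2), z_m2,
                    np.where(np.isfinite(z_m1), z_m1, z_m3))
```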
Workers' exposure to excessive noise is one of the biggest universal work-related challenges. One of the major consequences of exposure to noise is permanent or transient hearing loss. The current study sought to utilize audiometric data to weigh and prioritize the factors affecting workers' hearing loss using the Support Vector Machine (SVM) algorithm. This cross-sectional descriptive study was conducted in 2017 in a mining industry in southeast Iran. The participating workers (n=150) were divided into three groups of 50 based on the sound pressure level to which they were exposed (two experimental groups and one control group). Audiometric tests were carried out for all members of each group. The study generally entailed the following steps: (1) selecting predictor variables to weigh and prioritize factors affecting hearing loss; (2) conducting audiometric tests, assessing permanent hearing loss in each ear, and then evaluating total hearing loss; (3) categorizing different types of hearing loss; (4) weighing and prioritizing factors that affect hearing loss based on the SVM algorithm; and (5) assessing the error rate and accuracy of the models. The collected data were fed into SPSS 18, followed by linear regression and paired-samples t-tests. It was revealed that, in the first model (SPL<70 dBA), the frequency of 8 kHz had the greatest impact (with a weight of 33%), while noise had the smallest influence (with a weight of 5%); the accuracy of this model was 100%. In the second model (70<SPL<80 dBA), the frequency of 4 kHz had the most profound effect (with a weight of 21%), whereas the frequency of 250 Hz had the lowest impact (with a weight of 6%); the accuracy of this model was also 100%. In the third model (SPL>85 dBA), the frequency of 4 kHz had the highest impact (with a weight of 22%), while the frequency of 250 Hz had the smallest influence (with a weight of 3%); the accuracy of this model was also 100%. In the fourth model, the frequency of 4 kHz had the greatest effect (with a weight of 24%), while the frequency of 500 Hz had the smallest effect (with a weight of 4%); the accuracy of this model was 94%. According to the modeling conducted using the SVM algorithm, the frequency of 4 kHz has the most profound effect on predicting changes in hearing loss. Given the high accuracy of the obtained model, this algorithm is an appropriate and powerful tool to predict and model hearing loss.
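One common way to extract percentage feature weights from an SVM is sketched below, using a linear kernel and coefficient magnitudes; whether the authors used exactly this scheme is our assumption.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def svm_feature_weights(X, y):
    """Fit a linear SVM on standardized predictors (e.g., audiometric
    frequencies and noise exposure) and return percentage weights from
    the absolute coefficient magnitudes."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X, y)
    w = np.abs(model.named_steps["svc"].coef_).sum(axis=0)
    return 100 * w / w.sum()
```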
Under the scenario of dense targets in clutter, a multi-layer optimal data correlation algorithm is proposed. This algorithm eliminates a large number of false location points from the assignment process by rough correlation before the correlation cost is calculated, so it avoids the operations for the target state estimate and the calculation of the correlation cost for the false correlation sets. Meanwhile, with the elimination of these points in the rough correlation, the disturbance from false correlations in the assignment process is decreased, so the data correlation accuracy is improved correspondingly. Complexity analyses of the new multi-layer optimal algorithm and the traditional optimal assignment algorithm are given. Simulation results show that the new algorithm is feasible and effective.
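The layered idea, rough gating first and optimal assignment only on surviving pairs, can be sketched as follows; the gate value and the large-cost placeholder are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gated_assignment(cost, gate):
    """Rough correlation then optimal assignment: pairs whose cost exceeds
    the gate are made prohibitively expensive up front, so the assignment
    step never selects them as real correlations."""
    c = cost.copy()
    c[c > gate] = 1e9                       # eliminate false pairings early
    r, k = linear_sum_assignment(c)
    return [(i, j) for i, j in zip(r, k) if cost[i, j] <= gate]
```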