Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of ...Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data.Consequently,a method for cleaning wind power anomaly data by combining image processing with community detection algorithms(CWPAD-IPCDA)is proposed.To precisely identify and initially clean anomalous data,wind power curve(WPC)images are converted into graph structures,which employ the Louvain community recognition algorithm and graph-theoretic methods for community detection and segmentation.Furthermore,the mathematical morphology operation(MMO)determines the main part of the initially cleaned wind power curve images and maps them back to the normal wind power points to complete the final cleaning.The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines(WTs)in two wind farms in northwest China to validate its feasibility.A comparison was conducted using density-based spatial clustering of applications with noise(DBSCAN)algorithm,an improved isolation forest algorithm,and an image-based(IB)algorithm.The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms,achieving an approximately 7.23%higher average data cleaning rate.The mean value of the sum of the squared errors(SSE)of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms.Moreover,the mean of overall accuracy,as measured by the F1-score,exceeds that of the other methods by approximately 10.49%;this indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting.展开更多
The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an eff...The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.展开更多
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear mode...Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.展开更多
Many search-based algorithms have been successfully applied in sev-eral software engineering activities.Genetic algorithms(GAs)are the most used in the scientific domains by scholars to solve software testing problems....Many search-based algorithms have been successfully applied in sev-eral software engineering activities.Genetic algorithms(GAs)are the most used in the scientific domains by scholars to solve software testing problems.They imi-tate the theory of natural selection and evolution.The harmony search algorithm(HSA)is one of the most recent search algorithms in the last years.It imitates the behavior of a musician tofind the best harmony.Scholars have estimated the simi-larities and the differences between genetic algorithms and the harmony search algorithm in diverse research domains.The test data generation process represents a critical task in software validation.Unfortunately,there is no work comparing the performance of genetic algorithms and the harmony search algorithm in the test data generation process.This paper studies the similarities and the differences between genetic algorithms and the harmony search algorithm based on the ability and speed offinding the required test data.The current research performs an empirical comparison of the HSA and the GAs,and then the significance of the results is estimated using the t-Test.The study investigates the efficiency of the harmony search algorithm and the genetic algorithms according to(1)the time performance,(2)the significance of the generated test data,and(3)the adequacy of the generated test data to satisfy a given testing criterion.The results showed that the harmony search algorithm is significantly faster than the genetic algo-rithms because the t-Test showed that the p-value of the time values is 0.026<α(αis the significance level=0.05 at 95%confidence level).In contrast,there is no significant difference between the two algorithms in generating the adequate test data because the t-Test showed that the p-value of thefitness values is 0.25>α.展开更多
With the rapid development of the Internet globally since the 21st century,the amount of data information has increased exponentially.Data helps improve people’s livelihood and working conditions,as well as learning ...With the rapid development of the Internet globally since the 21st century,the amount of data information has increased exponentially.Data helps improve people’s livelihood and working conditions,as well as learning efficiency.Therefore,data extraction,analysis,and processing have become a hot issue for people from all walks of life.Traditional recommendation algorithm still has some problems,such as inaccuracy,less diversity,and low performance.To solve these problems and improve the accuracy and variety of the recommendation algorithms,the research combines the convolutional neural networks(CNN)and the attention model to design a recommendation algorithm based on the neural network framework.Through the text convolutional network,the input layer in CNN has transformed into two channels:static ones and non-static ones.Meanwhile,the self-attention system focuses on the system so that data can be better processed and the accuracy of feature extraction becomes higher.The recommendation algorithm combines CNN and attention system and divides the embedding layer into user information feature embedding and data name feature extraction embedding.It obtains data name features through a convolution kernel.Finally,the top pooling layer obtains the length vector.The attention system layer obtains the characteristics of the data type.Experimental results show that the proposed recommendation algorithm that combines CNN and the attention system can perform better in data extraction than the traditional CNN algorithm and other recommendation algorithms that are popular at the present stage.The proposed algorithm shows excellent accuracy and robustness.展开更多
In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for the multiyear (MY) ice, has led to significant effect on climate change. The accurate retrieval of MY ice ...In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for the multiyear (MY) ice, has led to significant effect on climate change. The accurate retrieval of MY ice concentration retrieval is very important and challenging to understand the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. A similar total ice concentration was yielded by these algorithms, while the retrieved MY sea ice concentrations differs from each other. The MY SIA derived from NASA TEAM algorithm is relatively stable. Other two algorithms created seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we proposed an ice concentration retrieval algorithm, which developed the NASA TEAM algorithm by adding to use AMSR-E 6.9 GHz brightness temperature data and sea ice concentration using 89.0 GHz data. Comparison with the reference MY SIA from reference MY ice, indicates that the mean difference and root mean square (rms) difference of MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 during January to March, -0.06×10^6 km^2 and 0.14×10^6 km^2during September to December respectively. Comparison with MY SIE obtained from weekly ice age data provided by University of Colorado show that, the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm proposed in this study has smaller difference compared with the reference MY ice and MY SIE from ice age data than the Wang's, Lomax' and NASA TEAM algorithms.展开更多
When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to ...When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.展开更多
With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality pred...With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality prediction models have many disadvantages,such as high complexity and low accuracy.To overcome the above problems,we propose an optimized data equalization method to pre-process dataset and design a simple but effective product quality prediction model:radial basis function model optimized by the firefly algorithm with Levy flight mechanism(RBFFALM).First,the new data equalization method is introduced to pre-process the dataset,which reduces the dimension of the data,removes redundant features,and improves the data distribution.Then the RBFFALFM is used to predict product quality.Comprehensive expe riments conducted on real-world product quality datasets validate that the new model RBFFALFM combining with the new data pre-processing method outperforms other previous me thods on predicting product quality.展开更多
The paper addresses the challenge of transmitting a big number offiles stored in a data center(DC),encrypting them by compilers,and sending them through a network at an acceptable time.Face to the big number offiles,o...The paper addresses the challenge of transmitting a big number offiles stored in a data center(DC),encrypting them by compilers,and sending them through a network at an acceptable time.Face to the big number offiles,only one compiler may not be sufficient to encrypt data in an acceptable time.In this paper,we consider the problem of several compilers and the objective is tofind an algorithm that can give an efficient schedule for the givenfiles to be compiled by the compilers.The main objective of the work is to minimize the gap in the total size of assignedfiles between compilers.This minimization ensures the fair distribution offiles to different compilers.This problem is considered to be a very hard problem.This paper presents two research axes.Thefirst axis is related to architecture.We propose a novel pre-compiler architecture in this context.The second axis is algorithmic development.We develop six algorithms to solve the problem,in this context.These algorithms are based on the dispatching rules method,decomposition method,and an iterative approach.These algorithms give approximate solutions for the studied problem.An experimental result is imple-mented to show the performance of algorithms.Several indicators are used to measure the performance of the proposed algorithms.In addition,five classes are proposed to test the algorithms with a total of 2350 instances.A comparison between the proposed algorithms is presented in different tables discussed to show the performance of each algorithm.The result showed that the best algorithm is the Iterative-mixed Smallest-Longest-Heuristic(ISL)with a percentage equal to 97.7%and an average running time equal to 0.148 s.All other algorithms did not exceed 22%as a percentage.The best algorithm excluding ISL is Iterative-mixed Longest-Smallest Heuristic(ILS)with a percentage equal to 21,4%and an average running time equal to 0.150 s.展开更多
Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discoveri...Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.展开更多
Improved traditional ant colony algorithms,a data routing model used to the data remote exchange on WAN was presented.In the model,random heuristic factors were introduced to realize multi-path search.The updating mod...Improved traditional ant colony algorithms,a data routing model used to the data remote exchange on WAN was presented.In the model,random heuristic factors were introduced to realize multi-path search.The updating model of pheromone could adjust the pheromone concentration on the optimal path according to path load dynamically to make the system keep load balance.The simulation results show that the improved model has a higher performance on convergence and load balance.展开更多
Although the Internet of Things has been widely applied,the problems of cloud computing in the application of digital smart medical Big Data collection,processing,analysis,and storage remain,especially the low efficie...Although the Internet of Things has been widely applied,the problems of cloud computing in the application of digital smart medical Big Data collection,processing,analysis,and storage remain,especially the low efficiency of medical diagnosis.And with the wide application of the Internet of Things and Big Data in the medical field,medical Big Data is increasing in geometric magnitude resulting in cloud service overload,insufficient storage,communication delay,and network congestion.In order to solve these medical and network problems,a medical big-data-oriented fog computing architec-ture and BP algorithm application are proposed,and its structural advantages and characteristics are studied.This architecture enables the medical Big Data generated by medical edge devices and the existing data in the cloud service center to calculate,compare and analyze the fog node through the Internet of Things.The diagnosis results are designed to reduce the business processing delay and improve the diagnosis effect.Considering the weak computing of each edge device,the artificial intelligence BP neural network algorithm is used in the core computing model of the medical diagnosis system to improve the system computing power,enhance the medical intelligence-aided decision-making,and improve the clinical diagnosis and treatment efficiency.In the application process,combined with the characteristics of medical Big Data technology,through fog architecture design and Big Data technology integration,we could research the processing and analysis of heterogeneous data of the medical diagnosis system in the context of the Internet of Things.The results are promising:The medical platform network is smooth,the data storage space is sufficient,the data processing and analysis speed is fast,the diagnosis effect is remarkable,and it is a good assistant to doctors’treatment effect.It not only effectively solves the problem of low clinical diagnosis,treatment efficiency and quality,but also reduces the waiting time of patients,effectively solves the contradiction between doctors and patients,and improves the medical service quality and management level.展开更多
Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals.However,the key environmental factors affecting volume growth diff...Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals.However,the key environmental factors affecting volume growth differ across various scales and plant functional types.This study was,therefore,conducted to estimate the volume growth of Larix and Quercus forests based on national-scale forestry inventory data in China and its influencing factors using random forest algorithms.The results showed that the model performances of volume growth in natural forests(R^(2)=0.65 for Larix and 0.66 for Quercus,respectively)were better than those in planted forests(R^(2)=0.44 for Larix and 0.40 for Quercus,respectively).In both natural and planted forests,the stand age showed a strong relative importance for volume growth(8.6%–66.2%),while the edaphic and climatic variables had a limited relative importance(<6.0%).The relationship between stand age and volume growth was unimodal in natural forests and linear increase in planted Quercus forests.And the specific locations(i.e.,altitude and aspect)of sampling plots exhibited high relative importance for volume growth in planted forests(4.1%–18.2%).Altitude positively affected volume growth in planted Larix forests but controlled volume growth negatively in planted Quercus forests.Similarly,the effects of other environmental factors on volume growth also differed in both stand origins(planted versus natural)and plant functional types(Larix versus Quercus).These results highlighted that the stand age was the most important predictor for volume growth and there were diverse effects of environmental factors on volume growth among stand origins and plant functional types.Our findings will provide a good framework for site-specific recommendations regarding the management practices necessary to maintain the volume growth in China's forest ecosystems.展开更多
Emotion represents the feeling of an individual in a given situation. There are various ways to express the emotions of an individual. It can be categorized into verbal expressions, written expressions, facial express...Emotion represents the feeling of an individual in a given situation. There are various ways to express the emotions of an individual. It can be categorized into verbal expressions, written expressions, facial expressions and gestures. Among these various ways of expressing the emotion, the written method is a challenging task to extract the emotions, as the data is in the form of textual dat. Finding the different kinds of emotions is also a tedious task as it requires a lot of pre preparations of the textual data taken for the research. This research work is carried out to analyse and extract the emotions hidden in text data. The text data taken for the analysis is from the social media dataset. Using the raw text data directly from the social media will not serve the purpose. Therefore, the text data has to be pre-processed and then utilised for further processing. Pre-processing makes the text data more efficient and would infer valuable insights of the emotions hidden in it. The preprocessing steps also help to manage the text data for identifying the emotions conveyed in the text. This work proposes to deduct the emotions taken from the social media text data by applying the machine learning algorithm. Finally, the usefulness of the emotions is suggested for various stake holders, to find the attitude of individuals at that moment, the data is produced. .展开更多
Many business applications rely on their historical data to predict their business future. The marketing products process is one of the core processes for the business. Customer needs give a useful piece of informatio...Many business applications rely on their historical data to predict their business future. The marketing products process is one of the core processes for the business. Customer needs give a useful piece of information that help</span><span style="font-family:Verdana;"><span style="font-family:Verdana;">s</span></span><span style="font-family:Verdana;"> to market the appropriate products at the appropriate time. Moreover, services are considered recently as products. The development of education and health services </span><span style="font-family:Verdana;"><span style="font-family:Verdana;">is</span></span><span style="font-family:Verdana;"> depending on historical data. For the more, reducing online social media networks problems and crimes need a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires great time to process. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used with a variety of algorithms. In this paper, various classification algorithms are revised in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after delegated reading of 20 papers in the literature. This paper aims to help data analysts to choose the most suitable classification algorithm for different business applications including business in general, online social media networks, agriculture, health, and education. Results show FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social networks (OSN) activities. Na<span style="white-space:nowrap;">ï</span>ve Bayes algorithm is the most accurate to classify agriculture datasets. OneR is the most accurate algorithm to classify instances within the health domain. The C4.5 Decision Tree algorithm is the most accurate to classify students’ records to predict degree completion time.展开更多
The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO...The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.展开更多
Bayesian networks are a powerful class of graphical decision models used to represent causal relationships among variables.However,the reliability and integrity of learned Bayesian network models are highly dependent ...Bayesian networks are a powerful class of graphical decision models used to represent causal relationships among variables.However,the reliability and integrity of learned Bayesian network models are highly dependent on the quality of incoming data streams.One of the primary challenges with Bayesian networks is their vulnerability to adversarial data poisoning attacks,wherein malicious data is injected into the training dataset to negatively influence the Bayesian network models and impair their performance.In this research paper,we propose an efficient framework for detecting data poisoning attacks against Bayesian network structure learning algorithms.Our framework utilizes latent variables to quantify the amount of belief between every two nodes in each causal model over time.We use our innovative methodology to tackle an important issue with data poisoning assaults in the context of Bayesian networks.With regard to four different forms of data poisoning attacks,we specifically aim to strengthen the security and dependability of Bayesian network structure learning techniques,such as the PC algorithm.By doing this,we explore the complexity of this area and offer workablemethods for identifying and reducing these sneaky dangers.Additionally,our research investigates one particular use case,the“Visit to Asia Network.”The practical consequences of using uncertainty as a way to spot cases of data poisoning are explored in this inquiry,which is of utmost relevance.Our results demonstrate the promising efficacy of latent variables in detecting and mitigating the threat of data poisoning attacks.Additionally,our proposed latent-based framework proves to be sensitive in detecting malicious data poisoning attacks in the context of stream data.展开更多
With continuous expansion of satellite applications,the requirements for satellite communication services,such as communication delay,transmission bandwidth,transmission power consumption,and communication coverage,ar...With continuous expansion of satellite applications,the requirements for satellite communication services,such as communication delay,transmission bandwidth,transmission power consumption,and communication coverage,are becoming higher.This paper first presents an overview of the current development status of Low Earth Orbit(LEO)satellite constellations,and then conducts a demand analysis for multi-satellite data transmission based on LEO satellite constellations.The problem is described,and the challenges and difficulties of the problem are analyzed accordingly.On this basis,a multi-satellite data-transmission mathematical model is then constructed.Combining classical heuristic allocating strategies on the features of the proposed model,with the reinforcement learning algorithm Deep Q-Network(DQN),a two-stage optimization framework based on heuristic and DQN is proposed.Finally,by taking into account the spatial and temporal distribution characteristics of satellite and facility resources,a multi-satellite scheduling instance dataset is generated.Experimental results validate the rationality and correctness of the DQN algorithm in solving the collaborative scheduling problem of multi-satellite data transmission.展开更多
Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagn...Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagnetic environment indicator of UHV transmission lines and is currently employed for reliable long-term operation of the power grid.Yet,the accurate prediction of the ground total electric field remains a technical challenge.In this work,we collected the total electric field data from the Ningdong-Zhejiang±800 kV UHVDC transmission project,as of the Ling Shao line,and perform an outlier analysis of the total electric field data.We show that the Local Outlier Factor(LOF)elimination algorithm has a small average difference and overcomes the performance of Density-Based Spatial Clustering of Applications with Noise(DBSCAN)and Isolated Forest elimination algorithms.Moreover,the Stacking algorithm has been found to have superior prediction accuracy than a variety of similar prediction algorithms,including the traditional finite element.The low prediction error of the Stacking algorithm highlights the superior ability to accurately forecast the ground total electric field of UHVDC transmission lines.展开更多
The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challengi...The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging.Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features.The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection,because of its simplicity,efficiency,and similarity to reinforcement learning.Nevertheless,existing methods do not consider crucial correlation information,such as dynamic redundancy and label correlation.To tackle these concerns,the paper proposes a multi-label feature selection technique based on ant colony optimization algorithm(MFACO),focusing on dynamic redundancy and label correlation.Initially,the dynamic redundancy is assessed between the selected feature subset and potential features.Meanwhile,the ant colony optimization algorithm extracts label correlation from the label set,which is then combined into the heuristic factor as label weights.Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of ant colony,outperforming the other algorithms involved in the paper.展开更多
基金supported by the National Natural Science Foundation of China(Project No.51767018)Natural Science Foundation of Gansu Province(Project No.23JRRA836).
文摘Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data.Consequently,a method for cleaning wind power anomaly data by combining image processing with community detection algorithms(CWPAD-IPCDA)is proposed.To precisely identify and initially clean anomalous data,wind power curve(WPC)images are converted into graph structures,which employ the Louvain community recognition algorithm and graph-theoretic methods for community detection and segmentation.Furthermore,the mathematical morphology operation(MMO)determines the main part of the initially cleaned wind power curve images and maps them back to the normal wind power points to complete the final cleaning.The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines(WTs)in two wind farms in northwest China to validate its feasibility.A comparison was conducted using density-based spatial clustering of applications with noise(DBSCAN)algorithm,an improved isolation forest algorithm,and an image-based(IB)algorithm.The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms,achieving an approximately 7.23%higher average data cleaning rate.The mean value of the sum of the squared errors(SSE)of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms.Moreover,the mean of overall accuracy,as measured by the F1-score,exceeds that of the other methods by approximately 10.49%;this indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting.
基金Foundation of National Natural Science Foundation of China(62202118)Scientific and Technological Research Projects from Guizhou Education Department([2023]003)+1 种基金Guizhou Provincial Department of Science and Technology Hundred Levels of Innovative Talents Project(GCC[2023]018)Top Technology Talent Project from Guizhou Education Department([2022]073).
文摘The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
文摘Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.
文摘Many search-based algorithms have been successfully applied in sev-eral software engineering activities.Genetic algorithms(GAs)are the most used in the scientific domains by scholars to solve software testing problems.They imi-tate the theory of natural selection and evolution.The harmony search algorithm(HSA)is one of the most recent search algorithms in the last years.It imitates the behavior of a musician tofind the best harmony.Scholars have estimated the simi-larities and the differences between genetic algorithms and the harmony search algorithm in diverse research domains.The test data generation process represents a critical task in software validation.Unfortunately,there is no work comparing the performance of genetic algorithms and the harmony search algorithm in the test data generation process.This paper studies the similarities and the differences between genetic algorithms and the harmony search algorithm based on the ability and speed offinding the required test data.The current research performs an empirical comparison of the HSA and the GAs,and then the significance of the results is estimated using the t-Test.The study investigates the efficiency of the harmony search algorithm and the genetic algorithms according to(1)the time performance,(2)the significance of the generated test data,and(3)the adequacy of the generated test data to satisfy a given testing criterion.The results showed that the harmony search algorithm is significantly faster than the genetic algo-rithms because the t-Test showed that the p-value of the time values is 0.026<α(αis the significance level=0.05 at 95%confidence level).In contrast,there is no significant difference between the two algorithms in generating the adequate test data because the t-Test showed that the p-value of thefitness values is 0.25>α.
文摘With the rapid development of the Internet globally since the 21st century,the amount of data information has increased exponentially.Data helps improve people’s livelihood and working conditions,as well as learning efficiency.Therefore,data extraction,analysis,and processing have become a hot issue for people from all walks of life.Traditional recommendation algorithm still has some problems,such as inaccuracy,less diversity,and low performance.To solve these problems and improve the accuracy and variety of the recommendation algorithms,the research combines the convolutional neural networks(CNN)and the attention model to design a recommendation algorithm based on the neural network framework.Through the text convolutional network,the input layer in CNN has transformed into two channels:static ones and non-static ones.Meanwhile,the self-attention system focuses on the system so that data can be better processed and the accuracy of feature extraction becomes higher.The recommendation algorithm combines CNN and attention system and divides the embedding layer into user information feature embedding and data name feature extraction embedding.It obtains data name features through a convolution kernel.Finally,the top pooling layer obtains the length vector.The attention system layer obtains the characteristics of the data type.Experimental results show that the proposed recommendation algorithm that combines CNN and the attention system can perform better in data extraction than the traditional CNN algorithm and other recommendation algorithms that are popular at the present stage.The proposed algorithm shows excellent accuracy and robustness.
基金The National Natural Science Foundation of China under contract Nos 41330960 and 41276193 and 41206184
文摘In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for the multiyear (MY) ice, has led to significant effect on climate change. The accurate retrieval of MY ice concentration retrieval is very important and challenging to understand the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. A similar total ice concentration was yielded by these algorithms, while the retrieved MY sea ice concentrations differs from each other. The MY SIA derived from NASA TEAM algorithm is relatively stable. Other two algorithms created seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we proposed an ice concentration retrieval algorithm, which developed the NASA TEAM algorithm by adding to use AMSR-E 6.9 GHz brightness temperature data and sea ice concentration using 89.0 GHz data. Comparison with the reference MY SIA from reference MY ice, indicates that the mean difference and root mean square (rms) difference of MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 during January to March, -0.06×10^6 km^2 and 0.14×10^6 km^2during September to December respectively. Comparison with MY SIE obtained from weekly ice age data provided by University of Colorado show that, the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm proposed in this study has smaller difference compared with the reference MY ice and MY SIE from ice age data than the Wang's, Lomax' and NASA TEAM algorithms.
基金supported by the Yunnan Major Scientific and Technological Projects(Grant No.202302AD080001)the National Natural Science Foundation,China(No.52065033).
文摘When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.
基金supported by the National Science and Technology Innovation 2030 Next-Generation Artifical Intelligence Major Project(2018AAA0101801)the National Natural Science Foundation of China(72271188)。
文摘With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality prediction models have many disadvantages,such as high complexity and low accuracy.To overcome the above problems,we propose an optimized data equalization method to pre-process dataset and design a simple but effective product quality prediction model:radial basis function model optimized by the firefly algorithm with Levy flight mechanism(RBFFALM).First,the new data equalization method is introduced to pre-process the dataset,which reduces the dimension of the data,removes redundant features,and improves the data distribution.Then the RBFFALFM is used to predict product quality.Comprehensive expe riments conducted on real-world product quality datasets validate that the new model RBFFALFM combining with the new data pre-processing method outperforms other previous me thods on predicting product quality.
基金The author would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project Number No.R-2022-85.
文摘The paper addresses the challenge of transmitting a big number offiles stored in a data center(DC),encrypting them by compilers,and sending them through a network at an acceptable time.Face to the big number offiles,only one compiler may not be sufficient to encrypt data in an acceptable time.In this paper,we consider the problem of several compilers and the objective is tofind an algorithm that can give an efficient schedule for the givenfiles to be compiled by the compilers.The main objective of the work is to minimize the gap in the total size of assignedfiles between compilers.This minimization ensures the fair distribution offiles to different compilers.This problem is considered to be a very hard problem.This paper presents two research axes.Thefirst axis is related to architecture.We propose a novel pre-compiler architecture in this context.The second axis is algorithmic development.We develop six algorithms to solve the problem,in this context.These algorithms are based on the dispatching rules method,decomposition method,and an iterative approach.These algorithms give approximate solutions for the studied problem.An experimental result is imple-mented to show the performance of algorithms.Several indicators are used to measure the performance of the proposed algorithms.In addition,five classes are proposed to test the algorithms with a total of 2350 instances.A comparison between the proposed algorithms is presented in different tables discussed to show the performance of each algorithm.The result showed that the best algorithm is the Iterative-mixed Smallest-Longest-Heuristic(ISL)with a percentage equal to 97.7%and an average running time equal to 0.148 s.All other algorithms did not exceed 22%as a percentage.The best algorithm excluding ISL is Iterative-mixed Longest-Smallest Heuristic(ILS)with a percentage equal to 21,4%and an average running time equal to 0.150 s.
基金Deputyship for Research&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the Project Number RI-44-0444.
文摘Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
基金Sponsored by the National High Technology Research and Development Program of China(2006AA701306)the National Innovation Foundation of Enterprises(05C26212200378)
文摘Improved traditional ant colony algorithms,a data routing model used to the data remote exchange on WAN was presented.In the model,random heuristic factors were introduced to realize multi-path search.The updating model of pheromone could adjust the pheromone concentration on the optimal path according to path load dynamically to make the system keep load balance.The simulation results show that the improved model has a higher performance on convergence and load balance.
基金supported by 2020 Foshan Science and Technology Project(Numbering:2020001005356),Baoling Qin received the grant.
文摘Although the Internet of Things has been widely applied,the problems of cloud computing in the application of digital smart medical Big Data collection,processing,analysis,and storage remain,especially the low efficiency of medical diagnosis.And with the wide application of the Internet of Things and Big Data in the medical field,medical Big Data is increasing in geometric magnitude resulting in cloud service overload,insufficient storage,communication delay,and network congestion.In order to solve these medical and network problems,a medical big-data-oriented fog computing architec-ture and BP algorithm application are proposed,and its structural advantages and characteristics are studied.This architecture enables the medical Big Data generated by medical edge devices and the existing data in the cloud service center to calculate,compare and analyze the fog node through the Internet of Things.The diagnosis results are designed to reduce the business processing delay and improve the diagnosis effect.Considering the weak computing of each edge device,the artificial intelligence BP neural network algorithm is used in the core computing model of the medical diagnosis system to improve the system computing power,enhance the medical intelligence-aided decision-making,and improve the clinical diagnosis and treatment efficiency.In the application process,combined with the characteristics of medical Big Data technology,through fog architecture design and Big Data technology integration,we could research the processing and analysis of heterogeneous data of the medical diagnosis system in the context of the Internet of Things.The results are promising:The medical platform network is smooth,the data storage space is sufficient,the data processing and analysis speed is fast,the diagnosis effect is remarkable,and it is a good assistant to doctors’treatment effect.It not only effectively solves the problem of low clinical diagnosis,treatment efficiency and quality,but also reduces the waiting time of patients,effectively solves the contradiction between doctors and patients,and improves the medical service quality and management level.
基金supported by the Major Program of the National Natural Science Foundation of China(No.32192434)the Fundamental Research Funds of Chinese Academy of Forestry(No.CAFYBB2019ZD001)the National Key Research and Development Program of China(2016YFD060020602).
文摘Estimating the volume growth of forest ecosystems accurately is important for understanding carbon sequestration and achieving carbon neutrality goals.However,the key environmental factors affecting volume growth differ across various scales and plant functional types.This study was,therefore,conducted to estimate the volume growth of Larix and Quercus forests based on national-scale forestry inventory data in China and its influencing factors using random forest algorithms.The results showed that the model performances of volume growth in natural forests(R^(2)=0.65 for Larix and 0.66 for Quercus,respectively)were better than those in planted forests(R^(2)=0.44 for Larix and 0.40 for Quercus,respectively).In both natural and planted forests,the stand age showed a strong relative importance for volume growth(8.6%–66.2%),while the edaphic and climatic variables had a limited relative importance(<6.0%).The relationship between stand age and volume growth was unimodal in natural forests and linear increase in planted Quercus forests.And the specific locations(i.e.,altitude and aspect)of sampling plots exhibited high relative importance for volume growth in planted forests(4.1%–18.2%).Altitude positively affected volume growth in planted Larix forests but controlled volume growth negatively in planted Quercus forests.Similarly,the effects of other environmental factors on volume growth also differed in both stand origins(planted versus natural)and plant functional types(Larix versus Quercus).These results highlighted that the stand age was the most important predictor for volume growth and there were diverse effects of environmental factors on volume growth among stand origins and plant functional types.Our findings will provide a good framework for site-specific recommendations regarding the management practices necessary to maintain the volume growth in China's forest ecosystems.
文摘Emotion represents the feeling of an individual in a given situation. There are various ways to express the emotions of an individual. It can be categorized into verbal expressions, written expressions, facial expressions and gestures. Among these various ways of expressing the emotion, the written method is a challenging task to extract the emotions, as the data is in the form of textual dat. Finding the different kinds of emotions is also a tedious task as it requires a lot of pre preparations of the textual data taken for the research. This research work is carried out to analyse and extract the emotions hidden in text data. The text data taken for the analysis is from the social media dataset. Using the raw text data directly from the social media will not serve the purpose. Therefore, the text data has to be pre-processed and then utilised for further processing. Pre-processing makes the text data more efficient and would infer valuable insights of the emotions hidden in it. The preprocessing steps also help to manage the text data for identifying the emotions conveyed in the text. This work proposes to deduct the emotions taken from the social media text data by applying the machine learning algorithm. Finally, the usefulness of the emotions is suggested for various stake holders, to find the attitude of individuals at that moment, the data is produced. .
文摘Many business applications rely on their historical data to predict their business future. The marketing products process is one of the core processes for the business. Customer needs give a useful piece of information that help</span><span style="font-family:Verdana;"><span style="font-family:Verdana;">s</span></span><span style="font-family:Verdana;"> to market the appropriate products at the appropriate time. Moreover, services are considered recently as products. The development of education and health services </span><span style="font-family:Verdana;"><span style="font-family:Verdana;">is</span></span><span style="font-family:Verdana;"> depending on historical data. For the more, reducing online social media networks problems and crimes need a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires great time to process. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used with a variety of algorithms. In this paper, various classification algorithms are revised in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after delegated reading of 20 papers in the literature. This paper aims to help data analysts to choose the most suitable classification algorithm for different business applications including business in general, online social media networks, agriculture, health, and education. Results show FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social networks (OSN) activities. Na<span style="white-space:nowrap;">ï</span>ve Bayes algorithm is the most accurate to classify agriculture datasets. OneR is the most accurate algorithm to classify instances within the health domain. The C4.5 Decision Tree algorithm is the most accurate to classify students’ records to predict degree completion time.
基金financial support extended for this academic work by the Beijing Natural Science Foundation(Grant 2232066)the Open Project Foundation of State Key Laboratory of Solid Lubrication(Grant LSL-2212).
文摘The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.
文摘Bayesian networks are a powerful class of graphical decision models used to represent causal relationships among variables.However,the reliability and integrity of learned Bayesian network models are highly dependent on the quality of incoming data streams.One of the primary challenges with Bayesian networks is their vulnerability to adversarial data poisoning attacks,wherein malicious data is injected into the training dataset to negatively influence the Bayesian network models and impair their performance.In this research paper,we propose an efficient framework for detecting data poisoning attacks against Bayesian network structure learning algorithms.Our framework utilizes latent variables to quantify the amount of belief between every two nodes in each causal model over time.We use our innovative methodology to tackle an important issue with data poisoning assaults in the context of Bayesian networks.With regard to four different forms of data poisoning attacks,we specifically aim to strengthen the security and dependability of Bayesian network structure learning techniques,such as the PC algorithm.By doing this,we explore the complexity of this area and offer workablemethods for identifying and reducing these sneaky dangers.Additionally,our research investigates one particular use case,the“Visit to Asia Network.”The practical consequences of using uncertainty as a way to spot cases of data poisoning are explored in this inquiry,which is of utmost relevance.Our results demonstrate the promising efficacy of latent variables in detecting and mitigating the threat of data poisoning attacks.Additionally,our proposed latent-based framework proves to be sensitive in detecting malicious data poisoning attacks in the context of stream data.
基金supported by the National Natural Science Foundation of China(Nos.42271391,62006214,and 42101439)the Joint Funds of Equipment Pre-Research and Ministry of Education of China(No.8091B022148)+1 种基金the 14th Five-Year Pre-Research Project of Civil Aerospace of China,the Hubei Excellent Young and Middle-Aged Science and Technology Innovation Team Plan Project(No.T2021031)the Open Research Project of Hubei Key Laboratory of Intelligent GeoInformation Processing(No.KLIGIP-2022-B09).
文摘With continuous expansion of satellite applications,the requirements for satellite communication services,such as communication delay,transmission bandwidth,transmission power consumption,and communication coverage,are becoming higher.This paper first presents an overview of the current development status of Low Earth Orbit(LEO)satellite constellations,and then conducts a demand analysis for multi-satellite data transmission based on LEO satellite constellations.The problem is described,and the challenges and difficulties of the problem are analyzed accordingly.On this basis,a multi-satellite data-transmission mathematical model is then constructed.Combining classical heuristic allocating strategies on the features of the proposed model,with the reinforcement learning algorithm Deep Q-Network(DQN),a two-stage optimization framework based on heuristic and DQN is proposed.Finally,by taking into account the spatial and temporal distribution characteristics of satellite and facility resources,a multi-satellite scheduling instance dataset is generated.Experimental results validate the rationality and correctness of the DQN algorithm in solving the collaborative scheduling problem of multi-satellite data transmission.
基金funded by a science and technology project of State Grid Corporation of China“Comparative Analysis of Long-Term Measurement and Prediction of the Ground Synthetic Electric Field of±800 kV DC Transmission Line”(GYW11201907738)Paulo R.F.Rocha acknowledges the support and funding from the European Research Council(ERC)under the European Union’s Horizon 2020 Research and Innovation Program(Grant Agreement No.947897).
文摘Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagnetic environment indicator of UHV transmission lines and is currently employed for reliable long-term operation of the power grid.Yet,the accurate prediction of the ground total electric field remains a technical challenge.In this work,we collected the total electric field data from the Ningdong-Zhejiang±800 kV UHVDC transmission project,as of the Ling Shao line,and perform an outlier analysis of the total electric field data.We show that the Local Outlier Factor(LOF)elimination algorithm has a small average difference and overcomes the performance of Density-Based Spatial Clustering of Applications with Noise(DBSCAN)and Isolated Forest elimination algorithms.Moreover,the Stacking algorithm has been found to have superior prediction accuracy than a variety of similar prediction algorithms,including the traditional finite element.The low prediction error of the Stacking algorithm highlights the superior ability to accurately forecast the ground total electric field of UHVDC transmission lines.
基金supported by National Natural Science Foundation of China(Grant Nos.62376089,62302153,62302154,62202147)the key Research and Development Program of Hubei Province,China(Grant No.2023BEB024).
文摘The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging.Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features.The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection,because of its simplicity,efficiency,and similarity to reinforcement learning.Nevertheless,existing methods do not consider crucial correlation information,such as dynamic redundancy and label correlation.To tackle these concerns,the paper proposes a multi-label feature selection technique based on ant colony optimization algorithm(MFACO),focusing on dynamic redundancy and label correlation.Initially,the dynamic redundancy is assessed between the selected feature subset and potential features.Meanwhile,the ant colony optimization algorithm extracts label correlation from the label set,which is then combined into the heuristic factor as label weights.Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of ant colony,outperforming the other algorithms involved in the paper.