Journal Articles
14,808 articles found
1. A method for cleaning wind power anomaly data by combining image processing with community detection algorithms
Authors: Qiaoling Yang, Kai Chen, Jianzhang Man, Jiaheng Duan, Zuoqi Jin. Global Energy Interconnection, EI CSCD, 2024, No. 3, pp. 293-312 (20 pages)
Abstract: Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data. Consequently, a method for cleaning wind power anomaly data by combining image processing with community detection algorithms (CWPAD-IPCDA) is proposed. To precisely identify and initially clean anomalous data, wind power curve (WPC) images are converted into graph structures, to which the Louvain community recognition algorithm and graph-theoretic methods are applied for community detection and segmentation. Furthermore, a mathematical morphology operation (MMO) determines the main part of the initially cleaned wind power curve images and maps it back to the normal wind power points to complete the final cleaning. The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines (WTs) in two wind farms in northwest China to validate its feasibility. A comparison was conducted against the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved isolation forest algorithm, and an image-based (IB) algorithm. The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms, achieving an approximately 7.23% higher average data cleaning rate. The mean sum of squared errors (SSE) of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms. Moreover, the mean overall accuracy, as measured by the F1-score, exceeds that of the other methods by approximately 10.49%. This indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting. (An illustrative sketch of the morphological-cleaning step is given below.)
Keywords: Wind turbine power curve; Abnormal data cleaning; Community detection; Louvain algorithm; Mathematical morphology operation
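The paper's implementation is not given here; the following is a minimal Python sketch of the morphological step alone, under assumed details (grid resolution, structuring element, and the synthetic turbine data are illustrative choices, not the authors'). The (wind speed, power) scatter is rasterized into a binary image, a binary opening keeps the main body of the curve, and a point is retained only if its cell survives.

```python
import numpy as np
from scipy.ndimage import binary_opening

def morphological_clean(speed, power, bins=200, se=3):
    """Rasterize the wind-power scatter, keep the main body of the curve
    via a morphological opening, and map surviving cells back to points."""
    hist, xe, ye = np.histogram2d(speed, power, bins=bins)
    opened = binary_opening(hist > 0, structure=np.ones((se, se)))
    ix = np.clip(np.digitize(speed, xe) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(power, ye) - 1, 0, bins - 1)
    keep = opened[ix, iy]
    return speed[keep], power[keep]

# toy wind-power curve with injected anomalies
rng = np.random.default_rng(0)
v = rng.uniform(3, 25, 5000)
p = 1500 / (1 + np.exp(-(v - 10))) + rng.normal(0, 30, v.size)
p[rng.choice(v.size, 200)] = rng.uniform(0, 1500, 200)
v_clean, p_clean = morphological_clean(v, p)
```

The community-detection stage (Louvain over a graph built from the WPC image) would run before this step; only the simpler morphological stage is shown.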
2. An Innovative K-Anonymity Privacy-Preserving Algorithm to Improve Data Availability in the Context of Big Data
Authors: Linlin Yuan, Tiantian Zhang, Yuling Chen, Yuxiang Yang, Huang Li. Computers, Materials & Continua, SCIE EI, 2024, No. 4, pp. 1561-1579 (19 pages)
Abstract: The development of technologies such as big data and blockchain has brought convenience to life, but at the same time privacy and security issues are becoming more and more prominent. The K-anonymity algorithm is an effective privacy-preserving algorithm with low computational complexity that can safeguard users' privacy by anonymizing big data. However, the algorithm currently focuses only on improving user privacy while ignoring data availability. In addition, ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. Based on this, we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data while guaranteeing improved data usability. Specifically, we construct a new information loss function based on information quantity theory. Considering that different quasi-identifier attributes have different impacts on sensitive attributes, we set a weight for each quasi-identifier attribute when designing the information loss function. To reduce information loss, we improve K-anonymity in two ways. First, we make the information loss smaller than in the original table while guaranteeing privacy, based on common artificial intelligence algorithms, i.e., the greedy algorithm and the 2-means clustering algorithm. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centers of mass. We then design the K-anonymity algorithm of this scheme based on the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which reduces the information loss. Finally, we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss. (A sketch of the weighted loss and the mean-center initialization follows below.)
Keywords: Blockchain; big data; K-anonymity; 2-means clustering; greedy algorithm; mean-center method
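Neither the paper's loss function nor its initialization is reproduced here; the sketch below shows one plausible reading of both ideas: a per-attribute-weighted range loss for an equivalence class, and a "mean-center" 2-means initialization that seeds the two centroids with the points closest to and farthest from the overall mean. Both the weighting form and the initialization rule are assumptions.

```python
import numpy as np

def weighted_information_loss(block, global_range, weights):
    """Weighted normalized-range loss of one equivalence class: each
    quasi-identifier column contributes in proportion to its weight."""
    spread = (block.max(axis=0) - block.min(axis=0)) / global_range
    return block.shape[0] * np.dot(weights, spread)

def mean_center_init(X):
    """Seed 2-means with the points nearest and farthest from the mean."""
    d = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return X[[d.argmin(), d.argmax()]].astype(float)

def two_means(X, iters=20):
    """Plain 2-means clustering using the mean-center initialization."""
    centers = mean_center_init(X)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in (0, 1):
            if (lab == k).any():
                centers[k] = X[lab == k].mean(axis=0)
    return lab, centers
```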
3. A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
Authors: Yisa Adeniyi Abolade, Yichuan Zhao. Open Journal of Modelling and Simulation, 2024, No. 2, pp. 33-42 (10 pages)
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, summing to a constant such as 100%. The linear regression model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing observations are present. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, by maximum likelihood or maximum a posteriori (MAP) estimation. Using the present estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both ordinary least squares and robust least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance. (A simplified EM imputation loop is sketched below.)
Keywords: Compositional data; Linear regression model; Least square method; Robust least square method; Synthetic data; Aitchison distance; Maximum likelihood estimation; Expectation-maximization algorithm; k-Nearest neighbor; Mean imputation
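As an illustration of the E/M iteration described above, here is a simplified EM-style imputation loop for a multivariate-normal model, written in Python. It replaces missing entries by their conditional means (E-step) and re-estimates the mean and covariance (M-step); the exact EM would also add the conditional covariances to the M-step sufficient statistics, which this sketch omits.

```python
import numpy as np

def em_impute(X, iters=50, tol=1e-6):
    """Iteratively impute NaN entries under a multivariate-normal model."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    X[miss] = np.nanmean(X, axis=0)[np.where(miss)[1]]  # initial fill
    for _ in range(iters):
        mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
        X_old = X.copy()
        for i in np.unique(np.where(miss)[0]):
            m, o = miss[i], ~miss[i]                    # missing/observed
            beta = cov[np.ix_(m, o)] @ np.linalg.pinv(cov[np.ix_(o, o)])
            X[i, m] = mu[m] + beta @ (X[i, o] - mu[o])  # conditional mean
        if np.max(np.abs(X - X_old)) < tol:
            break
    return X
```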
4. Generating of Test Data by Harmony Search Against Genetic Algorithms
Authors: Ahmed S. Ghiduk, Abdullah Alharbi. Intelligent Automation & Soft Computing, SCIE, 2023, No. 4, pp. 647-665 (19 pages)
Abstract: Many search-based algorithms have been successfully applied in several software engineering activities. Genetic algorithms (GAs) are the most used by scholars in scientific domains to solve software testing problems; they imitate the theory of natural selection and evolution. The harmony search algorithm (HSA) is one of the most recent search algorithms; it imitates the behavior of a musician finding the best harmony. Scholars have estimated the similarities and differences between genetic algorithms and the harmony search algorithm in diverse research domains. The test data generation process represents a critical task in software validation. Unfortunately, no prior work has compared the performance of genetic algorithms and the harmony search algorithm in the test data generation process. This paper studies the similarities and differences between genetic algorithms and the harmony search algorithm based on their ability and speed in finding the required test data. The current research performs an empirical comparison of the HSA and the GAs, and then estimates the significance of the results using the t-test. The study investigates the efficiency of the harmony search algorithm and the genetic algorithms according to (1) the time performance, (2) the significance of the generated test data, and (3) the adequacy of the generated test data to satisfy a given testing criterion. The results showed that the harmony search algorithm is significantly faster than the genetic algorithms: the t-test showed that the p-value of the time values is 0.026 < α (where α is the significance level, 0.05, at a 95% confidence level). In contrast, there is no significant difference between the two algorithms in generating adequate test data, because the t-test showed that the p-value of the fitness values is 0.25 > α. (A minimal harmony search loop is sketched below.)
Keywords: Harmony search algorithm; genetic algorithms; test data generation
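The paper's test-data generator is not shown; as a reference point, here is a minimal harmony search in Python for a generic minimization problem. HMCR, PAR, and bandwidth values are typical defaults, not the authors' settings; for test-data generation, the objective f would score how close an input comes to satisfying the coverage criterion.

```python
import numpy as np

def harmony_search(f, bounds, hms=20, hmcr=0.9, par=0.3, bw=0.1,
                   iters=2000, seed=0):
    """Memory consideration with prob. hmcr, pitch adjustment with prob.
    par, random selection otherwise; replace the worst harmony if better."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    d = len(bounds)
    hm = rng.uniform(lo, hi, size=(hms, d))      # harmony memory
    fit = np.array([f(x) for x in hm])
    for _ in range(iters):
        new = np.where(rng.random(d) < hmcr,
                       hm[rng.integers(hms, size=d), np.arange(d)],
                       rng.uniform(lo, hi))
        adjust = rng.random(d) < par
        new = np.clip(new + adjust * bw * (hi - lo) * rng.uniform(-1, 1, d),
                      lo, hi)
        worst = fit.argmax()
        fn = f(new)
        if fn < fit[worst]:
            hm[worst], fit[worst] = new, fn
    return hm[fit.argmin()], fit.min()

best_x, best_f = harmony_search(lambda x: ((x - 3) ** 2).sum(), [(-10, 10)] * 4)
```

The reported comparison itself can be reproduced with `scipy.stats.ttest_ind` on the runtime and fitness samples of the two algorithms.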
5. Recommendation Algorithm Integrating CNN and Attention System in Data Extraction (Cited by 1)
Authors: Yang Li, Fei Yin, Xianghui Hui. Computers, Materials & Continua, SCIE EI, 2023, No. 5, pp. 4047-4063 (17 pages)
Abstract: With the rapid global development of the Internet since the 21st century, the amount of data has increased exponentially. Data helps improve people's livelihoods and working conditions, as well as learning efficiency. Therefore, data extraction, analysis, and processing have become a hot issue for people from all walks of life. Traditional recommendation algorithms still have problems such as inaccuracy, low diversity, and low performance. To solve these problems and improve the accuracy and variety of recommendation algorithms, this research combines convolutional neural networks (CNN) and the attention model to design a recommendation algorithm based on the neural network framework. Through the text convolutional network, the input layer of the CNN is transformed into two channels: a static one and a non-static one. Meanwhile, the self-attention mechanism allows the data to be better processed, so the accuracy of feature extraction becomes higher. The recommendation algorithm combines the CNN and the attention system and divides the embedding layer into user-information feature embedding and data-name feature extraction embedding. It obtains data-name features through a convolution kernel. Finally, the top pooling layer obtains the length vector, and the attention-system layer obtains the characteristics of the data type. Experimental results show that the proposed recommendation algorithm combining CNN and the attention system performs better in data extraction than the traditional CNN algorithm and other currently popular recommendation algorithms, showing excellent accuracy and robustness. (A toy two-branch model is sketched below.)
Keywords: data extraction; recommendation algorithm; CNN algorithm; attention model
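A toy PyTorch rendition of the two-branch idea is given below: a text-CNN with top (max) pooling extracts item-name features, a self-attention layer encodes user information, and a dot product scores the pair. All dimensions, the single shared embedding, and the scoring head are assumptions; the paper's static/non-static dual channels are collapsed into one for brevity.

```python
import torch
import torch.nn as nn

class CNNAttentionRec(nn.Module):
    def __init__(self, vocab=10000, dim=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, dim, kernel, padding=1)   # name features
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, item_tokens, user_tokens):
        x = self.emb(item_tokens).transpose(1, 2)            # (B, dim, L)
        item = torch.relu(self.conv(x)).max(dim=2).values    # top pooling
        u = self.emb(user_tokens)
        u, _ = self.attn(u, u, u)                            # self-attention
        return (item * u.mean(dim=1)).sum(dim=1)             # matching score

model = CNNAttentionRec()
score = model(torch.randint(0, 10000, (2, 12)),   # item-name tokens
              torch.randint(0, 10000, (2, 8)))    # user-behaviour tokens
```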
6. A study of multiyear ice concentration retrieval algorithms using AMSR-E data (Cited by 3)
Authors: HAO Guanghua, SU Jie. Acta Oceanologica Sinica, SCIE CAS CSCD, 2015, No. 9, pp. 102-109 (8 pages)
Abstract: In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially for multiyear (MY) ice, has had a significant effect on climate change. The accurate retrieval of MY ice concentration is very important and challenging for understanding the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. These algorithms yielded a similar total ice concentration, while the retrieved MY sea ice concentrations differ from each other. The MY SIA derived from the NASA TEAM algorithm is relatively stable; the other two algorithms produced seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we propose an ice concentration retrieval algorithm that develops the NASA TEAM algorithm by additionally using AMSR-E 6.9 GHz brightness temperature data and the sea ice concentration obtained from 89.0 GHz data. Comparison with the reference MY SIA indicates that the mean difference and root mean square (rms) difference of the MY SIA derived from the algorithm of this study are 0.65×10^6 km^2 and 0.69×10^6 km^2 from January to March, and -0.06×10^6 km^2 and 0.14×10^6 km^2 from September to December, respectively. Comparison with the MY SIE obtained from weekly ice age data provided by the University of Colorado shows that the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The developed algorithm has a smaller difference from the reference MY ice and the MY SIE from ice age data than the Wang, Lomax, and NASA TEAM algorithms.
Keywords: multiyear ice concentration; retrieval algorithms; AMSR-E data
7. An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine (Cited by 1)
Authors: Bo Zhu, Xiaona Jing, Lan Qiu, Runbo Li. Computers, Materials & Continua, SCIE EI, 2024, No. 6, pp. 3977-3999 (23 pages)
Abstract: When building a classification model, the scenario where the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance causes the trained classification model to favor the majority class (usually defined as the negative class), which may harm the accuracy of the minority class (usually defined as the positive class) and lead to poor overall performance of the model. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (FCSSVM) classifier. The MSHR measures the separability of each negative sample through its silhouette value, calculated with the Mahalanobis distance between samples; based on this, so-called pseudo-negative samples are screened out to generate new positive samples through linear interpolation (over-sampling step) and are finally deleted (under-sampling step). This approach replaces pseudo-negative samples with newly generated positive samples one by one to clear up the inter-class overlap on the borderline, without changing the overall scale of the dataset. The FCSSVM is an improved version of the traditional cost-sensitive SVM (CS-SVM). It simultaneously considers the influences of both the imbalance in sample numbers and the class distribution on classification, and finely tunes the class cost weights using an efficient optimization algorithm based on the physical phenomenon of rime ice (the RIME algorithm), with cross-validation accuracy as the fitness function, to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments was carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that the MSHR-FCSSVM method performs better than the comparison methods in most cases, and that both the MSHR and the FCSSVM played significant roles. (The Mahalanobis silhouette and the interpolation step are sketched below.)
Keywords: Imbalanced data classification; Silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM
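Two of the building blocks named in the abstract are easy to sketch in Python: a silhouette value computed with the Mahalanobis metric, and the linear-interpolation over-sampling step. The paper's exact screening rule for pseudo-negatives is not reproduced, and the interpolation pairing below (two random positives) is an assumption.

```python
import numpy as np

def mahalanobis(x, X, VI):
    """Mahalanobis distance from x to every row of X (VI = inverse cov)."""
    d = X - x
    return np.sqrt(np.einsum('ij,jk,ik->i', d, VI, d))

def silhouette_mahalanobis(x, own, other, VI):
    """Silhouette of one sample: a = mean distance to its own class,
    b = mean distance to the other class."""
    a = mahalanobis(x, own, VI).mean()
    b = mahalanobis(x, other, VI).mean()
    return (b - a) / max(a, b)

def interpolate_positives(n_new, positives, rng):
    """New positives by linear interpolation between random positive pairs."""
    i, j = rng.integers(len(positives), size=(2, n_new))
    lam = rng.random((n_new, 1))
    return positives[i] + lam * (positives[j] - positives[i])

# VI would typically be np.linalg.inv(np.cov(X, rowvar=False))
```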
8. Product quality prediction based on RBF optimized by firefly algorithm (Cited by 1)
Authors: HAN Huihui, WANG Jian, CHEN Sen, YAN Manting. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2024, No. 1, pp. 105-117 (13 pages)
Abstract: With the development of information technology, a large amount of product quality data is accumulated across the entire manufacturing process, but it is not explored and used effectively. Traditional product quality prediction models have many disadvantages, such as high complexity and low accuracy. To overcome these problems, we propose an optimized data equalization method to pre-process the dataset and design a simple but effective product quality prediction model: a radial basis function model optimized by the firefly algorithm with a Levy flight mechanism (RBFFALFM). First, the new data equalization method is introduced to pre-process the dataset, which reduces the dimension of the data, removes redundant features, and improves the data distribution. Then the RBFFALFM is used to predict product quality. Comprehensive experiments conducted on real-world product quality datasets validate that the new model, combined with the new data pre-processing method, outperforms previous methods in predicting product quality. (The Levy-flight step generator is sketched below.)
Keywords: product quality prediction; data pre-processing; radial basis function; swarm intelligence optimization algorithm
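The Levy-flight mechanism grafted onto a firefly optimizer is usually implemented with Mantegna's method for Levy-stable step lengths; a Python sketch follows. The beta value and the way the step enters the firefly update are conventional choices, not necessarily the paper's.

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(size, beta=1.5, rng=None):
    """Mantegna's algorithm: heavy-tailed steps u / |v|^(1/beta)."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

# In a firefly update, the Levy term would perturb the attraction move, e.g.:
# x_i += beta0 * np.exp(-gam * r2) * (x_j - x_i) + alpha * levy_step(dim)
```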
9. Algorithms for Pre-Compiling Programs by Parallel Compilers
Authors: Fayez AlFayez. Computer Systems Science & Engineering, SCIE EI, 2023, No. 3, pp. 2165-2176 (12 pages)
Abstract: The paper addresses the challenge of transmitting a large number of files stored in a data center (DC), encrypting them by compilers, and sending them through a network in an acceptable time. Faced with a large number of files, a single compiler may not be sufficient to encrypt the data in an acceptable time. In this paper, we consider the problem of several compilers, and the objective is to find an algorithm that gives an efficient schedule for the given files to be compiled. The main objective of the work is to minimize the gap in the total size of assigned files between compilers; this minimization ensures a fair distribution of files to the different compilers. This problem is considered to be very hard. The paper presents two research axes. The first axis is architectural: we propose a novel pre-compiler architecture in this context. The second axis is algorithmic development: we develop six algorithms to solve the problem, based on the dispatching-rules method, the decomposition method, and an iterative approach. These algorithms give approximate solutions for the studied problem. An experimental study is implemented to show the performance of the algorithms, and several indicators are used to measure it. In addition, five classes are proposed to test the algorithms, with a total of 2,350 instances. A comparison between the proposed algorithms is presented in tables and discussed to show the performance of each one. The results showed that the best algorithm is the Iterative-mixed Smallest-Longest Heuristic (ISL), with a percentage equal to 97.7% and an average running time of 0.148 s. All other algorithms did not exceed 22%. The best algorithm excluding ISL is the Iterative-mixed Longest-Smallest Heuristic (ILS), with a percentage equal to 21.4% and an average running time of 0.150 s. (A classic dispatching rule of this family is sketched below.)
Keywords: compiler; encryption; scheduling; big data; algorithms
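The six algorithms themselves are not reproduced; as a flavor of the dispatching-rules family they build on, here is the classic longest-processing-time rule in Python: sort files by size descending and always hand the next one to the least-loaded compiler, which directly targets the size gap the paper minimizes.

```python
import heapq

def lpt_assign(file_sizes, n_compilers):
    """LPT dispatching: next-largest file goes to the least-loaded compiler."""
    heap = [(0, c) for c in range(n_compilers)]       # (load, compiler id)
    heapq.heapify(heap)
    assignment = {c: [] for c in range(n_compilers)}
    for size in sorted(file_sizes, reverse=True):
        load, c = heapq.heappop(heap)
        assignment[c].append(size)
        heapq.heappush(heap, (load + size, c))
    loads = sorted(sum(v) for v in assignment.values())
    return assignment, loads[-1] - loads[0]           # gap to be minimized

plan, gap = lpt_assign([90, 70, 65, 40, 30, 20, 10], n_compilers=3)
```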
10. Chimp Optimization Algorithm Based Feature Selection with Machine Learning for Medical Data Classification
Authors: Firas Abedi, Hayder M. A. Ghanimi, Abeer D. Algarni, Naglaa F. Soliman, Walid El-Shafai, Ali Hashim Abbas, Zahraa H. Kareem, Hussein Muhi Hariz, Ahmed Alkhayyat. Computer Systems Science & Engineering, SCIE EI, 2023, No. 12, pp. 2791-2814 (24 pages)
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, a chimp optimization algorithm-based feature selection (COAFS) technique is employed to select relevant attributes; inspired by the foraging behavior of chimpanzees, the COA mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. The approach selects pertinent features from medical datasets through COA and trains machine learning models on the reduced feature set. We evaluate the performance of this approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that the proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. Detailed experiments on a benchmark medical dataset reveal the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. This research thus contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. (A wrapper fitness function for such feature selection is sketched below.)
Keywords: Association rule mining; data classification; healthcare data; machine learning; parameter tuning; data mining; feature selection; MLARMC-HDMS; COA; stochastic gradient descent; Apriori algorithm
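The COA driver is not sketched here, but the wrapper fitness it would optimize is straightforward: score a binary feature mask by the cross-validated accuracy of an SGD-trained MLP, lightly penalized by subset size. The penalty weight and network shape below are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def fitness(mask, X, y, alpha=0.99):
    """Fitness of a boolean feature mask for a wrapper selector:
    cross-validated accuracy plus a small reward for smaller subsets."""
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(32,), solver='sgd',
                        max_iter=500, random_state=0)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)
```

A COA (or any swarm optimizer) would then evolve a population of such masks toward higher fitness.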
11. Research on Data Routing Model Based on Ant Colony Algorithms (Cited by 1)
Authors: 龚跃, 吴航, 鲍杰, 王君军, 张艳秋. Defence Technology, SCIE EI CAS, 2010, No. 4, pp. 269-272 (4 pages)
Abstract: Improving on traditional ant colony algorithms, a data routing model for remote data exchange on WANs is presented. In the model, random heuristic factors are introduced to realize multi-path search. The pheromone updating model adjusts the pheromone concentration on the optimal path dynamically according to the path load, so that the system keeps its load balanced. Simulation results show that the improved model achieves higher performance in convergence and load balance. (A sketch of the next-hop choice and load-aware pheromone update follows below.)
Keywords: computer software; data transmission; ant colony algorithm; routing model
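A Python sketch of the two mechanisms the abstract names: a next-hop choice whose heuristic term is randomly perturbed (opening up multiple paths), and a pheromone update whose deposit shrinks with path load (keeping the load balanced). The perturbation range and the deposit formula are assumptions.

```python
import numpy as np

def choose_next(tau_row, heuristic, neighbors, alpha=1.0, beta=2.0, rng=None):
    """Pick a neighbor with prob. proportional to tau^alpha * eta^beta,
    where eta carries a random factor to enable multi-path search."""
    rng = rng or np.random.default_rng()
    neighbors = np.asarray(neighbors)
    eta = heuristic[neighbors] * rng.uniform(0.8, 1.2, len(neighbors))
    w = tau_row[neighbors] ** alpha * eta ** beta
    return rng.choice(neighbors, p=w / w.sum())

def update_pheromone(tau, path, path_load, rho=0.1, q=1.0):
    """Evaporate globally, then deposit inversely to path load so heavily
    loaded optimal paths attract fewer ants (load balancing)."""
    tau *= (1 - rho)
    for i, j in zip(path[:-1], path[1:]):
        tau[i, j] += q / (1.0 + path_load)
    return tau
```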
12. Research on a Fog Computing Architecture and BP Algorithm Application for Medical Big Data
Authors: Baoling Qin. Intelligent Automation & Soft Computing, SCIE, 2023, No. 7, pp. 255-267 (13 pages)
Abstract: Although the Internet of Things has been widely applied, problems of cloud computing remain in digital smart medical big data collection, processing, analysis, and storage, especially the low efficiency of medical diagnosis. With the wide application of the IoT and big data in the medical field, medical big data is growing geometrically, resulting in cloud service overload, insufficient storage, communication delay, and network congestion. To solve these medical and network problems, a medical big-data-oriented fog computing architecture and a BP algorithm application are proposed, and their structural advantages and characteristics are studied. This architecture enables the medical big data generated by medical edge devices and the existing data in the cloud service center to be calculated, compared, and analyzed by the fog nodes through the Internet of Things, with the diagnosis results designed to reduce business processing delay and improve the diagnostic effect. Considering the weak computing power of each edge device, the artificial-intelligence BP neural network algorithm is used in the core computing model of the medical diagnosis system to improve the system's computing power, enhance medical intelligence-aided decision-making, and improve clinical diagnosis and treatment efficiency. In the application process, combined with the characteristics of medical big data technology, fog architecture design and big data technology integration enable the processing and analysis of the diagnosis system's heterogeneous data in the context of the Internet of Things. The results are promising: the medical platform network is smooth, the data storage space is sufficient, data processing and analysis are fast, and the diagnostic effect is remarkable, making the system a good assistant to doctors. It not only effectively addresses low clinical diagnosis and treatment efficiency and quality, but also reduces patients' waiting time, eases doctor-patient tension, and improves medical service quality and management. (A plain BP network is sketched below.)
Keywords: medical big data; IoT; fog computing; distributed computing; BP algorithm model
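For concreteness, here is a plain one-hidden-layer BP (backpropagation) network in numpy, the kind of lightweight model a fog node could run; the layer size, learning rate, and sigmoid/squared-error pairing are illustrative choices, not the paper's configuration.

```python
import numpy as np

def train_bp(X, y, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Backpropagation for a 1-hidden-layer sigmoid network; y is (n, 1)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    sig = lambda z: 1 / (1 + np.exp(-z))
    for _ in range(epochs):
        h = sig(X @ W1 + b1)                    # forward pass
        out = sig(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)     # backprop of squared error
        d_h = d_out @ W2.T * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2
```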
13. Using machine learning algorithms to estimate stand volume growth of Larix and Quercus forests based on national-scale Forest Inventory data in China (Cited by 2)
Authors: Huiling Tian, Jianhua Zhu, Xiao He, Xinyun Chen, Zunji Jian, Chenyu Li, Qiangxin Ou, Qi Li, Guosheng Huang, Changfu Liu, Wenfa Xiao. Forest Ecosystems, SCIE CSCD, 2022, No. 3, pp. 396-406 (11 pages)
Abstract: Accurately estimating the volume growth of forest ecosystems is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study was therefore conducted to estimate the volume growth of Larix and Quercus forests based on national-scale forestry inventory data in China and to identify its influencing factors using random forest algorithms. The results showed that model performance for volume growth in natural forests (R^2 = 0.65 for Larix and 0.66 for Quercus) was better than in planted forests (R^2 = 0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while edaphic and climatic variables had limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and linearly increasing in planted Quercus forests. The specific locations (i.e., altitude and aspect) of sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%): altitude affected volume growth positively in planted Larix forests but negatively in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth also differed with stand origin (planted versus natural) and plant functional type (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that environmental factors had diverse effects across stand origins and plant functional types. Our findings provide a framework for site-specific recommendations regarding the management practices necessary to maintain volume growth in China's forest ecosystems. (A relative-importance workflow is sketched below.)
Keywords: Stand volume growth; Stand origin; Plant functional type; National forest inventory data; Random forest algorithms
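The relative-importance workflow reads directly off a random forest's `feature_importances_`; a toy version with hypothetical predictor names (the real study uses the inventory variables) is shown below.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.random((500, 5))
# unimodal age effect plus a weak altitude effect, as a stand-in response
y = 3 * np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 500)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
names = ["stand_age", "altitude", "aspect", "temperature", "soil_depth"]
for n, imp in sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{n:12s} relative importance {imp:.1%}")
```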
14. Emotion Deduction from Social Media Text Data Using Machine Learning Algorithm
Authors: Thambusamy Velmurugan, Baskaran Jayapradha. Journal of Computer and Communications, 2023, No. 11, pp. 183-196 (14 pages)
Abstract: Emotion represents the feeling of an individual in a given situation. There are various ways to express emotions: verbal expressions, written expressions, facial expressions, and gestures. Among these, the written form is a challenging source from which to extract emotions, as the data is textual. Identifying the different kinds of emotions is also a tedious task, requiring extensive preparation of the text data taken for the research. This work analyses and extracts the emotions hidden in text data drawn from a social media dataset. Using the raw social media text directly would not serve the purpose; the text data has to be pre-processed and then used for further processing. Pre-processing makes the text data more usable, helps infer valuable insights about the emotions hidden in it, and helps manage the data for identifying the emotions it conveys. This work proposes to deduce the emotions from social media text data by applying a machine learning algorithm. Finally, the usefulness of the deduced emotions is suggested for various stakeholders who wish to know the attitude of individuals at the moment the data is produced. (A minimal text-classification pipeline is sketched below.)
Keywords: Data pre-processing; Machine learning algorithms; Emotion deduction; Sentiment analysis
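A minimal version of the pipeline described, with a stand-in corpus and labels (the study's dataset and chosen classifier are not specified here): pre-process, vectorize with TF-IDF, and train a classifier to deduce the emotion.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical mini-corpus standing in for preprocessed social-media posts
posts = ["so happy with the new update", "this delay makes me furious",
         "feeling really sad today", "what a joyful surprise"]
labels = ["joy", "anger", "sadness", "joy"]

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # pre-processing
    LogisticRegression(max_iter=1000))
model.fit(posts, labels)
print(model.predict(["this update makes me furious"]))
```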
15. A Short Review of Classification Algorithms Accuracy for Data Prediction in Data Mining Applications (Cited by 1)
Authors: Ibrahim Ba'abbad, Thamer Althubiti, Abdulmohsen Alharbi, Khalid Alfarsi, Saim Rasheed. Journal of Data Analysis and Information Processing, 2021, No. 3, pp. 162-174 (13 pages)
Abstract: Many business applications rely on their historical data to predict their business future. The marketing of products is one of the core processes for a business, and customer needs give useful information that helps to market the appropriate products at the appropriate time. Moreover, services are nowadays considered products, and the development of education and health services depends on historical data. Furthermore, reducing problems and crimes on online social media networks requires a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses; however, dealing with a huge quantity of data requires great processing time. Data mining involves many useful techniques for predicting statistical data in a variety of business applications, and classification is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a careful reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain; the Random Forest algorithm is the most accurate for classifying online social network (OSN) activities; the Naïve Bayes algorithm is the most accurate for classifying agriculture datasets; OneR is the most accurate for classifying instances within the health domain; and the C4.5 decision tree algorithm is the most accurate for classifying students' records to predict degree completion time.
Keywords: Data prediction techniques; Accuracy; Classification algorithms; Data mining applications
16. Prediction of Lubricant Physicochemical Properties Based on Gaussian Copula Data Expansion
Authors: Feng Xin, Yang Rui, Xie Peiyuan, Xia Yanqiu. China Petroleum Processing & Petrochemical Technology, SCIE CAS CSCD, 2024, No. 1, pp. 161-174 (14 pages)
Abstract: The composition of base oils affects the performance of lubricants made from them. This paper proposes a hybrid model based on gradient-boosted decision trees (GBDT) to analyze the effect of different ratios of the KN4010, PAO40, and PriEco3000 components in a composite base-oil system on the performance of lubricants. The study was conducted under small laboratory-sample conditions, and a data expansion method using the Gaussian copula function was proposed to improve the prediction ability of the hybrid model. The study also compared four optimization algorithms, the slime mould algorithm (SMA), genetic algorithm (GA), whale optimization algorithm (WOA), and seagull optimization algorithm (SOA), for predicting the kinematic viscosity at 40°C, kinematic viscosity at 100°C, viscosity index, and oxidation induction time of the lubricant. The results showed that the Gaussian copula data expansion method improved the prediction ability of the hybrid model in the small-sample case. The SOA-GBDT hybrid model had the fastest convergence speed and the best prediction effect, with coefficients of determination (R^2) for the four lubricant indicators reaching 0.98, 0.99, 0.96, and 0.96, respectively. Thus, this model can significantly reduce prediction error and has good prediction ability. (A copula-based expansion routine is sketched below.)
Keywords: base oil; data augmentation; machine learning; performance prediction; seagull algorithm
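A possible form of the Gaussian-copula expansion, in Python: map each column to normal scores through its empirical ranks, sample new rows from the fitted correlated Gaussian, and push them back through the empirical quantiles. The rank convention and the jitter term are implementation choices, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_expand(X, n_new, seed=0):
    """Expand a small sample X (n x d) to n_new synthetic rows."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    Z = norm.ppf(ranks / (n + 1))                        # normal scores
    L = np.linalg.cholesky(np.corrcoef(Z, rowvar=False) + 1e-9 * np.eye(d))
    U = norm.cdf(rng.standard_normal((n_new, d)) @ L.T)  # correlated uniforms
    return np.column_stack([np.quantile(X[:, j], U[:, j]) for j in range(d)])
```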
17. Evaluating the Efficacy of Latent Variables in Mitigating Data Poisoning Attacks in the Context of Bayesian Networks: An Empirical Study
Authors: Shahad Alzahrani, Hatim Alsuwat, Emad Alsuwat. Computer Modeling in Engineering & Sciences, SCIE EI, 2024, No. 5, pp. 1635-1654 (20 pages)
Abstract: Bayesian networks are a powerful class of graphical decision models used to represent causal relationships among variables. However, the reliability and integrity of learned Bayesian network models are highly dependent on the quality of incoming data streams. One of the primary challenges with Bayesian networks is their vulnerability to adversarial data poisoning attacks, wherein malicious data is injected into the training dataset to negatively influence the models and impair their performance. In this paper, we propose an efficient framework for detecting data poisoning attacks against Bayesian network structure learning algorithms. Our framework utilizes latent variables to quantify the amount of belief between every two nodes in each causal model over time. With regard to four different forms of data poisoning attack, we aim to strengthen the security and dependability of Bayesian network structure learning techniques such as the PC algorithm, exploring the complexity of this area and offering workable methods for identifying and reducing these threats. Additionally, our research investigates one particular use case, the "Visit to Asia" network, exploring the practical consequences of using uncertainty as a way to spot cases of data poisoning. Our results demonstrate the promising efficacy of latent variables in detecting and mitigating the threat of data poisoning attacks, and the proposed latent-based framework proves sensitive in detecting malicious poisoning in streaming data. (A crude dependency-drift monitor is sketched below.)
Keywords: Bayesian networks; data poisoning attacks; latent variables; structure learning algorithms; adversarial attacks
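The paper's latent-variable machinery is not reproduced here. As a much cruder stand-in for "tracking belief between every two nodes over time," one can monitor pairwise mutual information across incoming batches of (discretized) data and flag abrupt shifts; the threshold and batch granularity below are arbitrary.

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

def pairwise_mi(df):
    """Mutual information between every pair of discrete columns."""
    cols = list(df.columns)
    return {(a, b): mutual_info_score(df[a], df[b])
            for i, a in enumerate(cols) for b in cols[i + 1:]}

def flag_batches(batches, threshold=0.1):
    """Flag batches whose pairwise dependencies shift sharply,
    a possible symptom of injected (poisoned) records."""
    prev, alarms = None, []
    for t, batch in enumerate(batches):
        mi = pairwise_mi(batch)
        if prev is not None and any(abs(mi[k] - prev[k]) > threshold for k in mi):
            alarms.append(t)
        prev = mi
    return alarms
```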
18. Data-Driven Collaborative Scheduling Method for Multi-Satellite Data-Transmission
Authors: Xiaoyu Chen, Weichao Gu, Guangming Dai, Lining Xing, Tian Tian, Weilai Luo, Shi Cheng, Mengyun Zhou. Tsinghua Science and Technology, SCIE EI CAS CSCD, 2024, No. 5, pp. 1463-1480 (18 pages)
Abstract: With the continuous expansion of satellite applications, the requirements for satellite communication services, such as communication delay, transmission bandwidth, transmission power consumption, and communication coverage, are becoming higher. This paper first presents an overview of the current development of Low Earth Orbit (LEO) satellite constellations and then conducts a demand analysis for multi-satellite data transmission based on them. The problem is described, and its challenges and difficulties are analyzed; on this basis, a multi-satellite data-transmission mathematical model is constructed. Combining classical heuristic allocation strategies suited to the features of the proposed model with the reinforcement learning algorithm Deep Q-Network (DQN), a two-stage optimization framework based on heuristics and DQN is proposed. Finally, taking into account the spatial and temporal distribution characteristics of satellite and facility resources, a multi-satellite scheduling instance dataset is generated. Experimental results validate the rationality and correctness of the DQN algorithm in solving the collaborative scheduling problem of multi-satellite data transmission. (A minimal DQN update step is sketched below.)
Keywords: relay satellite; scheduling; data transmission; Deep Q-Network (DQN); genetic algorithm (GA)
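The heuristic first stage is problem-specific and omitted; the DQN half of the framework reduces to the standard temporal-difference update sketched below in PyTorch, where the state would encode satellite/antenna availability and the action would pick a transmission window. Network shape and hyperparameters are placeholders, not the paper's values.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_obs, n_act):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                                 nn.Linear(64, n_act))
    def forward(self, x):
        return self.net(x)

def dqn_step(q, q_target, buffer, opt, batch=64, gamma=0.99):
    """Regress Q(s,a) toward r + gamma * max_a' Q_target(s',a')."""
    s, a, r, s2 = (torch.stack(t) for t in zip(*random.sample(buffer, batch)))
    with torch.no_grad():
        target = r + gamma * q_target(s2).max(dim=1).values
    pred = q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

q, q_target = QNet(8, 4), QNet(8, 4)
q_target.load_state_dict(q.state_dict())   # periodically re-synced
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
```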
19. Research on Total Electric Field Prediction Method of Ultra-High Voltage Direct Current Transmission Line Based on Stacking Algorithm
Authors: Yinkong Wei, Mucong Wu, Wei Wei, Paulo R. F. Rocha, Ziyi Cheng, Weifang Yao. Computer Systems Science & Engineering, 2024, No. 3, pp. 723-738 (16 pages)
Abstract: Ultra-high voltage (UHV) transmission lines are an important part of China's power grid and are often surrounded by a complex electromagnetic environment. The ground-level total electric field is considered a main electromagnetic-environment indicator of UHV transmission lines and is currently monitored for reliable long-term operation of the power grid. Yet the accurate prediction of the ground total electric field remains a technical challenge. In this work, we collected total electric field data from the Ningdong-Zhejiang ±800 kV UHVDC transmission project, specifically the Ling Shao line, and performed an outlier analysis of the data. We show that the Local Outlier Factor (LOF) elimination algorithm yields a small average difference and outperforms the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Isolation Forest elimination algorithms. Moreover, the Stacking algorithm was found to have superior prediction accuracy to a variety of similar prediction algorithms, including the traditional finite element method. The low prediction error of the Stacking algorithm highlights its ability to accurately forecast the ground total electric field of UHVDC transmission lines. (An LOF-plus-stacking pipeline is sketched below.)
Keywords: DC transmission line; total electric field; effective data; multivariable outliers; LOF algorithm; Stacking algorithm
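Both stages map cleanly onto scikit-learn; the sketch below chains an LOF-based outlier elimination with a stacked regressor on synthetic data (the base learners and meta-learner are assumptions, not the paper's configuration).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((400, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(0, 0.1, 400)
y[::50] += 10                                  # inject outliers

# stage 1: eliminate outliers jointly in (features, target) space
keep = LocalOutlierFactor(n_neighbors=20).fit_predict(
    np.column_stack([X, y])) == 1

# stage 2: stacked prediction on the cleaned data
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge())
stack.fit(X[keep], y[keep])
```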
20. Multi-Label Feature Selection Based on Improved Ant Colony Optimization Algorithm with Dynamic Redundancy and Label Dependence
Authors: Ting Cai, Chun Ye, Zhiwei Ye, Ziyuan Chen, Mengqing Mei, Haichao Zhang, Wanfang Bai, Peng Zhang. Computers, Materials & Continua, SCIE EI, 2024, No. 10, pp. 1157-1175 (19 pages)
Abstract: The world produces vast quantities of high-dimensional, multi-semantic data. However, extracting valuable information from such large amounts of high-dimensional, multi-label data is undoubtedly arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, this paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy between the selected feature subset and candidate features is assessed. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then combined into the heuristic factor as label weights. Experimental results demonstrate that the proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms considered in the paper.
Keywords: Multi-label feature selection; ant colony optimization algorithm; dynamic redundancy; high-dimensional data; label correlation