Distribution networks are important public infrastructure essential to people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supply needed for daily life. Considering the requirements of distribution network disaster prevention and mitigation, there is therefore an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation. It thereby achieves real-time, accurate access to distribution network disaster information. In a case study, the proposed approach performs an accurate and rapid assessment of cross-sectional risk, and the minimal average annual outage time in the ring network is reduced to 3 h/a. The approach can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Hashing technology reduces data storage and improves the efficiency of the learning system, and is therefore increasingly widely used in image retrieval. Multi-view data describes image information more comprehensively than traditional single-view methods, but how to combine multi-view data with hashing for image retrieval remains a challenge. In this paper, a multi-view fusion hashing method based on RKCCA (Random Kernel Canonical Correlation Analysis) is proposed. To describe image content more accurately, we construct multiple views by combining the deep dense convolutional network feature DenseNet with the GIST feature or the BoW_SIFT (Bag-of-Words model + SIFT) feature. The algorithm uses RKCCA to fuse the multi-view features into association features and applies them to image retrieval, and it generates binary hash codes with minimal distortion error by designing quantization regularization terms. Extensive experiments on benchmark datasets show that this method is superior to other multi-view hashing methods.
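The fusion-then-binarize idea can be illustrated with plain linear CCA in place of the paper's RKCCA (the kernelization and quantization regularizer are omitted here): project both views onto their maximally correlated directions, sum the projections into association features, and take signs as the hash code. The two views, their dimensions, and the bit count below are hypothetical stand-ins, not the paper's DenseNet/GIST features.

```python
import numpy as np

def inv_sqrt(S, reg=1e-3):
    # Regularized inverse square root of a symmetric covariance matrix
    w, V = np.linalg.eigh(S + reg * np.eye(S.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_fuse_hash(X, Y, n_bits=8):
    """Fuse two feature views with linear CCA and binarize by sign."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(M)
    Wx = inv_sqrt(Sxx) @ U[:, :n_bits]      # projection for view 1
    Wy = inv_sqrt(Syy) @ Vt[:n_bits].T      # projection for view 2
    fused = X @ Wx + Y @ Wy                 # joint association features
    return np.where(fused >= 0, 1, 0)       # binary hash codes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # e.g. a deep-feature view (hypothetical)
Y = rng.normal(size=(100, 12))   # e.g. a GIST-like view (hypothetical)
codes = cca_fuse_hash(X, Y, n_bits=8)
print(codes.shape)  # (100, 8)
```

Retrieval would then rank images by Hamming distance between their codes.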
Educational Data Mining (EDM) is an emergent discipline that concentrates on the design of self-learning and adaptive approaches. Higher education institutions have started to utilize analytical tools to improve students' grades and retention. Predicting students' performance is difficult owing to the massive quantity of educational data, so Artificial Intelligence (AI) techniques can be used for educational data mining in a big data environment. At the same time, feature selection becomes necessary in EDM to create feature subsets; since feature selection affects the predictive performance of any model, it is important to investigate students' performance models in relation to the feature selection techniques used. With this motivation, this paper presents a new Metaheuristic Optimization-based Feature Subset Selection with an Optimal Deep Learning model (MOFSS-ODL) for predicting students' performance. The proposed model uses an isolation forest-based outlier detection approach to eliminate outliers. The Chaotic Monarch Butterfly Optimization Algorithm (CBOA) is then used to select highly related features with low complexity and high performance, and a sailfish optimizer with a stacked sparse autoencoder (SFO-SSAE) is utilized for the classification of educational data. The MOFSS-ODL model is tested against a benchmark student performance dataset from the UCI repository. A wide-ranging simulation analysis shows improved predictive performance over recent approaches on several measures: the proposed MOFSS-ODL classification model predicts students' academic progress with an accuracy of 96.49%.
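The pipeline's first stage, outlier elimination, can be sketched with scikit-learn's `IsolationForest`. The feature matrix, labels, and contamination rate below are illustrative stand-ins, not the paper's UCI dataset or settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Hypothetical student-performance features (e.g. grades, attendance, logins)
X = rng.normal(size=(200, 6))
X[:5] += 8.0                      # inject a few gross outliers
y = (X[:, 0] > 0).astype(int)     # toy pass/fail label

# Stage 1 of the pipeline: drop records the forest flags as outliers (-1)
mask = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
X_clean, y_clean = X[mask], y[mask]
print(X_clean.shape[0] < X.shape[0])  # True: outlier rows were removed
```

The cleaned `X_clean`/`y_clean` would then feed the feature selection and classification stages.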
The epidemic characteristics of Omicron (e.g., large-scale transmission) are significantly different from those of the initial variants of COVID-19. The data generated by large-scale transmission are important for predicting trends in these characteristics. However, the results of current prediction models are inaccurate because the models are not closely combined with the actual situation of Omicron transmission, and these inaccurate results negatively affect manufacturing and the service industry, for example, the production of masks and the recovery of the tourism industry. The authors study the epidemic characteristics in two ways: investigation and prediction. First, a large amount of data is collected by utilising the Baidu index and by conducting a questionnaire survey on epidemic characteristics. Second, the β-SEIDR model is established, in which the population is classified into Susceptible, Exposed, Infected, Dead, and β-Recovered persons, to intelligently predict the epidemic characteristics of COVID-19. Note that β-Recovered means that Recovered persons may become Susceptible again with probability β. Simulation results show that the model can accurately predict the epidemic characteristics.
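A minimal discrete-time sketch of the β-SEIDR idea, with hypothetical rate parameters: the only structural difference from a plain SEIR/SEIDR model is the β flow returning Recovered persons to the Susceptible pool.

```python
def beta_seidr(days, N=1_000_000, beta_back=0.01,
               trans=0.8, incub=1/3, rec=1/7, mort=0.001):
    """Daily-step β-SEIDR simulation (all parameter values are illustrative).

    S -> E -> I -> {D, R}; Recovered return to S at rate beta_back per day,
    which is the β that distinguishes this model from plain SEIR.
    """
    S, E, I, D, R = N - 100.0, 0.0, 100.0, 0.0, 0.0
    history = []
    for _ in range(days):
        new_E = trans * S * I / N     # new exposures
        new_I = incub * E             # exposed become infectious
        new_D = mort * I              # deaths
        new_R = rec * I               # recoveries
        back = beta_back * R          # β-Recovered become susceptible again
        S += back - new_E
        E += new_E - new_I
        I += new_I - new_D - new_R
        D += new_D
        R += new_R - back
        history.append((S, E, I, D, R))
    return history

hist = beta_seidr(120)
S, E, I, D, R = hist[-1]
print(abs(S + E + I + D + R - 1_000_000) < 1e-3)  # True: population conserved
```

Fitting `trans`, `beta_back`, etc. to observed case counts is what turns this sketch into a predictive model.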
The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while using those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines, transferring the features to the coefficients of the spline basis. A feature weighting approach based on statistical principles is then introduced that comprehensively considers the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between the feature weights. The new weighting approach adaptively enhances noteworthy local features while mitigating the impact of confusing ones. Algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the new approach integrates well into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in more accurate classification. The performance of the mean-variance-based classifiers is evaluated in simulation studies and on real data. The results show that the new feature weighting approach significantly improves classification accuracy for complex functional data.
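The weighting stage can be sketched as follows, under a simplified reading of the method: weight each (spline-coefficient) feature by its between-class mean separation divided by its within-class variance, then normalize; the exponent `alpha` plays the role of the paper's gap-adjusting scaling parameter. The data below are synthetic.

```python
import numpy as np

def mv_weights(X, y, alpha=1.0):
    """Mean-variance feature weights: between-class scatter over
    within-class scatter per feature, with a scaling exponent alpha
    (a simplified reading of the paper's weighting idea)."""
    overall = X.mean(0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(0) - overall) ** 2
        within += ((Xc - Xc.mean(0)) ** 2).sum(0)
    w = between / (within + 1e-12)
    w = (w / w.sum()) ** alpha     # alpha widens/narrows the weight gap
    return w / w.sum()

rng = np.random.default_rng(2)
# Feature 0 separates the classes; features 1-3 are pure noise
y = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 4))
X[:, 0] += 3.0 * y
w = mv_weights(X, y)
print(w.argmax())  # 0: the informative feature gets the largest weight
```

A weighted KNN then uses the distance d(u, v) = sqrt(sum_j w_j (u_j - v_j)^2) in place of the plain Euclidean metric.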
With the increasing variety of application software in meteorological satellite ground systems, how to provision reasonable hardware resources and improve software efficiency is receiving more and more attention. In this paper, a software classification method based on software operating characteristics is proposed. The method describes the running characteristics of software by its run-time resource consumption. First, principal component analysis (PCA) is used to reduce the dimension of the run-time feature data and to interpret software characteristic information. A modified K-means algorithm is then used to classify the meteorological data processing software. Finally, the PCA results are combined with the clusters to explain the overall operating characteristics of each software class, which serves as a basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
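A minimal sketch of the PCA-plus-K-means stage, using plain K-means rather than the paper's modified variant; the four resource dimensions and the cluster count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical run-time profiles: CPU, memory, disk I/O, network (per job),
# drawn from three distinct consumption patterns
profiles = np.vstack([rng.normal(m, 0.3, size=(30, 4))
                      for m in ([1, 5, 1, 1], [5, 1, 1, 1], [1, 1, 5, 5])])

# PCA by SVD: keep the first two principal components
Z = profiles - profiles.mean(0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T              # reduced run-time feature data

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd-style k-means (the paper uses a modified variant)."""
    centers = X[np.random.default_rng(seed).choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(scores, 3)
print(scores.shape, labels.shape)  # (90, 2) (90,)
```

Inspecting the PCA loadings `Vt` per cluster is what lets each software class be described in terms of its dominant resource consumption.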
By analyzing samples collected from three Chinese-English learners in the ICLE project (Portsmouth Chinese-English learner corpus), this analysis project aims to describe the grammatical status of some non-native features in Chinese students' writing and to answer two questions: ① Do these features seem to be performance mistakes (i.e., are they random), or is there evidence that they reflect an interlanguage grammar (ILG) (i.e., do they appear to be systematic errors)? ② In the case of systematic errors, do they seem to be errors transferred from the L1 (the students' first language), or do they seem to be developmental errors (shared by learners from other L1 backgrounds)?
Conventional observation data, precipitation data from regional automatic stations, 1°×1° NCEP reanalysis data, TBB images from the FY-2C geostationary meteorological satellite, and Doppler radar data were utilized to analyze the heavy precipitation process in Hunan Province from June 8 to 10. The results indicate that this heavy precipitation process occurred as the western Pacific subtropical high jumped northward and then retreated southward rapidly, while the low- and mid-level shear line was maintained and swung over the region for a long period under a favorable configuration of temperature and moisture energy. The analysis found that the precipitation period and area corresponded well with the radar products and satellite TBB images, and with the area of high low-level pseudo-equivalent potential temperature (θse), high convective available potential energy (CAPE), and strong convergent ascent. As the forecast lead time extended, the T639 and EC forecasts placed the western ridge point of the western Pacific subtropical high progressively farther east and made it weaker, which introduced some deviations in forecasting the heavy precipitation area.
As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive, fixed computing platforms, efficiently using a data-driven approach together with information and communication technology (ICT) and cloud computing. As a result of the complicated architecture of cloud computing, the distinctive working of advanced metering infrastructure (AMI), and the use of sensitive data, securing the SG has become challenging. Faults of the SG fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation are TLs. NTLs are human-induced losses for malicious purposes, such as attacks on sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology based on principles of computational intelligence and big data analysis to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant subset of features from a large, unsupervised public smart grid project dataset from London, UK, for theft detection. A subset of 26 of the 71 features is obtained, with a classification accuracy of 96.6%, compared to studies conducted on small and limited datasets.
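The GA-driven feature subset search can be sketched as below. To keep the example dependency-free, a nearest-centroid accuracy stands in for the SVM fitness evaluator, and the data, population size, and GA rates are all illustrative rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy "load profile" data: only the first 3 of 10 features carry signal
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, :3] += 2.0 * y[:, None]

def fitness(mask):
    """Accuracy of a nearest-centroid rule on the selected features
    (a lightweight stand-in for the paper's SVM evaluator)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

# Minimal genetic loop: elitist selection, uniform crossover, bit-flip mutation
pop = rng.random((20, 10)) < 0.5          # each row is a feature mask
for _ in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]        # keep the fitter half
    kids = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, size=2)]
        child = np.where(rng.random(10) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(10) < 0.05                # bit-flip mutation
        kids.append(child)
    pop = np.vstack([parents, kids])
best = pop[np.argmax([fitness(ind) for ind in pop])]
print(fitness(best) > 0.9)  # True on this toy data
```

In the paper's setting the mask would select columns of the London smart-meter feature matrix and the fitness would be SVM cross-validation accuracy.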
The rising popularity of online social networks (OSNs) such as Twitter, Facebook, MySpace, and LinkedIn in recent years has sparked great interest in sentiment analysis of their data. While many methods exist for identifying sentiment in OSNs, such as communication pattern mining and classification based on emoticons and parts of speech, most of them use a suboptimal batch-mode learning approach when analyzing large amounts of real-time data. As an alternative, we present a streaming algorithm using Modified Balanced Winnow for sentiment analysis on OSNs. Tested on three real-world network datasets, the performance of our sentiment predictions is close to that of batch learning, with the added ability to detect important features dynamically for sentiment analysis in data streams. These top features reveal key words important to the analysis of sentiment.
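A minimal Balanced Winnow learner with a margin, in the spirit of the modified variant the paper uses (the exact modifications differ, and the hyperparameters here are illustrative): two weight vectors, positive and negative, are promoted or demoted multiplicatively on margin violations, one example at a time, which is what makes it a streaming algorithm.

```python
class BalancedWinnow:
    """Balanced Winnow with a margin. Positive and negative weight vectors
    are updated multiplicatively; promote/demote/margin are illustrative."""
    def __init__(self, n_features, promote=1.5, demote=0.5, margin=1.0):
        self.u = [2.0] * n_features   # positive weights
        self.v = [1.0] * n_features   # negative weights
        self.promote, self.demote, self.margin = promote, demote, margin

    def score(self, x):
        # x is a sparse feature dict {index: value}, e.g. word counts
        return sum((self.u[i] - self.v[i]) * xi for i, xi in x.items())

    def learn(self, x, y):            # y is +1 or -1
        if y * self.score(x) <= self.margin:   # mistake or too-small margin
            for i, xi in x.items():
                if xi <= 0:
                    continue
                if y > 0:
                    self.u[i] *= self.promote
                    self.v[i] *= self.demote
                else:
                    self.u[i] *= self.demote
                    self.v[i] *= self.promote

# Toy stream: feature 0 signals positive sentiment, feature 1 negative
clf = BalancedWinnow(3)
stream = [({0: 1.0, 2: 1.0}, +1), ({1: 1.0, 2: 1.0}, -1)] * 20
for x, y in stream:
    clf.learn(x, y)
print(clf.score({0: 1.0}) > 0, clf.score({1: 1.0}) < 0)  # True True
```

The per-feature weight gap `u[i] - v[i]` is also what surfaces the "top features" (key words) the abstract mentions.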
This paper presents a method for extracting reasonable gridding parameters that set the optimal interpolation nodes when gridding scattered observed data. The method derives optimized gridding parameters from the distribution of features in the raw data, and modeling analysis shows that the distortion caused by gridding can be greatly reduced when such parameters are used. We also present improved technical measures that use human-machine interaction and multi-threaded parallelism to address inadequacies in traditional gridding software. On the basis of these methods, we have developed software that grids scattered data through a graphical interface. Finally, a comparison of different gridding parameters on field magnetic data from Jilin Province, North China, demonstrates the superiority of the proposed method in eliminating distortion and enhancing gridding efficiency.
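The gridding step itself can be illustrated with a simple inverse-distance-weighted interpolation onto a regular mesh. This is a stand-in for whatever interpolant the described software uses; the paper's contribution is choosing the gridding parameters (e.g. node spacing `nx`, `ny`) from the data distribution, which here are just picked by hand.

```python
import numpy as np

def idw_grid(px, py, values, nx, ny, power=2.0):
    """Grid scattered observations onto an nx-by-ny mesh with inverse
    distance weighting (a simple stand-in interpolant)."""
    gx = np.linspace(px.min(), px.max(), nx)
    gy = np.linspace(py.min(), py.max(), ny)
    GX, GY = np.meshgrid(gx, gy)                      # shape (ny, nx)
    d2 = (GX[..., None] - px) ** 2 + (GY[..., None] - py) ** 2
    w = 1.0 / np.maximum(d2, 1e-12) ** (power / 2)    # inverse-distance weights
    return (w * values).sum(-1) / w.sum(-1)

rng = np.random.default_rng(5)
px, py = rng.random(200), rng.random(200)             # scattered station locations
values = np.sin(3 * px) + np.cos(3 * py)              # synthetic field values
grid = idw_grid(px, py, values, nx=50, ny=40)
print(grid.shape)  # (40, 50)
```

Because IDW is a convex combination of the observations, the gridded values stay within the observed range; a node spacing mismatched to the data density is what produces the gridding distortion the paper targets.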
Automatic road detection in dense urban areas is a challenging application in the remote sensing community, mainly because of the physical and geometrical variation of road pixels, their spectral similarity to other features such as buildings, parking lots, and sidewalks, and obstruction by vehicles and trees. These problems are real obstacles to the precise detection and identification of urban roads from high-resolution satellite imagery. One promising strategy for dealing with them is to use multi-sensor data to reduce the uncertainty of detection. In this paper, an integrated object-based analysis framework was developed for detecting and extracting various types of urban roads from high-resolution optical images and Lidar data. The proposed method is designed and implemented as a rule-oriented approach based on a masking strategy. The overall accuracy (OA) of the final road map was 89.2%, and the kappa coefficient of agreement was 0.83, which shows the efficiency and performance of the method under different conditions and interclass noise. The results also demonstrate the high capability of this object-based method for simultaneously identifying a wide variety of road elements in complex urban areas using both high-resolution satellite images and Lidar data.
Funding (multi-view fusion hashing): This work is supported by the National Natural Science Foundation of China (No. 61772561), the Key Research & Development Plan of Hunan Province (No. 2018NK2012), the Science Research Projects of Hunan Provincial Education Department (Nos. 18A174, 18C0262), and the Science & Technology Innovation Platform and Talent Plan of Hunan Province (2017TP1022).
Funding (β-SEIDR epidemic prediction): Key discipline construction project for traditional Chinese Medicine in Guangdong province (Grant No. 20220104); construction project of the inheritance studio of national famous and old traditional Chinese Medicine experts (Grant No. 140000020132).
Funding (mean-variance feature weighting): This research is funded by the National Social Science Foundation of China (Grant No. 22BTJ035).
Funding (smart grid theft detection): This research is funded by Fayoum University, Egypt.
Funding (gridding-parameters extraction): Partly supported by the Public Geological Survey Project (No. 201011039), the National High Technology Research and Development Project of China (No. 2007AA06Z134), and the 111 Project under the Ministry of Education and the State Administration of Foreign Experts Affairs, China (No. B07011).