The Shiyang River is an important ecological pillar in northwest China, sustaining the Minqin oasis and its surrounding society. However, the basin has long been plagued by water scarcity and ecological fragility. Although river classification is critical for understanding the complexity, diversity, and ecological functions of rivers, and is the foundation of river management and watershed ecological restoration, it has not received adequate attention in this region. To obtain a deeper and more comprehensive understanding of the Shiyang River, this study uses the Rosgen stream classification system to assess river morphology, geomorphic features, and hydrologic processes. The results show that seven first-level and fourteen second-level river types can be identified along 53 river sections of the Shiyang River. Further comparative analysis of the hydrologic parameters for each river type demonstrated a strong positive correlation between discharge and all river parameters. As discharge increased, channels with moderate to high width/depth ratios experienced significant lateral adjustments. A consistent channel gradient, coupled with higher discharge, facilitated the transition from single to multiple channels. Braiding tendencies were more pronounced in rivers where riverbeds were wider and shallower with higher stream power. Additionally, water-flow shear stress decreased as the width/depth ratio increased. This study offers critical insights into the Shiyang River’s forms and processes and into river management and ecological restoration practices.
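The reported link between shear stress and the width/depth ratio can be illustrated with textbook hydraulics. The sketch below assumes a rectangular cross-section of fixed area and uses the standard relations tau = rho * g * R * S for mean boundary shear stress and Omega = rho * g * Q * S for stream power. The function names and sample values are hypothetical and are not taken from the study.

```python
import math

RHO = 1000.0  # water density, kg/m^3
G = 9.81      # gravitational acceleration, m/s^2


def hydraulic_radius(area, wd_ratio):
    """Hydraulic radius R = A / P of a rectangular channel.

    The channel is assumed rectangular (an illustrative simplification),
    with cross-sectional area `area` (m^2) and width/depth ratio `wd_ratio`.
    """
    depth = math.sqrt(area / wd_ratio)
    width = wd_ratio * depth
    wetted_perimeter = width + 2.0 * depth
    return area / wetted_perimeter


def shear_stress(area, wd_ratio, slope):
    """Mean boundary shear stress tau = rho * g * R * S, in Pa."""
    return RHO * G * hydraulic_radius(area, wd_ratio) * slope


def stream_power(discharge, slope):
    """Total stream power per unit length Omega = rho * g * Q * S, in W/m."""
    return RHO * G * discharge * slope


if __name__ == "__main__":
    # Same area and slope, increasingly wide and shallow sections.
    for ratio in (5, 12, 40):
        print(ratio, round(shear_stress(10.0, ratio, 0.01), 1))
```

For a fixed area and slope, a wider and shallower section has a longer wetted perimeter and therefore a smaller hydraulic radius, which is one simple way to see why shear stress can fall as the width/depth ratio rises.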
This article proposes a VGG network with histogram of oriented gradients (HOG) feature fusion (HOG-VGG) for polarimetric synthetic aperture radar (PolSAR) image terrain classification. VGG-Net has a strong capacity for deep feature extraction and can fully extract the global deep features of different terrains in PolSAR images, so it is widely used in PolSAR terrain classification. However, VGG-Net ignores local edge and shape features, which leaves the feature representation of PolSAR terrains incomplete and limits classification accuracy. In fact, edge and shape features play an important role in PolSAR terrain classification. To solve this problem, a new VGG network with HOG feature fusion is proposed specifically for high-precision PolSAR terrain classification. HOG-VGG extracts both the global deep semantic features and the local edge and shape features of PolSAR terrains, so the completeness of the terrain feature representation is greatly improved. Moreover, HOG-VGG fuses the global deep features and the local edge and shape features to achieve the best classification results. The superiority of HOG-VGG is verified on the Flevoland, San Francisco, and Oberpfaffenhofen datasets. Experiments show that the proposed HOG-VGG achieves much better PolSAR terrain classification performance, with overall accuracies of 97.54%, 94.63%, and 96.07%, respectively.
Growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers. This paper presents a P2P streaming traffic classification method based on sampling technology. By analyzing the statistical traffic features and network behavior of P2P streaming, a group of flow characteristics was found that makes P2P streaming more recognizable among other applications. Attributes from NetFlow and those proposed by the authors are compared in terms of classification accuracy, as are the results at different sampling rates. It is shown that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in an online system. Even at a 1:50 sampling rate, the recognition accuracy can exceed 94%. Moreover, CPU usage, storage capacity, and time consumption were evaluated before and after sampling; the results show that the classification model after sampling significantly reduces resource requirements while maintaining the same recognition accuracy.
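A minimal sketch of what 1:50 sampling means for flow statistics, assuming systematic sampling (keep every 50th packet) and packets represented as hypothetical `(flow_id, size_in_bytes)` tuples. The abstract does not list the paper's actual flow attributes, so only packet and byte counts are shown here.

```python
from collections import defaultdict


def sample_packets(packets, rate=50):
    """Keep every `rate`-th packet (systematic 1:rate sampling)."""
    return [pkt for i, pkt in enumerate(packets) if i % rate == 0]


def flow_features(packets, rate=1):
    """Aggregate per-flow packet and byte counts.

    Counts are multiplied by `rate` so that features computed from a
    sampled trace estimate the values of the unsampled trace.
    """
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for flow_id, size in packets:
        stats[flow_id]["packets"] += rate
        stats[flow_id]["bytes"] += size * rate
    return dict(stats)
```

A classifier would then be trained on such scaled features; the sampled trace holds roughly one fiftieth of the packets, which is where the savings in CPU, storage, and time come from.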
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from many applications raises two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has three operational stages: pre-processing, concept drift detection (CDD), and classification. It overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). The Glowworm Swarm Optimization (GSO) algorithm is used to determine the oversampling rates and neighboring point values of SMOTE. In addition, the Statistical Test of Equal Proportions (STEPD) is used as the CDD technique. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and compute the optimum parameters of the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the model was evaluated on high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. Extensive experimental validation confirmed the effectiveness of the MOMBD-CDD model. The proposed model attained high accuracies of 97.45% and 94.23% on the KDDCup99 and ECUE Spam datasets, respectively.
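The drift detection step can be sketched as follows. STEPD compares the classifier's accuracy on a recent window with its accuracy on the older examples using a test of equal proportions with continuity correction. The function names, the one-sided p-value, and the 0.003 threshold are illustrative assumptions, not values taken from the paper.

```python
import math
from statistics import NormalDist


def stepd_p_value(correct_old, n_old, correct_recent, n_recent):
    """One-sided p-value of the equal-proportions test used by STEPD.

    correct_old / n_old       : correct predictions and size of the older window
    correct_recent / n_recent : the same for the most recent window
    """
    p_hat = (correct_old + correct_recent) / (n_old + n_recent)
    inv = 1.0 / n_old + 1.0 / n_recent
    variance = p_hat * (1.0 - p_hat) * inv
    if variance == 0.0:
        # Every prediction right (or wrong) in both windows: no evidence of change.
        return 1.0
    diff = abs(correct_old / n_old - correct_recent / n_recent)
    z = (diff - 0.5 * inv) / math.sqrt(variance)  # continuity-corrected statistic
    return 1.0 - NormalDist().cdf(z)


def detect_drift(correct_old, n_old, correct_recent, n_recent, alpha=0.003):
    """Flag drift when recent accuracy is significantly lower than older accuracy."""
    recent_worse = correct_recent / n_recent < correct_old / n_old
    p_value = stepd_p_value(correct_old, n_old, correct_recent, n_recent)
    return recent_worse and p_value < alpha
```

For example, `detect_drift(450, 500, 15, 30)` compares 90% accuracy on 500 older examples with 50% on the latest 30; a drop of that size is flagged, while equal accuracy in both windows is not.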
Rapid developments in telecommunications, sensor data, financial applications, data stream analysis, and related fields have increased the rate of data arrival, making data mining a vital process. Data analysis consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Because classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundary between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify new incoming data streams. Rough set theory, together with a tangential holoentropy function, is used to design the dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets (skin segmentation, localization, and breast cancer) using accuracy and time as metrics, and is compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier gives better classification results than the existing classifiers, achieving the highest accuracy in the least time.
Logistic regression is a fast classifier and can achieve relatively high accuracy on small training sets. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties, this paper proposes an algorithm called the evolutionary logistical regression classifier (ELRClass) for classifying evolving data streams. The algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep the classifier unchanged if its performance has deteriorated because of a burst of noise, or to construct a new classifier if a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of the algorithm.
Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this, the statistical properties of the data change over time, leading to class imbalance and concept drift. Both issues degrade model performance. Most current work has focused on ensemble strategies that train a new classifier on the latest data to resolve the issue. These techniques suffer when the data used to train the new classifier is imbalanced. In addition, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. Existing solutions for the combined problem of class imbalance and concept drift lack an understanding of how one problem correlates with the other. This work studies the association between concept drift and the class imbalance ratio and then demonstrates how changes in the class imbalance ratio, together with concept drift, affect classifier performance. We analyzed the effect of both issues on the minority and majority classes individually. To do this, we conducted experiments on benchmark datasets using state-of-the-art classifiers designed specifically for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when class imbalance and concept drift occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both the minority and majority classes. The findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams, such as concept drift and class imbalance.
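The four metrics named above can be computed from a binary confusion matrix as in the sketch below. The function name and the zero-division convention (return 0.0) are assumptions, and the geometric mean is taken here as the square root of sensitivity times specificity, a common definition for imbalanced data that the abstract does not spell out.

```python
import math


def imbalance_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and G-mean for the positive (minority) class.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}
```

Swapping the roles of the two classes (treating the majority class as positive) gives the per-class view used when minority and majority classes are analyzed separately.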
Nitrogen (N) and potassium (K) are two key mineral nutrient elements involved in rice growth. Accurate diagnosis of N and K status is very important for the rational application of fertilizers at a specific rice growth stage. Therefore, we propose a hybrid model for diagnosing rice nutrient levels at the early panicle initiation stage (EPIS), which combines a convolutional neural network (CNN) with an attention mechanism and a long short-term memory network (LSTM). The model was validated on a large set of sequential images collected by an unmanned aerial vehicle (UAV) from rice canopies at different growth stages during a two-year experiment. Compared with VGG16, AlexNet, GoogleNet, DenseNet, and InceptionV3, ResNet101 combined with LSTM obtained the highest average accuracy of 83.81% on the dataset of Huanghuazhan (HHZ, an indica cultivar). When tested on the datasets of HHZ and Xiushui 134 (XS134, a japonica rice variety) in 2021, the ResNet101-LSTM model enhanced with the squeeze-and-excitation (SE) block achieved the highest accuracies of 85.38% and 88.38%, respectively. With the cross-dataset method, the average accuracies on the HHZ and XS134 datasets tested in 2022 were 81.25% and 82.50%, respectively, showing good generalization. The proposed model works with the dynamic information of different rice growth stages and can efficiently diagnose different rice nutrient status levels at EPIS, which helps in making practical decisions about rational fertilization at the panicle initiation stage.
Residual-based a posteriori error estimates are derived for conforming finite element solutions of the incompressible Navier-Stokes equations in stream function form, computed with seven recently proposed two-level methods. The a posteriori error estimates contain additional terms compared with the error estimates for the solution obtained by the standard finite element method. The importance of these additional terms is investigated by studying their asymptotic behavior. For optimally scaled meshes, these bounds are not of higher order than the order of convergence of the discrete solution.
With the enhancement of data collection capabilities, massive streaming data have accumulated in numerous application scenarios. Specifically, classifying data streams from mobile sensors can be formalized as a multi-task multi-view learning problem, in which a specific task comprises multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task and single-view, and so cannot learn shared representations between related tasks and views. To address these challenges, an adaptive multi-task multi-view incremental learning framework for data stream classification, called MTMVIS, is proposed. First, an attention mechanism is used to align the sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization, from the perspective of multi-task multi-view learning, to overcome catastrophic forgetting in incremental learning. Experiments on two different datasets show that the proposed framework outperforms state-of-the-art baseline methods.
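For orientation, Fisher regularization against catastrophic forgetting is commonly written in the elastic weight consolidation form below. This is the standard, non-adaptive penalty, given as an assumption about the general idea; the abstract does not state how MTMVIS adapts it. Here $\theta^{*}$ denotes the parameters learned on earlier data, $F_i$ the diagonal Fisher information of parameter $i$, and $\lambda$ a trade-off weight.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta)
  + \frac{\lambda}{2} \sum_{i} F_i \left( \theta_i - \theta_i^{*} \right)^{2},
\qquad
F_i = \mathbb{E}\!\left[ \left( \frac{\partial \log p(y \mid x; \theta)}{\partial \theta_i} \right)^{2} \right]_{\theta = \theta^{*}}
```

Parameters with large $F_i$ mattered for earlier data and are held close to $\theta_i^{*}$, while parameters with small $F_i$ remain free to fit the new data.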
Urbanization can affect the physical process of river growth, modify stream structure, and further influence the functions of a river system. Shanghai, one of the largest cities in the world, is located in the Changjiang (Yangtze) River Delta in China. Since the 1970s, the whole river system in Shanghai has been planned and managed by the Shanghai Water Authority. The primary management objectives over the last 30 years have been to enhance irrigation and flood control. Using the Horton-Strahler classification and Horton's laws as a reference, a novel method of stream classification, combined with traditional and specially designed indicators, was applied to understand the structure and functions of the river system in Shanghai. Correlation analysis was used to identify the interrelations among indicators. The impact of urbanization on the river system was found to be significant, although natural laws and physical characteristics indicated a highly developed river system. There was a clear correlation between the degree of urbanization and abnormal values of some indicators. Urbanization impacts on the river system, such as branches engineered out, riverbank concreting, and low diversity of river style, were widely observed. Each indicator had a distinct sensitivity to urbanization, so the indicators could be used to describe different characteristics of the urban river system. The function indicators were significantly related to the structure indicators. Stream structure, described by the fractal dimension and complexity of the river system, was as important as the water area ratio for maintaining the river’s multiple functions.
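The Horton-Strahler ordering used as a reference above can be sketched in a few lines. The sketch assumes the river network is a tree stored as a hypothetical dictionary mapping each reach to the reaches that flow into it; the study's own indicators are not reproduced here.

```python
def strahler_order(upstream, node):
    """Return the Horton-Strahler order of `node`.

    `upstream` maps a reach name to the list of reaches flowing into it.
    A headwater reach has order 1. Where the two highest-order tributaries
    have the same order k, the downstream reach has order k + 1; otherwise
    it inherits the highest tributary order.
    """
    tributaries = upstream.get(node, [])
    if not tributaries:
        return 1
    orders = sorted((strahler_order(upstream, t) for t in tributaries), reverse=True)
    if len(orders) > 1 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]


if __name__ == "__main__":
    # Hypothetical network: two second-order branches meet at the outlet.
    network = {
        "outlet": ["a", "c"],
        "a": ["a1", "a2"],
        "c": ["c1", "c2"],
    }
    print(strahler_order(network, "outlet"))
```

In an engineered urban network, branches removed or merged by construction change these orders, which is one way the structural indicators can register urbanization.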
Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors that influence groundwater level were selected on the basis of the response relationship between candidate factors, such as rainfall and reservoir level, and changes in groundwater level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predictions for the test samples were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. By comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
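The two error measures quoted above can be computed as in the following sketch. The abstract does not define its relative error; the version here (mean of absolute error divided by the observed value) is an assumption, and the function names are illustrative.

```python
def mean_absolute_error(observed, predicted):
    """Mean of |predicted - observed|, in the units of the data (e.g. metres)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed|, as a fraction (0.01 = 1%).

    Assumes no observed value is zero.
    """
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)
```

Both functions take the measured groundwater levels and the model predictions for the same test samples, so the CART and SVM models can be compared on identical inputs.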
With the development of satellite technology, satellite imagery of the Earth’s entire surface makes it possible to survey surface resources and track dynamic changes of the Earth with high efficiency and low cost. As an important tool for processing satellite remote sensing images, remote sensing image classification has become a hot topic. Based on the natural texture characteristics of remote sensing images, this paper combines different texture features with the Extreme Learning Machine and proposes a new remote sensing image classification algorithm. Experiments are carried out on the standard test datasets SAT-4 and SAT-6. The results show that the proposed method is a simpler and more efficient remote sensing image classification algorithm. It achieves 99.434% recognition accuracy on SAT-4, about 1.5 percentage points higher than the 97.95% achieved by DeepSat. On SAT-6, the recognition accuracy reaches 99.5728%, about 5.6 percentage points higher than DeepSat’s 93.9%.
In this paper we propose a novel method for video quality prediction using video classification. In essence, our approach can serve two goals: (1) to measure the video quality of compressed video sequences without reference to the original uncompressed videos, i.e., to realize No-Reference (NR) video quality evaluation; and (2) to predict quality scores for uncompressed video sequences at various bitrates without actually encoding them. Our approach can help realize video streaming with ideal Quality of Service (QoS). It is a low-complexity solution, which makes it especially suitable for mobile video streaming, where handset resources are scarce.
This paper proposes a security policy model for mandatory access control in a class B1 database management system with tuple-level labeling. The relation hierarchical data model is extended to a multilevel relation hierarchical data model. Based on this multilevel model, the concept of upper-lower layer relational integrity is presented after the covert channels caused by database integrity are analyzed and eliminated. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. The system is based on the multilevel relation hierarchical data model and is capable of storing and manipulating, in an integrated way, multilevel complex objects (e.g., multilevel spatial data) and multilevel conventional data (e.g., integers, real numbers, and character strings).
With the rapid development of business transactions, especially in recent years, it has become necessary to develop mechanisms to trace business user records in web server logs efficiently. Online business transactions have increased, especially when the user or customer cannot otherwise obtain the required service. For example, with the spread of the coronavirus (COVID-19) epidemic throughout the world, there is a dire need to rely more on online business processes. To improve the efficiency and performance of the e-business structure, the web server log must be well utilized so that it can trace and record an unbounded number of user transactions. This paper proposes an event stream mechanism based on formula patterns to enhance business processes and record all user activities in a structured log file. Each user activity is recorded with a set of tracing parameters that can predict the behavior of the user in business operations. Experiments are conducted by applying clustering-based classification algorithms to two datasets, namely Online Shoppers Purchasing Intention and Instacart Market Basket Analysis. The clustering process groups related objects into the same cluster, and the classification process then predicts the classes of the clustered objects. The experimental results record provable accuracy in predicting user preferences on both datasets.
Textual data streams have been used extensively in practical applications where consumers express their views on online products. Because of changes in data distribution, commonly referred to as concept drift, mining these data streams is a challenging problem for researchers. Most existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, more intuitive detection techniques are needed that can identify a large number of drifts in the data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection, for opinion mining and sentiment classification. To improve classification performance, the approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before a new classifier is added, the total number of classifiers in the ensemble is checked; if the limit is exceeded, the classifier with the least weight is removed. To this end, a weighting mechanism calculates the weight of each classifier, which decides its contribution to the final classification result. Several experiments were conducted on real-world datasets, and the results were evaluated on false positive rate, miss detection rate, and accuracy. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and PageHinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of the proposed method.
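The ensemble maintenance rule described above can be sketched as a small bounded container. The class name, the use of plain callables as classifiers, and the externally supplied weights are all hypothetical; the abstract does not say how the weights or the dissimilarity measures are computed.

```python
class WeightedEnsemble:
    """Bounded ensemble with weighted voting (illustrative sketch only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.members = []  # list of (classifier, weight) pairs

    def add(self, classifier, weight):
        """Add a classifier, first evicting the least-weighted member if full."""
        if len(self.members) >= self.max_size:
            weakest = min(range(len(self.members)), key=lambda i: self.members[i][1])
            self.members.pop(weakest)
        self.members.append((classifier, weight))

    def predict(self, x):
        """Return the label with the largest total weight across members."""
        if not self.members:
            raise ValueError("ensemble is empty")
        votes = {}
        for classifier, weight in self.members:
            label = classifier(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)
```

In a streaming loop, `add` would be called each time the drift detector fires, with a classifier trained on the most recent data and a weight derived from its recent performance.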
The Very Fast Decision Tree (VFDT) algorithm is a classification algorithm for data streams. When processing large amounts of data, VFDT requires less time than traditional decision tree algorithms. However, when training samples are scarce, the label values of VFDT leaf nodes contain more errors, and the classification ability of a single VFDT decision tree is limited. The Random Forest algorithm is an ensemble classifier with high prediction accuracy and noise tolerance. It consists of multiple decision trees and can compensate for the weaknesses of a single decision tree. In this paper, to improve classification accuracy on data streams, the Random Forest algorithm is integrated into the tree-building process of the VFDT algorithm, and a new Random Forest Based Very Fast Decision Tree algorithm, named RFVFDT, is designed. The RFVFDT algorithm adopts the decision tree building criterion of a Random Forest classifier and adds a sliding window to the Random Forest algorithm to handle the unboundedness of data streams and avoid processing delay and data loss. Experimental results on the classification of KDD CUP data sets show that the classification accuracy of the RFVFDT algorithm is higher than that of VFDT. The fewer the samples, the more obvious the advantage. RFVFDT is also fast when running in multithreaded mode.
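The sensitivity of VFDT to small sample counts follows from the Hoeffding bound it uses to decide when a leaf may be split. The sketch below shows that standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)); the function names and the example numbers are illustrative, and nothing here reproduces the RFVFDT modifications.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon for n observations of a variable with range R.

    With probability 1 - delta, the true mean lies within epsilon of the
    observed mean.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split a leaf when the gain gap between the two best attributes exceeds epsilon.

    For information gain, the range R is log2 of the number of classes.
    """
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

With the same gain gap, a leaf that has seen only a few hundred samples may not yet be allowed to split, while one that has seen several thousand is, because epsilon shrinks as n grows.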
Handling sentiment drift in real-time Twitter data streams is a challenging task in sentiment classification, because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drift has created a need for an adaptive approach that detects and handles this drift in real time. This work proposes an adaptive learning framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting it to data streams using Kafka. Experiments on real-time and simulated tweets on sports, health care, and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87%, and 90%, respectively. Although results are provided for only a few topics as a proof of concept, the framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.
Funding (Shiyang River classification study): The Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (Grant No. 2019QZKK0205), the National Natural Science Foundation of China (Grant No. 42171002), and the Science and Technology Project of Tibet Autonomous Region (Grant No. XZ202401ZY0069).
Funding (HOG-VGG PolSAR classification study): The Fundamental Research Funds for the Central Universities of China (Grant No. PA2023IISL0098), the Hefei Municipal Natural Science Foundation (Grant No. 202201), the National Natural Science Foundation of China (Grant No. 62071164), and the Open Fund of the Information Materials and Intelligent Sensing Laboratory of Anhui Province (Anhui University) (Grant Nos. IMIS202214 and IMIS202102).
Funding: Supported by the State Key Program of the National Natural Science Foundation of China under Grant No. 61072061, the 111 Project of China under Grant No. B08004, and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0122.
Abstract: The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers. This paper presents a P2P streaming traffic classification method based on sampling technology. By analyzing the statistical traffic features and network behavior of P2P streaming, a group of flow characteristics was found that makes P2P streaming more recognizable among other applications. Attributes from NetFlow and those we propose are compared in terms of classification accuracy, as are the results at different sampling rates. The results show that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in an online system. Even at a 1:50 sampling rate, the recognition accuracy can exceed 94%. Moreover, CPU resources, storage capacity, and time consumption were evaluated before and after sampling; the classification model after sampling significantly reduces resource requirements at the same recognition accuracy.
Abstract: Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications raises two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has several operational stages: pre-processing, concept drift detection (CDD), and classification. The model overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). The Glowworm Swarm Optimization (GSO) algorithm is employed to determine the oversampling rates and neighboring point values of SMOTE. In addition, the Statistical Test of Equal Proportions (STEPD), a CDD technique, is used. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and compute the optimum parameters of the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the model was evaluated on high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. Extensive experimental validation confirmed the effectiveness of the MOMBD-CDD model. The proposed model attained high accuracies of 97.45% and 94.23% on the KDDCup99 and ECUE Spam datasets, respectively.
Funding: Supported by proposal No. OSD/BCUD/392/197, Board of Colleges and University Development, Savitribai Phule Pune University, Pune.
Abstract: Rapid developments in telecommunications, sensor data, financial applications, data stream analysis, and related fields have increased the rate of data arrival, which makes data mining a vital process. Data analysis consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Because classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundary between classes. Hence, we design a novel fuzzy classifier known as THRFuzzy to classify new incoming data streams. Rough set theory together with a tangential holoentropy function helps in designing the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets, namely the skin segmentation, localization, and breast cancer datasets, using accuracy and time as evaluation metrics, and is compared with that of the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier gives better classification results, providing maximum accuracy in minimal time compared with the existing classifiers.
Abstract: Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called the evolutionary logistical regression classifier (ELRClass), to classify evolving data streams. The algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep the classifier if its performance deteriorates because of a burst of noise, or to construct a new classifier if a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of the algorithm.
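The update / keep / rebuild decision described in the abstract above can be illustrated with a minimal sketch. This is not the ELRClass algorithm itself: the abstract does not give its decision rule, so the error thresholds `noise_level` and `drift_level`, the function names, and the plain gradient-descent fit are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, w=None, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression; the last weight is the bias."""
    n_features = len(X[0])
    w = list(w) if w is not None else [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
            grad[-1] += p - yi
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def error_rate(w, X, y):
    """Fraction of samples in the window that the model misclassifies."""
    wrong = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
        wrong += int((p >= 0.5) != bool(yi))
    return wrong / len(X)

def process_window(w, X, y, noise_level=0.2, drift_level=0.4):
    """Return (weights, action) for one window; both thresholds are hypothetical."""
    if w is None:
        return fit_logreg(X, y), "new"
    err = error_rate(w, X, y)
    if err >= drift_level:
        return fit_logreg(X, y), "new"      # large error: treat as major drift, rebuild
    if err >= noise_level:
        return w, "keep"                    # moderate error: treat as noise, keep model
    return fit_logreg(X, y, w=w, epochs=50), "update"  # low error: warm-start refinement
```

A caller would slide the window over the stream and pass each window's samples to `process_window`, carrying the returned weights forward. A real system would likely need a more careful way to tell a noise burst from drift than two fixed thresholds.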
Funding: The authors would like to extend their gratitude to Universiti Teknologi PETRONAS (Malaysia) for funding this research through grant number 015LA0-037.
Abstract: Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this nature, the statistical properties of the data change over time, leading to class imbalance and concept drift issues. Both issues degrade model performance. Most current work has focused on developing an ensemble strategy that trains a new classifier on the latest data to resolve the issue. These techniques suffer while training the new classifier if the data is imbalanced. Also, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. The existing solutions proposed for the combined issue of class imbalance and concept drift lack an understanding of the correlation of one problem with the other. This work studies the association between concept drift and class imbalance ratio and then demonstrates how changes in the class imbalance ratio, along with concept drift, affect the classifier's performance. We analyzed the effect of both issues on the minority and majority classes individually. To do this, we conducted experiments on benchmark datasets using state-of-the-art classifiers specially designed for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when class imbalance and concept drift occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both minority and majority classes. The study findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams, such as concept drift and class imbalance.
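As an illustration of the four metrics named in the abstract above, the following sketch computes them for a binary problem from true and predicted labels. The function name and the sample labels are hypothetical, and the geometric mean is taken here as the square root of recall times specificity, which is a common definition for imbalanced data but may differ from the one used in the study.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and G-mean for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}

# Illustrative labels only: 2 minority samples among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
minority = binary_metrics(y_true, y_pred, positive=1)
majority = binary_metrics(y_true, y_pred, positive=0)
```

Calling the function once per class, as in the last two lines, mirrors the per-class analysis of minority and majority performance mentioned in the abstract.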
Funding: Supported by the National Key Research and Development Program of China (2022YFD2300700), the Open Project Program of the State Key Laboratory of Rice Biology, China National Rice Research Institute (20210403), and the Zhejiang "Ten Thousand Talents" Plan Science and Technology Innovation Leading Talent Project, China (2020R52035).
Abstract: Nitrogen (N) and potassium (K) are two key mineral nutrient elements involved in rice growth. Accurate diagnosis of N and K status is very important for the rational application of fertilizers at a specific rice growth stage. Therefore, we propose a hybrid model for diagnosing rice nutrient levels at the early panicle initiation stage (EPIS), which combines a convolutional neural network (CNN) with an attention mechanism and a long short-term memory network (LSTM). The model was validated on a large set of sequential images collected by an unmanned aerial vehicle (UAV) from rice canopies at different growth stages during a two-year experiment. Compared with VGG16, AlexNet, GoogLeNet, DenseNet, and InceptionV3, ResNet101 combined with LSTM obtained the highest average accuracy of 83.81% on the dataset of Huanghuazhan (HHZ, an indica cultivar). When tested on the datasets of HHZ and Xiushui 134 (XS134, a japonica rice variety) in 2021, the ResNet101-LSTM model enhanced with the squeeze-and-excitation (SE) block achieved the highest accuracies of 85.38% and 88.38%, respectively. Through the cross-dataset method, the average accuracies on the HHZ and XS134 datasets tested in 2022 were 81.25% and 82.50%, respectively, showing good generalization. Our proposed model works with the dynamic information of different rice growth stages and can efficiently diagnose different rice nutrient status levels at EPIS, which is helpful for making practical decisions regarding rational fertilization treatments at the panicle initiation stage.
Abstract: Residual-based a posteriori error estimates were derived for conforming finite element solutions of the incompressible Navier-Stokes equations in stream function form, computed with seven recently proposed two-level methods. The a posteriori error estimates contain additional terms in comparison with the error estimates for the solution obtained by the standard finite element method. The importance of these additional terms was investigated by studying their asymptotic behavior. For optimally scaled meshes, these bounds are not of higher order than the order of convergence of the discrete solution.
Abstract: With the enhancement of data collection capabilities, massive streaming data have accumulated in numerous application scenarios. Specifically, the problem of classifying data streams from mobile sensors can be formalized as a multi-task multi-view learning problem in which each task comprises multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task and single-view, and cannot learn shared representations between related tasks and views. To address these challenges, an adaptive multi-task multi-view incremental learning framework for data stream classification, called MTMVIS, is proposed. Specifically, an attention mechanism is first used to align the sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Experiments on two different datasets show that the proposed framework outperforms state-of-the-art baseline methods.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 40471019) and the Shanghai Shu Guang Scholar Scheme (No. 03SG22).
Abstract: Urbanization can affect the physical process of river growth, modify stream structure, and further influence the functions of a river system. Shanghai, located in the Changjiang (Yangtze) River Delta in China, is one of the largest cities in the world. Since the 1970s, the whole river system in Shanghai has been planned and managed by the Shanghai Water Authority. The primary management objectives over the last 30 years have been to enhance irrigation and flood control. Using the Horton-Strahler classification and Horton's laws as a reference, a novel method of stream classification, in conjunction with traditional and specially designed indicators, was applied to understand the structure and functions of the river system in Shanghai. Correlation analysis was used to identify the interrelations among indicators. The impact of urbanization on the river system was found to be significant, although natural laws and physical characteristics marked a highly developed river system. There was an obvious correlation between the degree of urbanization and the abnormal values of some indicators. Urbanization impacts on the river system, such as branches engineered out, riverbank concreting, and low diversity of river style, were widely observed. Each indicator had a distinct sensitivity to urbanization, so the indicators could be used to describe different characteristics of the urban river system. The function indicators were significantly related to the structure indicators. Stream structure, described by the fractal dimension and complexity of the river system, was as important as the water area ratio for maintaining the river's multiple functions.
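For readers unfamiliar with the Horton-Strahler classification that the abstract above takes as its reference, the sketch below computes the standard Strahler order on a small, made-up network. It assumes the network is a tree in which each reach lists its upstream tributaries; the node names are hypothetical, and the novel classification method and indicators of the study are not reproduced here. Engineered urban networks with loops would need additional handling.

```python
def strahler_order(upstream, node):
    """Strahler order of `node`, where `upstream` maps a reach to its tributaries."""
    tributaries = upstream.get(node, [])
    if not tributaries:
        return 1  # a headwater reach has order 1
    orders = sorted((strahler_order(upstream, t) for t in tributaries), reverse=True)
    # The order rises by one only where two tributaries of the highest order meet.
    if len(orders) >= 2 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]

# Hypothetical network: "outlet" is fed by reaches A and B.
network = {
    "outlet": ["A", "B"],
    "A": ["a1", "a2"],  # two headwaters join, so A has order 2
    "B": [],            # B is itself a headwater of order 1
}
order_at_outlet = strahler_order(network, "outlet")
```

Here the order at the outlet stays at 2, because a second-order reach joined by a first-order reach does not increase the order.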
Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected based on the response relationship between candidate factors, such as rainfall and reservoir level, and the change in groundwater level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error and relative error of 0.28 m and 1.15%, respectively. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error and relative error of 1.53 m and 6.11%, respectively. This indicates that the CART model not only has better fitting and generalization ability, but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
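The two error measures reported in the abstract above can be computed as in the following sketch. The abstract does not define its relative error, so the mean of the absolute error divided by the observed value is an assumption, and the water levels below are made-up numbers used only to exercise the functions.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the same unit as the data (here metres)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mean_relative_error(observed, predicted):
    """Average of |observed - predicted| / |observed|; observed values must be nonzero."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)

# Illustrative groundwater levels in metres, not data from the study.
observed = [160.0, 162.0, 165.0]
predicted = [160.4, 161.8, 165.3]
mae = mean_absolute_error(observed, predicted)
mre = mean_relative_error(observed, predicted)
```

Multiplying `mre` by 100 gives the error as a percentage, the form used in the abstract.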
Funding: This work was supported in part by the National Science Foundation Project of P.R. China under Grant No. 61701554, the State Language Commission Key Project (ZDl135-39), First Class Courses (Digital Image Processing: KC2066), the MUC 111 Project, and the Ministry of Education Collaborative Education Project (201901056009, 201901160059, 201901238038).
Abstract: With the development of satellite technology, satellite imagery of the earth's surface makes it possible to survey surface resources and track the dynamic changes of the earth with high efficiency and low resource consumption. As an important tool for satellite remote sensing image processing, remote sensing image classification has become a hot topic. Based on the natural texture characteristics of remote sensing images, this paper combines different texture features with the Extreme Learning Machine and proposes a new remote sensing image classification algorithm. Experimental tests are carried out on the standard test datasets SAT-4 and SAT-6. Our results show that the proposed method is a simpler and more efficient remote sensing image classification algorithm. It achieves 99.434% recognition accuracy on SAT-4, which is 1.5% higher than the 97.95% accuracy achieved by DeepSat. The recognition accuracy on SAT-6 reaches 99.5728%, which is 5.6% higher than DeepSat's 93.9%.
Abstract: In this paper we propose a novel method for video quality prediction using video classification. In essence, our approach can serve two goals: (1) to measure the video quality of compressed video sequences without reference to the original uncompressed videos, i.e., to realize No-Reference (NR) video quality evaluation; (2) to predict quality scores for uncompressed video sequences at various bitrates without actually encoding them. Our approach can help realize video streaming with ideal Quality of Service (QoS). It is a low-complexity solution that is especially suitable for mobile video streaming, where handset resources are scarce.
Abstract: This paper proposes a security policy model for mandatory access control in a class B1 database management system in which labeling is at the tuple level. The relation hierarchical data model is extended to a multilevel relation hierarchical data model. Based on the multilevel relation hierarchical data model, the concept of upper-lower layer relational integrity is presented after the covert channels caused by database integrity are analyzed and eliminated. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. The system is based on the multilevel relation hierarchical data model and is capable of storing and manipulating, in an integrated way, multilevel complex objects (e.g., multilevel spatial data) and multilevel conventional data (e.g., integers, real numbers, and character strings).
Abstract: With the rapid development of business transactions, especially in recent years, it has become necessary to develop mechanisms that trace business user records in web server logs efficiently. Online business transactions have increased, especially when the user or customer cannot obtain the required service. For example, with the spread of the coronavirus (COVID-19) epidemic throughout the world, there is a dire need to rely more on online business processes. To improve the efficiency and performance of an e-business structure, the web server log must be well utilized so that it can trace and record unbounded user transactions. This paper proposes an event stream mechanism based on formula patterns to enhance business processes and record all user activities in a structured log file. Each user activity is recorded with a set of tracing parameters that can predict the behavior of the user in business operations. The experiments apply clustering-based classification algorithms to two different datasets, namely Online Shoppers Purchasing Intention and Instacart Market Basket Analysis. The clustering process groups related objects into the same cluster, and the classification process then measures the predicted classes of the clustered objects. The experimental results show provable accuracy in predicting user preferences on both datasets.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups (Project under Grant Number RGP.2/49/43).
Abstract: Textual data streams have been extensively used in practical applications where consumers of online products express their views about those products. Because of changes in data distribution, commonly referred to as concept drift, mining such a data stream is a challenging problem for researchers. The majority of existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, there is a need to develop more intuitive detection techniques that can identify a large number of drifts in data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection, for opinion mining and sentiment classification. To improve classification performance, this approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before the new classifier is added, the total number of classifiers in the ensemble is checked; if the limit is exceeded, the classifier with the least weight is removed from the ensemble. To this end, a weighting mechanism is used to calculate the weight of each classifier, which decides the contribution of each classifier to the final classification result. Several experiments were conducted on real-world datasets, and the results were evaluated on the false positive rate, missed detection rate, and accuracy measures. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and PageHinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of the proposed method.
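The ensemble maintenance step described in the abstract above (check the size limit, drop the lowest-weight member, then add the new classifier) can be sketched as follows. The class name, the use of plain callables as classifiers, and the weighted-vote prediction are illustrative assumptions; the abstract does not specify how weights are computed, so they are passed in directly.

```python
class WeightedEnsemble:
    """Bounded ensemble that evicts its lowest-weight member when full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.members = []  # list of (classifier, weight) pairs

    def add(self, classifier, weight):
        # Check the limit first; if the ensemble is full, remove the weakest member.
        if len(self.members) >= self.max_size:
            weakest = min(range(len(self.members)),
                          key=lambda i: self.members[i][1])
            del self.members[weakest]
        self.members.append((classifier, weight))

    def predict(self, x):
        # Each member votes for a label with a strength equal to its weight.
        votes = {}
        for classifier, weight in self.members:
            label = classifier(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)

# Hypothetical stand-in classifiers that ignore their input.
always_pos = lambda text: "positive"
always_neg = lambda text: "negative"

ensemble = WeightedEnsemble(max_size=2)
ensemble.add(always_pos, 0.5)
ensemble.add(always_neg, 0.4)
ensemble.add(always_neg, 0.7)  # ensemble is full, so the 0.4 member is evicted
label = ensemble.predict("sample review")
```

In a drift-aware pipeline, `add` would be called each time the detector signals a drift, with a classifier trained on the most recent data.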
Abstract: The Very Fast Decision Tree (VFDT) algorithm is a classification algorithm for data streams. When processing large amounts of data, VFDT requires less time than traditional decision tree algorithms. However, when training samples are few, the label values of VFDT leaf nodes contain more errors, and the classification ability of a single VFDT decision tree is limited. The Random Forest algorithm is an ensemble classifier with high prediction accuracy and noise tolerance. It consists of multiple decision trees and can make up for the shortcomings of a single decision tree. In this paper, to improve classification accuracy on data streams, the Random Forest algorithm is integrated into the tree-building process of the VFDT algorithm, and a new Random Forest Based Very Fast Decision Tree algorithm named RFVFDT is designed. The RFVFDT algorithm adopts the decision tree building criterion of a Random Forest classifier and extends the Random Forest algorithm with a sliding window to handle the unboundedness of data streams and avoid processing delay and data loss. Experimental results on the classification of KDD CUP data sets show that the classification accuracy of the RFVFDT algorithm is higher than that of VFDT. The fewer the samples, the more obvious the advantage. RFVFDT is fast when running in multithreaded mode.
Abstract: Handling sentiment drifts in real-time Twitter data streams is a challenging task in sentiment classification because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drifts has created the need for an adaptive approach to detect and handle this drift in real time. This work proposes an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting them to data streams using the Kafka tool. Experiments conducted on real-time and simulated tweets on sports, health care, and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87%, and 90%, respectively. Though results are provided only for a few topics, as a proof of concept, this framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.