This study presents a framework for predicting geological characteristics by integrating a stacking classification algorithm (SCA) with grid search (GS) and K-fold cross validation (K-CV). The SCA comprises two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using GS and K-CV. GS is used to tune the hyper-parameters and optimise complicated problems, while K-CV is commonly applied to rotate the validation set within the training set. In general, GS is combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the number of geological characteristic types (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
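The grid search plus K-fold cross validation loop described above can be sketched in a few lines of pure Python. This is only an illustration of the GS/K-CV mechanics, not the paper's SCA: the toy data and the single threshold "hyper-parameter" are our own assumptions.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train, validation) index lists for K-fold cross validation."""
    fold = n // k
    for i in range(k):
        val = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        train = [j for j in range(n) if j not in val]
        yield train, val

def accuracy(threshold, xs, ys):
    """Score a one-parameter rule: predict class 1 when x >= threshold."""
    return mean(1.0 if (x >= threshold) == y else 0.0 for x, y in zip(xs, ys))

def grid_search_cv(xs, ys, grid, k=5):
    """Pick the hyper-parameter whose mean K-fold validation accuracy is best."""
    best, best_score = None, -1.0
    for t in grid:
        scores = [accuracy(t, [xs[i] for i in val], [ys[i] for i in val])
                  for _, val in k_fold_indices(len(xs), k)]
        if mean(scores) > best_score:
            best, best_score = t, mean(scores)
    return best, best_score

# toy data: true class is 1 exactly when x >= 0.5
xs = [i / 10 for i in range(10)]
ys = [x >= 0.5 for x in xs]
best_t, score = grid_search_cv(xs, ys, grid=[0.2, 0.5, 0.8], k=5)
print(best_t, score)  # 0.5 1.0
```

In a real stacking pipeline the inner scoring call would train the primary learners on each training fold before evaluating on the held-out fold; here the hyper-parameter is the model, which keeps the CV bookkeeping visible.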
The classification algorithm is one of the key techniques affecting a text classification system's performance and plays an important role in automatic classification research. This paper comparatively analyses k-NN, VSM and the hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank were used in the experiment. The results show that the performance of the hybrid algorithm presented by the group is superior to that of the other two algorithms.
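A minimal k-NN text classifier over vector space model (VSM) representations, of the kind compared above, can be sketched as follows. The toy news snippets and labels are illustrative, not the ChinaInfoBank data.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (the VSM representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(query, corpus, k=3):
    """Label the query with the majority class among its k most similar documents."""
    q = vectorize(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, vectorize(d[0])), reverse=True)
    top = [label for _, label in ranked[:k]]
    return Counter(top).most_common(1)[0][0]

news = [
    ("stocks rally as markets surge on earnings", "finance"),
    ("central bank raises interest rates again", "finance"),
    ("investors cheer strong quarterly earnings", "finance"),
    ("team wins championship after dramatic final", "sports"),
    ("star striker scores twice in derby match", "sports"),
]
print(knn_classify("bank earnings lift markets", news, k=3))  # finance
```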
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools, were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each algorithm were examined, and a conclusion was finally drawn on which one has the higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise and overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF.
Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, high computational complexity, the numerous NN architectures to choose from and the large number of algorithms used for training, most researchers recommend SVM and RF as easier and more readily usable methods that repeatedly achieve results with high accuracy and are often faster to implement.
Many business applications rely on their historical data to predict their business future. The product marketing process is one of the core processes of a business: customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered products, and the development of education and health services depends on historical data. Furthermore, reducing problems and crimes on online social media networks needs a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses; however, dealing with a huge quantity of data requires great processing time. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications, and the classification technique is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a dedicated reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain; the Random Forest algorithm is the most accurate in classifying online social network (OSN) activities; the Naïve Bayes algorithm is the most accurate for agriculture datasets; OneR is the most accurate for instances within the health domain; and the C4.5 Decision Tree algorithm is the most accurate for classifying students' records to predict degree completion time.
A novel classification algorithm based on anomalous magnetic signals is proposed for ground moving targets made of ferromagnetic material. Exploiting the effect of different targets on the earth's magnetic field, the moving targets are detected by a magnetic sensor and classified with a simple computation method. The detection sensor collects the disturbance signal that an unknown target induces in the earth's magnetic field. An optimum category-match pattern of the target signature is obtained by training on statistical samples and designing a classification machine. Three ordinary targets are studied in the paper. The experimental results show that the algorithm has a low computation cost and good sorting accuracy. This classification method can be applied to ground reconnaissance and target intrusion detection.
A new classification algorithm for web mining is proposed on the basis of general classification algorithms for data mining, in order to implement personalized information services. A tree-building method that detects class thresholds is used to construct the decision tree according to the concept of user expectation, so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the overfitting problem of C4.5 is alleviated, so that the classification results not only have much higher accuracy but also statistical meaning.
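The entropy-based splitting criterion underlying C4.5-style trees can be shown in a few lines. This computes plain information gain (the ID3 form; C4.5 itself normalizes it into a gain ratio); the toy browsing-log data is our own illustration, not from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Entropy reduction from splitting rows on one attribute."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# toy browsing log: (page_topic, session_length) -> interested?
rows = [("tech", "long"), ("tech", "short"), ("news", "long"), ("news", "short")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, 0, labels))  # 1.0: topic separates the classes perfectly
print(information_gain(rows, 1, labels))  # 0.0: session length is uninformative
```

A tree builder repeatedly picks the attribute with the highest gain; a class-threshold variant would additionally stop splitting once a node's majority class passes the threshold.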
Packet classification (PC) has become the main method to support quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is a popular variant. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm makes use of the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate conflicts between rules and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and lower space complexity.
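For readers unfamiliar with the lookup a TCAM performs, here is a software stand-in for longest-prefix matching. A TCAM returns the answer in one instruction cycle in hardware; this sketch scans the rule list linearly, and the bit-string rules are illustrative.

```python
def longest_prefix_match(rules, addr_bits):
    """Among all rules whose prefix matches the address, return the longest one.

    rules: list of (prefix_bits, action); a rule matches when the address
    bit-string starts with the prefix. Returns None when nothing matches."""
    best = None
    for prefix, action in rules:
        if addr_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, action)
    return best

rules = [("10", "rule-A"), ("1011", "rule-B"), ("0", "rule-C")]
print(longest_prefix_match(rules, "101110"))  # ('1011', 'rule-B')
print(longest_prefix_match(rules, "100000"))  # ('10', 'rule-A')
```

Two-dimensional PPC applies this twice, once per header field (e.g. source and destination prefix), and then resolves conflicts between the two matched rule sets.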
A critical problem associated with the southern part of Nigeria is the rapid alteration of the landscape as a result of logging, agricultural practices, human migration and expansion, and oil exploration, exploitation and production activities. These processes have had both positive and negative effects on the economic and socio-political development of the country in general. The negative impacts have not only led to the degradation of the ecosystem but also posed hazards to human health and polluted surface and ground water resources. This has created the need for a rapid, cost-effective and efficient land use/land cover (LULC) classification technique to monitor the biophysical dynamics in the region. Due to the complex land cover patterns in the study area and the occasionally indistinguishable relationship between land cover and spectral signals, this paper introduces a combined use of unsupervised and supervised image classification for detecting land use/land cover (LULC) classes. With the continuing conflict over the impact of oil activities in the area, this work provides a procedure for detecting LULC change, which is an important factor to consider in the design of an environmental decision-making framework. Results from the use of this technique on Landsat TM and ETM+ scenes of 1987 and 2002 are discussed. The results reveal the pros and cons of the two methods and the effects of their overall accuracy on post-classification change detection.
In this research article, we analyze multimedia data mining and classification algorithms based on database optimization techniques. High-performance application requirements of various kinds are constantly emerging, making parallel computer architectures increasingly common, but the development of the corresponding software systems lags far behind that of the hardware; this is especially obvious in the field of database technology. Multimedia mining is different from low-level computer multimedia processing technology: the former focuses on extracting patterns from huge multimedia collections, whereas the latter focuses on understanding or extracting specific features from a single multimedia object. Our research provides a new paradigm for the methodology, which will be meaningful and necessary.
Mapping croplands, including fallow areas, is an important measure to determine the quantity of food that is produced, where it is produced, and when it is produced (e.g. seasonality). Furthermore, croplands are known as water guzzlers, consuming anywhere between 70% and 90% of all human water use globally. Given these facts and the increase in global population to nearly 10 billion by the year 2050, the need for routine, rapid, and automated cropland mapping year after year and/or season after season is of great importance. The overarching goal of this study was to generate standard and routine cropland products, year after year, over very large areas through the use of two novel methods: (a) quantitative spectral matching techniques (QSMTs) applied at continental level and (b) a rule-based Automated Cropland Classification Algorithm (ACCA) with the ability to hind-cast, now-cast, and future-cast. Australia was chosen for the study given its extensive croplands, rich history of agriculture, and yet nonexistent routinely generated yearly cropland products using multi-temporal remote sensing. This research produced three distinct cropland products using Moderate Resolution Imaging Spectroradiometer (MODIS) 250-m normalized difference vegetation index 16-day composite time-series data for 16 years: 2000 through 2015. The products consisted of: (1) cropland extent/areas versus cropland fallow areas, (2) irrigated versus rainfed croplands, and (3) cropping intensities: single, double, and continuous cropping. An accurate reference cropland product (RCP) for the year 2014 (RCP2014) produced using QSMT was used as a knowledge base to train and develop the ACCA algorithm, which was then applied to the MODIS time-series data for the years 2000–2015. A comparison between the ACCA-derived cropland products (ACPs) for the year 2014 (ACP2014) and RCP2014 provided an overall agreement of 89.4% (kappa = 0.814) over six classes, with (a) producer's accuracies varying between 72% and 90% and (b) user's accuracies varying between 79% and 90%. ACPs for the individual years 2000–2013 and 2015 (ACP2000–ACP2013, ACP2015) showed very strong similarities with several other studies. The extent and vigor of the Australian croplands versus cropland fallows were accurately captured by the ACCA algorithm for the years 2000–2015, thus highlighting the value of the study in food security analysis.
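The overall agreement and kappa statistics reported above are computed from an error (confusion) matrix. A minimal sketch, with an illustrative 2-class matrix rather than the study's 6-class one:

```python
def overall_agreement_and_kappa(confusion):
    """Overall agreement and Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: mapped classes)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # chance agreement from the row and column marginal totals
    expected = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
                   for i in range(len(confusion))) / (n * n)
    return observed, (observed - expected) / (1 - expected)

# toy 2-class agreement table (counts are illustrative, not from the study)
confusion = [[45, 5],
             [5, 45]]
oa, kappa = overall_agreement_and_kappa(confusion)
print(oa, kappa)  # 0.9 0.8
```

Producer's and user's accuracies follow from the same matrix: each class's diagonal count divided by its row total and column total, respectively.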
A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to solving classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back-propagation (BP) algorithm for feedforward NNs using the sigmoid function. Moreover, the models can be implemented with lower-cost hardware than the BP NN.
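The paper's specific learning rule is not spelled out in the abstract; as a stand-in, here is the classic perceptron rule with a hard-limiter (step) activation, which converges on linearly separable sets such as two disjoint convex regions. The toy point sets are our own.

```python
def hard_limiter(x):
    """Step activation: 1 if x >= 0 else 0."""
    return 1 if x >= 0 else 0

def train_perceptron(samples, labels, lr=0.1, epochs=100):
    """Perceptron learning with a hard-limiter unit on 2-D inputs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            # error is -1, 0, or +1; weights move only on misclassification
            err = y - hard_limiter(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# two disjoint point sets on either side of the line x1 + x2 = 1
samples = [(0, 0), (0.2, 0.3), (1, 1), (0.9, 0.8)]
labels = [0, 0, 1, 1]
w, b = train_perceptron(samples, labels)
preds = [hard_limiter(w[0] * x1 + w[1] * x2 + b) for x1, x2 in samples]
print(preds)  # [0, 0, 1, 1]
```

The hardware appeal mentioned in the abstract follows from the activation: a hard limiter is a comparator, far cheaper to realize than a sigmoid.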
Clustering filtering is a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster the point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms give poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds, as well as the features of ground-object point clouds and terrain point clouds, the point clouds are first clustered by their elevations, and then the planar point clouds are selected, reducing both the number of samples and the feature dimensions of the data. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground-object point clouds, and terrain point clouds. The experiment uses 15 data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas and is significantly better than the other classical filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
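The density-based grouping at the heart of the method is DBSCAN. A compact pure-Python version, with illustrative 2-D points and parameters (real LiDAR use would run on elevation-reduced 3-D returns):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 means noise).

    Core points have at least min_pts neighbours within eps; clusters grow
    by expanding outward from core points."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # noise (may be re-claimed as a border point)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # only core points keep expanding
                seeds.extend(j_nbrs)
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The elevation pre-clustering described in the abstract is what makes this tractable: it shrinks the neighbourhood searches that dominate DBSCAN's cost on full point clouds.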
In the realm of contemporary artificial intelligence, machine learning enables automation, allowing systems to naturally acquire and enhance their capabilities through learning. Here, video recommendation is performed using machine learning techniques. A recommendation system is an information filtering system used to predict the "rating" or "preference" given by different users; the prediction is based on past ratings, history, interests, IMDB rating, and so on. This can be carried out using collaborative and content-based filtering approaches, which take the data given by the different users, analyze it, and then recommend the video that suits the user at that particular time. The required video datasets are taken from GroupLens, and the recommender system is implemented in the Python programming language. Two algorithms are used to build it: K-means clustering and KNN classification. K-means is an unsupervised machine learning algorithm whose main objective is to group similar data points together and discover patterns; it searches for a fixed number "k" of clusters in a dataset, where a cluster is a collection of data points aggregated because of certain similarities. K-Nearest Neighbor is a supervised learning algorithm used for classification: given the data, KNN can classify new data points by examining the "k" nearest data points. The final quality is assessed through the clustering results and the root mean squared error, and using these algorithms we can recommend videos more appropriately based on the user's previous records and ratings.
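A minimal k-means on 1-D rating values, together with the RMSE quality check mentioned above. This is a sketch with naive seeding (a k-means++ variant would spread the initial centroids out) and invented ratings, not the GroupLens pipeline.

```python
import math

def k_means(points, k, iters=20):
    """Plain k-means on scalars: assign each point to the nearest centroid,
    then recompute centroids as cluster means."""
    centroids = points[:k]          # naive seeding; k-means++ would do better
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# toy user ratings: a "low" group near 1 and a "high" group near 5
ratings = [1.0, 1.2, 0.8, 4.8, 5.0, 4.6]
cents = k_means(ratings, k=2)
rmse = math.sqrt(sum(min((r - c) ** 2 for c in cents) for r in ratings)
                 / len(ratings))
print(sorted(cents), round(rmse, 3))
```

In a recommender, the cluster a user falls into narrows the candidate videos, and KNN over the user's nearest neighbours supplies the final ranking.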
Achieving online automatic classification of products is a great need of e-commerce development. By analyzing the characteristics of product images, we propose a fast supervised image classifier based on a class-specific Pyramid Histogram Of Words (PHOW) descriptor and Image-to-Class distance (PHOW/I2C). In the training phase, local features are densely sampled and represented as soft-voting PHOW descriptors, and the class-specific descriptors are then built from the means and variances of the distribution of each visual word in each labelled class. For online testing, the normalized chi-square distance is calculated between the descriptor of the query image and each class-specific descriptor, and the class label corresponding to the smallest I2C distance is taken as the final winner. Experiments demonstrate the effectiveness and speed of our method in product classification tasks.
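The test-time decision rule reduces to a chi-square distance between histograms followed by an arg-min over classes. A sketch with made-up 3-bin histograms standing in for PHOW descriptors:

```python
def chi_square_distance(h1, h2):
    """Normalized chi-square distance between two histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def classify_i2c(query_hist, class_hists):
    """Image-to-Class: the label whose descriptor is nearest in chi-square distance wins."""
    return min(class_hists,
               key=lambda c: chi_square_distance(query_hist, class_hists[c]))

# toy class-specific descriptors (3 visual words); values are illustrative
class_hists = {"shoes": [0.6, 0.3, 0.1], "watches": [0.1, 0.2, 0.7]}
print(classify_i2c([0.5, 0.4, 0.1], class_hists))  # shoes
```

The speed claim follows from this structure: one distance per class, rather than one per training image as in image-to-image nearest neighbour.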
At present, online shopping is very popular as it is very convenient for customers. However, selecting smartphones from online shops is difficult based only on the pictures and a short description of the item; hence, customers refer to user reviews and star ratings. Since user reviews are written in human languages, the real semantics of the reviews and the satisfaction of the customers sometimes differ from what the star rating shows. Also, reading all the reviews is not possible, as a smartphone typically gets thousands of reviews on a popular online shopping platform like Amazon. Hence, this work aims to develop a recommendation system for smartphones based on aspects of the phones, such as screen size, resolution, camera quality and battery life, reviewed by users. To that end we apply a hybrid approach, which includes three lexicon-based methods and three machine learning models, to analyze specific aspects of user reviews and classify the reviews into six categories: best, better, good or somewhat for positive comments, and bad or not recommended for negative comments. The lexicon-based tool AFINN together with a Random Forest prediction model provides the best classification F1-score, 0.95. This system can be customized according to the required aspects of smartphones, and the classification of reviews can be done accordingly.
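The lexicon-based half of such a pipeline can be sketched as valence lookup per aspect-bearing sentence. The mini-lexicon values, aspect matching, and six-way thresholds below are all our own assumptions in the AFINN style, not the real AFINN scores or the paper's mapping.

```python
# tiny illustrative valence lexicon (scores are made up, not real AFINN values)
LEXICON = {"excellent": 3, "good": 2, "love": 3, "poor": -2, "bad": -3, "slow": -1}

def aspect_score(review, aspect):
    """Sum lexicon valences over the sentences that mention the given aspect."""
    score = 0
    for sentence in review.lower().split("."):
        if aspect in sentence:
            score += sum(LEXICON.get(w, 0) for w in sentence.split())
    return score

def to_category(score):
    """Map a valence score onto a six-way review scale (thresholds assumed)."""
    bands = [(5, "best"), (3, "better"), (1, "good"), (0, "somewhat"), (-2, "bad")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "not recommended"

review = "The camera is excellent and I love the battery. The screen is slow."
print(aspect_score(review, "camera"), to_category(aspect_score(review, "camera")))  # 6 best
print(aspect_score(review, "screen"), to_category(aspect_score(review, "screen")))  # -1 bad
```

In the hybrid setup, scores like these become features for the machine learning model (e.g. a Random Forest) rather than being thresholded directly.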
In this paper, a time-frequency associated multiple signal classification (MUSIC) algorithm suitable for through-wall detection is proposed. The technology of detecting human targets by through-wall radar can be used to monitor the status and location of human targets behind a wall. However, detection fails when the classical MUSIC algorithm is applied to estimate the direction of arrival. To solve this problem, a time-frequency associated MUSIC algorithm suitable for through-wall detection, based on S-band stepped frequency continuous wave (SFCW) radar, is studied. By combining the inverse fast Fourier transform (IFFT) algorithm with the MUSIC algorithm, the power of the target signal is enhanced according to the distances calculated in the time domain; the signal is then converted to the frequency domain for direction of arrival (DOA) estimation. Simulations of two-dimensional human target detection in free space and processing of measured data are completed. Comparing the processing results of the two algorithms on the measured data, the DOA estimation accuracy of the proposed algorithm is more than 75%, which is 50% higher than that of the classical MUSIC algorithm. It is verified that the distance and angle of a human target can be effectively detected with the proposed algorithm.
Detecting abnormal data generated by cyberattacks has emerged as a crucial approach for identifying security threats within in-vehicle networks. The transmission of information through in-vehicle networks needs to follow specific data formats and communication protocol regulations. Typically, statistical algorithms are employed to learn these variation rules and facilitate the identification of abnormal data. However, the effectiveness of anomaly detection often falls short when confronted with highly deceptive in-vehicle network attacks. In this study, seven representative classification algorithms are selected to detect common in-vehicle network attacks, and a comparative analysis is employed to identify the most suitable and favorable detection method. In consideration of the communication protocol characteristics of in-vehicle networks, an optimal convolutional neural network (CNN) detection algorithm is proposed that uses data field characteristics and classifier selection, and its comprehensive performance is tested. In addition, the concept of the Hamming distance between two adjacent packets within the in-vehicle network is introduced, enabling an enhanced CNN algorithm that achieves robust detection of challenging-to-identify abnormal data. This paper also presents a CNN classification algorithm that effectively addresses the issue of a high false negative rate (FNR) in abnormal data detection based on the timestamp feature of data packets. The experimental results validate the efficacy of the proposed abnormal data detection algorithm, highlighting its strong detection performance and its potential to provide an effective solution for safeguarding the security of in-vehicle network information.
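The Hamming-distance idea between adjacent packets is simple to illustrate: periodic in-vehicle traffic changes only a few bits between consecutive frames, while an injected frame typically differs in many. The frame bytes and threshold below are illustrative, not from the study.

```python
def hamming_distance(frame_a, frame_b):
    """Bit-level Hamming distance between two equal-length data fields."""
    return sum(bin(a ^ b).count("1") for a, b in zip(frame_a, frame_b))

def flag_anomalies(frames, max_dist):
    """Flag a frame when it differs from its predecessor by more than max_dist bits."""
    return [i for i in range(1, len(frames))
            if hamming_distance(frames[i - 1], frames[i]) > max_dist]

frames = [
    bytes([0x10, 0x20, 0x30, 0x40]),
    bytes([0x10, 0x21, 0x30, 0x40]),   # one bit flipped: normal drift
    bytes([0xFF, 0xFF, 0xFF, 0xFF]),   # injected frame: many bits differ
    bytes([0x10, 0x20, 0x30, 0x41]),
]
print(flag_anomalies(frames, max_dist=8))  # [2, 3]
```

Note that a return to normal traffic right after an injection (index 3 above) also shows a large distance, which is why the paper feeds the distance into a CNN as one feature rather than thresholding it alone.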
To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Subsequently, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. Addressing the limitations of low execution efficiency and scalability in the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input features. This captures latent label associations, significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency. Simultaneously, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence. As a result, successfully formatted STIX 2.1 threat intelligence is established.
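For reference, the LZW compression used above to shrink the data shipped between nodes is the textbook dictionary coder: grow a phrase table on the fly and emit one code per longest known phrase. A minimal encoder (how PDFMLC wires it in is not detailed in the abstract):

```python
def lzw_compress(text):
    """Textbook LZW: emit one code per longest phrase already in the table."""
    table = {chr(i): i for i in range(256)}   # start from single characters
    next_code = 256
    phrase, out = "", []
    for ch in text:
        if phrase + ch in table:
            phrase += ch                      # keep extending the current match
        else:
            out.append(table[phrase])         # emit the longest known phrase
            table[phrase + ch] = next_code    # learn the new phrase
            next_code += 1
            phrase = ch
    if phrase:
        out.append(table[phrase])
    return out

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), "codes for 24 symbols")  # 16 codes for 24 symbols
```

Repetitive text such as standardized report boilerplate compresses well because repeated phrases collapse to single codes.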
Due to the diversified quality of service (QoS) demands of high-volume multimedia applications, QoS routing for multiple service classes is becoming a research hotspot in low earth orbit (LEO) satellite networks. A novel QoS satellite routing algorithm for multi-class traffic is proposed. The goal of the routing algorithm is to provide distinct QoS for different traffic classes and improve the utilization of network resources. Traffic is classified into three classes and allocated priorities based on their respective QoS requirements. A priority queuing mechanism lets the algorithm work better for high-priority classes. In order to control congestion, a blocking probability analysis model is built based on Markov process theory. Finally, according to the per-class link-cost metrics, routes for the different classes are calculated from their distinct QoS requirements and the status of network resources. Simulations verify the performance of the routing algorithm at different times and in different regions, and the results demonstrate that the algorithm has great advantages in terms of average delay and blocking probability. The robustness issue is also discussed.
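The abstract does not give the blocking model's equations; the standard Markov-process result for a link carrying Poisson call arrivals is the Erlang B formula, which serves here as an illustrative stand-in. Its stable recursion:

```python
def erlang_b(traffic, channels):
    """Blocking probability of an M/M/c/c loss system via the Erlang B recursion:
    B(0) = 1;  B(c) = a*B(c-1) / (c + a*B(c-1)), with a the offered traffic in Erlangs."""
    b = 1.0
    for c in range(1, channels + 1):
        b = traffic * b / (c + traffic * b)
    return b

# e.g. 2 Erlangs of offered traffic on a 4-channel link
print(round(erlang_b(2.0, 4), 4))  # 0.0952
```

A multi-class scheme would evaluate such a model per link and class, feeding the resulting blocking estimates into the per-class link costs.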
Crimes are expected to rise with an increase in population and the rising gap between society's income levels. Crimes contribute a significant portion of the socioeconomic loss of any society, not only through indirect damage to the social fabric and peace but also through more direct negative impacts on the economy, social parameters, and the reputation of a nation. Policing and other preventive resources are limited and have to be utilized efficiently. Conventional methods are being superseded by more modern machine learning algorithms capable of making predictions where the relationships between the features and the outcomes are complex, making it possible for such algorithms to provide indicators of specific areas that may become criminal hot-spots. These predictions can be used by policymakers and police personnel alike to make effective and informed strategies that can curtail criminal activities and contribute to the nation's development. This paper aims to predict the factors that most affected crimes in Saudi Arabia by developing a machine learning model to predict an acceptable output value. Our results show that FAMD as a feature selection method showed better accuracy with machine learning classifiers than the PCA method. The naïve Bayes classifier performs better than the other classifiers with both feature selection methods, with an accuracy of 97.53% for FAMD and 97.10% for PCA.
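The naïve Bayes classifier named above can be sketched for categorical features in plain Python. This is a generic add-one-smoothed sketch on invented toy records, not the paper's FAMD/PCA pipeline or its data.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Categorical naive Bayes: class priors plus per-class, per-feature value counts."""
    priors = Counter(labels)
    counts = defaultdict(Counter)            # (class, feature_index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
    return priors, counts, len(labels)

def predict_nb(model, row):
    """Pick the class maximizing log prior + sum of smoothed log likelihoods."""
    priors, counts, n = model
    best, best_lp = None, -math.inf
    for y, ny in priors.items():
        lp = math.log(ny / n)
        for i, v in enumerate(row):
            c = counts[(y, i)]
            lp += math.log((c[v] + 1) / (ny + len(c) + 1))   # add-one smoothing
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# toy records: (region, time_of_day) -> incident type (entirely invented)
rows = [("urban", "night"), ("urban", "night"), ("rural", "day"),
        ("rural", "day"), ("urban", "day")]
labels = ["theft", "theft", "traffic", "traffic", "theft"]
model = train_nb(rows, labels)
print(predict_nb(model, ("urban", "night")))  # theft
```

In the paper's setting, the inputs would be the FAMD- or PCA-reduced features rather than raw categories.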
Funding: "The Pearl River Talent Recruitment Program" of Guangdong Province in 2019 (Grant No. 2019CX01G338) and the Research Funding of Shantou University for New Faculty Members (Grant No. NTF19024-2019).
文摘This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm(SCA) with a grid search(GS) and K-fold cross validation(K-CV). The SCA includes two learner layers: a primary learner’s layer and meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to changing the validation set in a training set. In general, a GS is usually combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index(TPI) and field penetration index(FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method(EM) and silhouette coefficient(Si) are employed to determine the types of geological characteristics(K) in a Kmeans++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
文摘Classification algorithm is one of the key techniques to affect text automatic classification system’s performance, play an important role in automatic classification research area. This paper comparatively analyzed k-NN. VSM and hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The result shows that the hybrid algorithm’s performance presented by the groups is superior to the other two algorithms.
Abstract: In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was reached on which one has higher performance. It was evident from the literature reviewed that RF is sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of the time-consuming parameter-tuning procedure, the high level of complexity in computational processing, the numerous types of NN architectures to choose from, and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and more widely usable methods which repeatedly achieve results with high accuracies and are often faster to implement.
Abstract: Many business applications rely on historical data to predict their future. The product marketing process is one of the core processes of a business. Customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered products, and the development of education and health services depends on historical data. Furthermore, reducing problems and crime on online social media networks needs a significant source of information. Data analysts need an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires great processing time. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a dedicated reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities. The naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 Decision Tree algorithm is the most accurate for classifying students' records to predict degree completion time.
Funding: Sponsored by the National Natural Science Foundation of China (60773129) and the Excellent Youth Science and Technology Foundation of Anhui Province of China (08040106808).
Abstract: A novel classification algorithm based on anomalous magnetic signals is proposed for ground moving targets made of ferromagnetic material. Based on the effect that diverse targets have on the earth's magnetic field, the moving targets are detected by a magnetic sensor and classified with a simple computation method. The detection sensor collects the disturbance signal that an undetermined target induces in the earth's magnetic field. An optimum category-match pattern for the target signature is obtained by training on statistical samples and designing a classification machine. Three ordinary targets are studied in the paper. The experimental results show that the algorithm has a low computation cost and good sorting accuracy. This classification method can be applied to ground reconnaissance and target intrusion detection.
Abstract: A new classification algorithm for web mining is proposed on the basis of a general classification algorithm for data mining, in order to implement personalized information services. A tree-building method that detects class thresholds is used to construct the decision tree according to the concept of user expectation, so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the overfitting problem of C4.5 is improved, so that the classification results not only have much higher accuracy but also statistical meaning.
Funding: Supported by Intel Corporation (No. 9078).
Abstract: Packet classification (PC) has become the main method to support quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is a popular variant. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm exploits the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate conflicts between rules and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and lower space complexity.
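The longest-prefix lookup that the TCAM performs in one instruction cycle can be emulated in software for illustration. The sketch below uses a hypothetical one-dimensional rule table (the paper's algorithm is two-dimensional and hardware-based; this only shows the matching semantics).

```python
# illustrative 1-D prefix rule table: (prefix_bits, prefix_length)
RULES = [("10", 2), ("101", 3), ("1", 1)]

def longest_prefix_match(addr_bits, rules):
    # a TCAM matches all prefixes in parallel and a priority encoder
    # selects the longest; in software we scan for the maximum length
    best = None
    for prefix, length in rules:
        if addr_bits.startswith(prefix) and (best is None or length > best[1]):
            best = (prefix, length)
    return best

print(longest_prefix_match("1011", RULES))  # ('101', 3)
```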
Abstract: A critical problem associated with the southern part of Nigeria is the rapid alteration of the landscape as a result of logging, agricultural practices, human migration and expansion, and oil exploration, exploitation and production activities. These processes have had both positive and negative effects on the economic and socio-political development of the country in general. The negative impacts have led not only to the degradation of the ecosystem but also to hazards to human health and the pollution of surface and groundwater resources. This has created the need for the development of a rapid, cost-effective and efficient land use/land cover (LULC) classification technique to monitor the biophysical dynamics in the region. Due to the complex land cover patterns existing in the study area and the occasionally indistinguishable relationship between land cover and spectral signals, this paper introduces a combined use of unsupervised and supervised image classification for detecting land use/land cover (LULC) classes. With the continuing conflict over the impact of oil activities in the area, this work provides a procedure for detecting LULC change, which is an important factor to consider in the design of an environmental decision-making framework. Results from the use of this technique on Landsat TM and ETM+ imagery of 1987 and 2002 are discussed. The results reveal the pros and cons of the two methods and the effects of their overall accuracy on post-classification change detection.
Abstract: In this research article, we analyze multimedia data mining and classification algorithms based on database optimization techniques. High-performance application requirements of various kinds are constantly emerging, which makes parallel computer architectures increasingly common; however, the development of the corresponding software systems lags far behind that of the hardware, and this is especially obvious in the field of database technology. Multimedia mining differs from low-level computer multimedia processing technology: the former focuses on patterns extracted from huge multimedia collections, whereas the latter focuses on understanding or extracting specific features from a single multimedia object. Our research provides a new paradigm for this methodology, which will be meaningful and necessary.
Funding: This work was supported by NASA MEaSUREs (grant number NNH13AV82I). The U.S. Geological Survey provided supplemental funding through other direct and indirect means via its Land Change Science (LCS) and Land Remote Sensing (LRS) programs as well as its Climate and Land Use Change Mission Area.
Abstract: Mapping croplands, including fallow areas, is an important measure to determine the quantity of food that is produced, where it is produced, and when it is produced (e.g. seasonality). Furthermore, croplands are known as water guzzlers, consuming anywhere between 70% and 90% of all human water use globally. Given these facts and the increase in the global population to nearly 10 billion by the year 2050, the need for routine, rapid, and automated cropland mapping, year after year and/or season after season, is of great importance. The overarching goal of this study was to generate standard and routine cropland products, year after year, over very large areas through the use of two novel methods: (a) quantitative spectral matching techniques (QSMTs) applied at the continental level and (b) a rule-based Automated Cropland Classification Algorithm (ACCA) with the ability to hind-cast, now-cast, and future-cast. Australia was chosen for the study given its extensive croplands, rich history of agriculture, and yet nonexistent routine yearly cropland products generated using multi-temporal remote sensing. This research produced three distinct cropland products using Moderate Resolution Imaging Spectroradiometer (MODIS) 250-m normalized difference vegetation index 16-day composite time-series data for 16 years: 2000 through 2015. The products consisted of: (1) cropland extent/areas versus cropland fallow areas, (2) irrigated versus rainfed croplands, and (3) cropping intensities: single, double, and continuous cropping. An accurate reference cropland product (RCP) for the year 2014 (RCP2014), produced using QSMT, was used as a knowledge base to train and develop the ACCA algorithm, which was then applied to the MODIS time-series data for the years 2000–2015. A comparison between the ACCA-derived cropland products (ACPs) for the year 2014 (ACP2014) and RCP2014 provided an overall agreement of 89.4% (kappa = 0.814) with six classes: (a) producer's accuracies varying between 72% and 90% and (b) user's accuracies varying between 79% and 90%. ACPs for the individual years 2000–2013 and 2015 (ACP2000–ACP2013, ACP2015) showed very strong similarities with several other studies. The extent and vigor of Australian croplands versus cropland fallows were accurately captured by the ACCA algorithm for the years 2000–2015, highlighting the value of the study in food security analysis.
Abstract: A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to solving classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back-propagation (BP) algorithm for feedforward NNs using the sigmoid function. Moreover, the models can be implemented with lower-cost hardware than the BP NN.
Funding: The Natural Science Foundation of Hunan Province, China (No. 2020JJ4601) and the Open Fund of the Key Laboratory of Highway Engineering of the Ministry of Education (No. kfj190203).
Abstract: Cluster-based filtering is usually a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster the point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms have poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and the features of ground-object points and terrain points, the point clouds are first clustered by their elevations, and the planar point clouds are then selected, reducing the number of samples and the feature dimensions of the data. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise points, ground-object points, and terrain points. The experiment uses 15 data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with eight classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas and is significantly better than the other classical filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used on different terrains.
Abstract: In contemporary artificial intelligence, machine learning enables automation, allowing systems to acquire and enhance their capabilities through learning. In this work, video recommendation is performed using machine learning techniques. A recommender system is an information filtering system used to predict the "rating" or "preference" given by different users. The prediction depends on past ratings, history, interests, IMDB rating, and so on. It can be implemented using collaborative and content-based filtering approaches, which use the data given by the various users, analyze them, and then recommend the video that suits the user at that particular time. The required video datasets are taken from GroupLens. The recommender system is implemented in the Python programming language using two algorithms: K-means clustering and KNN classification. K-means is an unsupervised machine learning algorithm whose main objective is to group similar data points together and discover patterns; to do so, K-means looks for a fixed number k of clusters in a dataset, where a cluster is a collection of data points aggregated because of certain similarities. K-Nearest Neighbors is a supervised learning algorithm used for classification; given the data, KNN can classify new data points by examining the k nearest data points. The final results are obtained through the clustering output and the root mean squared error. Using these algorithms, we can recommend videos more appropriately based on users' previous records and ratings.
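The KNN step of such a recommender can be sketched in pure Python. The ratings below are a hypothetical toy matrix, not the GroupLens data: to predict a user's rating for an unseen item, we average the item's rating over the k nearest users (by Euclidean distance on co-rated items).

```python
import math

# toy user-item ratings (hypothetical data, not the GroupLens set)
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 4, "B": 5, "C": 2},
    "carol": {"A": 1, "B": 2, "C": 5},
    "dave":  {"A": 5, "B": 4},          # "C" is the rating to predict
}

def distance(u, v):
    # Euclidean distance over the items both users have rated
    common = set(u) & set(v)
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in common))

def knn_predict(ratings, user, item, k=2):
    # average the item's rating over the k nearest users who rated it
    neighbours = sorted(
        (distance(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other)
    top = neighbours[:k]
    return sum(score for _, score in top) / len(top)

print(knn_predict(ratings, "dave", "C"))  # nearest users are alice and bob -> 1.5
```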
Funding: Supported by the Major Funded Project of the National Natural Science Foundation of China (No. 70890083).
Abstract: Achieving online automatic classification of products is a great need of e-commerce development. By analyzing the characteristics of product images, we propose a fast supervised image classifier based on a class-specific Pyramid Histogram Of Words (PHOW) descriptor and Image-to-Class distance (PHOW/I2C). In the training phase, local features are densely sampled and represented as soft-voting PHOW descriptors, and the class-specific descriptors are then built from the means and variances of the distribution of each visual word in each labelled class. For online testing, the normalized chi-square distance is calculated between the descriptor of the query image and each class-specific descriptor. The class label corresponding to the least I2C distance is taken as the final winner. Experiments demonstrate the effectiveness and speed of our method in product classification tasks.
Abstract: Online shopping is currently very popular as it is very convenient for customers. However, selecting a smartphone in an online shop is difficult given only pictures and a short description of the item, so customers refer to user reviews and star ratings. Since user reviews are written in human language, the real meaning of the reviews and the satisfaction of the customers sometimes differ from what the star rating shows. Reading all the reviews is also impractical, as a smartphone typically receives thousands of reviews on a popular online shopping platform like Amazon. Hence, this work aims to develop a recommender system for smartphones based on the aspects of the phones reviewed by users, such as screen size, resolution, camera quality, and battery life. To that end, we apply a hybrid approach, which includes three lexicon-based methods and three machine learning models, to analyze specific aspects of user reviews and classify the reviews into six categories: best, better, good, or somewhat for positive comments, and bad or not recommended for negative comments. The lexicon-based tool AFINN, together with a Random Forest prediction model, provides the best classification F1-score of 0.95. This system can be customized according to the required aspects of smartphones, and the reviews can be classified accordingly.
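The lexicon-based half of the hybrid approach can be sketched as follows. The word scores and the six-bucket thresholds below are illustrative assumptions, not the paper's (the real AFINN lexicon assigns each word an integer valence from -5 to +5): sum the valences of known words, then map the total onto the six review categories.

```python
# tiny illustrative AFINN-style lexicon (the real AFINN scores words -5..+5)
LEXICON = {"great": 3, "good": 2, "excellent": 4, "bad": -2, "terrible": -3}

def lexicon_score(review):
    # sum the valence of every known word; unknown words score 0
    return sum(LEXICON.get(w, 0) for w in review.lower().split())

def bucket(score):
    # map the numeric score onto the six categories named in the abstract;
    # these thresholds are illustrative assumptions, not the paper's
    if score >= 5:  return "best"
    if score >= 3:  return "better"
    if score >= 1:  return "good"
    if score >= 0:  return "somewhat"
    if score >= -3: return "bad"
    return "not recommended"

print(bucket(lexicon_score("great camera good battery")))  # best
```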
Abstract: In this paper, a time-frequency-associated multiple signal classification (MUSIC) algorithm suitable for through-wall detection is proposed. Through-wall radar detection of human targets can be used to monitor the status and location of people behind a wall. However, detection fails when the classical MUSIC algorithm is applied to estimate the direction of arrival. To solve this problem, a time-frequency-associated MUSIC algorithm suitable for through-wall detection and based on an S-band stepped-frequency continuous-wave (SFCW) radar is studied. By associating the inverse fast Fourier transform (IFFT) algorithm with the MUSIC algorithm, the power of the target signal is enhanced according to the distance calculated in the time domain. The signal is then converted to the frequency domain for direction-of-arrival (DOA) estimation. Simulations of two-dimensional human-target detection in free space and the processing of measured data are completed. Comparing the two algorithms on the measured data, the DOA estimation accuracy of the proposed algorithm exceeds 75%, which is 50% higher than that of the classical MUSIC algorithm. It is verified that the distance and angle of a human target can be effectively detected via the proposed algorithm.
基金supported by the the Young Scientists Fund of the National Natural Science Foundation of China under Grant 52102447by the Research Fund Project of Beijing Information Science&Technology University under Grant 2023XJJ33.
Abstract: Detecting abnormal data generated by cyberattacks has emerged as a crucial approach for identifying security threats within in-vehicle networks. The transmission of information through in-vehicle networks must follow specific data formats and communication protocol regulations. Typically, statistical algorithms are employed to learn these variation rules and facilitate the identification of abnormal data. However, the effectiveness of anomaly detection often falls short when confronted with highly deceptive in-vehicle network attacks. In this study, seven representative classification algorithms are selected to detect common in-vehicle network attacks, and a comparative analysis is employed to identify the most suitable and favorable detection method. In consideration of the communication protocol characteristics of in-vehicle networks, an optimized convolutional neural network (CNN) detection algorithm is proposed that uses data-field characteristics and classifier selection, and its comprehensive performance is tested. In addition, the concept of the Hamming distance between two adjacent packets within the in-vehicle network is introduced, enabling an enhanced CNN algorithm that achieves robust detection of hard-to-identify abnormal data. This paper also presents a CNN classification algorithm that effectively addresses the issue of a high false negative rate (FNR) in abnormal data detection based on the timestamp feature of data packets. The experimental results validate the efficacy of the proposed abnormal data detection algorithm, highlighting its strong detection performance and its potential to provide an effective solution for safeguarding the security of in-vehicle network information.
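The Hamming distance between two adjacent packets, introduced above, is simply the number of differing bits between their data fields. A minimal sketch on toy byte strings (the real feature would be computed on CAN data fields):

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    # count differing bits between two equal-length data fields:
    # XOR each byte pair, then count the set bits
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# toy example: first byte differs in all 8 bits, second byte is identical
print(hamming_distance(b"\x00\x0f", b"\xff\x0f"))  # 8
```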
Abstract: To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Subsequently, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. Addressing the limitations of low execution efficiency and scalability in the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input features. This captures latent label associations, significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency. Simultaneously, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence. As a result, successfully formatted STIX 2.1 threat intelligence is established.
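The LZW compression used above to shrink broadcast variables can be sketched in a few lines of pure Python. This shows only the classic LZW encoder on a toy string, not the paper's parallel integration:

```python
def lzw_compress(text):
    # classic LZW: grow a dictionary of seen substrings, emit integer codes
    table = {chr(i): i for i in range(256)}  # seed with single characters
    w, out = "", []
    for ch in text:
        if w + ch in table:
            w += ch                      # extend the current match
        else:
            out.append(table[w])         # emit code for the longest match
            table[w + ch] = len(table)   # register the new substring
            w = ch
    if w:
        out.append(table[w])
    return out

print(lzw_compress("ABABAB"))  # [65, 66, 256, 256]
```

Repeated substrings ("AB" here) collapse into single dictionary codes, which is why the encoding shortens repetitive report text.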
Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2010AAxxx404).
Abstract: Due to the diversified quality of service (QoS) demands of high-volume multimedia applications, QoS routing for multiple service classes is becoming a research hotspot in low-earth-orbit (LEO) satellite networks. A novel QoS satellite routing algorithm for multi-class traffic is proposed. The goal of the routing algorithm is to provide distinct QoS for different traffic classes and improve the utilization of network resources. Traffic is classified into three classes and allocated priorities based on their respective QoS requirements. A priority queuing mechanism lets the algorithm serve high-priority classes better. To control congestion, a blocking-probability analysis model is built based on Markov process theory. Finally, according to the per-class link-cost metrics, routes for different classes are calculated with the distinct QoS requirements and the status of network resources. Simulations verify the performance of the routing algorithm at different times and in different regions, and the results demonstrate that the algorithm has great advantages in terms of average delay and blocking probability. The robustness issue is also discussed.
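The paper builds its own Markov-chain blocking model; as a stand-in illustration of how blocking probability falls out of such a model, here is the classic Erlang-B recursion for an M/M/c/c loss system (an assumption for illustration, not the paper's model):

```python
def erlang_b(traffic_erlangs, channels):
    # iterative Erlang-B recursion: B(0) = 1,
    # B(n) = a*B(n-1) / (n + a*B(n-1))  for offered load a
    b = 1.0
    for n in range(1, channels + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b

# 1 erlang offered to a single channel is blocked half the time
print(round(erlang_b(1.0, 1), 3))  # 0.5
```

Adding channels (capacity) monotonically lowers the blocking probability, which is the effect the priority queuing mechanism exploits for high-priority classes.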