With the popularisation of intelligent power, power devices vary in shape, number, and specification. As a result, power data exhibits distributional variability, and the model learning process cannot extract data features sufficiently, which seriously degrades the accuracy and performance of anomaly detection. Therefore, this paper proposes a deep learning-based anomaly detection model for power data, which integrates a data alignment enhancement technique based on random sampling and an adaptive feature fusion method leveraging dimension reduction. To address the distributional variability of power data, a sliding window-based data adjustment method is developed for the model, which mitigates the problems of high-dimensional feature noise and low-dimensional missing data. To address insufficient feature fusion, an adaptive feature fusion method based on feature dimension reduction and dictionary learning is proposed to improve the model's anomaly detection accuracy. To verify the effectiveness of the proposed method, we conducted comparisons through ablation experiments. The experimental results show that, compared with traditional anomaly detection methods, the proposed method not only has an advantage in model accuracy but also reduces the parameter computation during feature matching and improves detection speed.
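The abstract describes the sliding-window data adjustment only at a high level. As a minimal sketch of the general idea — cutting variable-length records into fixed-dimension windows and padding short records so low-dimensional data is not dropped — with all names and data hypothetical, not the paper's implementation:

```python
import numpy as np

def sliding_windows(series, width, step):
    """Cut a 1-D series into fixed-width overlapping windows.

    Each window becomes one fixed-dimension sample, so records of
    different lengths yield feature vectors of a common size.
    """
    n = len(series)
    if n < width:  # pad short series so low-dimensional data is kept
        series = np.pad(series, (0, width - n), mode="edge")
        n = width
    starts = range(0, n - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

# Two power-consumption records of different lengths (hypothetical data)
short = np.array([1.0, 2.0, 3.0])
long_ = np.arange(10, dtype=float)

w_short = sliding_windows(short, width=4, step=2)  # padded to one window
w_long = sliding_windows(long_, width=4, step=2)   # four windows
```

Both outputs now have the same feature dimension (4), regardless of the original record length.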
Arc sound is well known as a potential and available resource for monitoring and controlling the weld penetration status, which is very important to welding process quality control, so much attention has been paid to the relationships between arc sound and welding parameters. Some non-linear mapping models correlating arc sound with welding parameters have been established with the help of neural networks. However, research on utilizing arc sound to monitor and diagnose the welding process is still in its infancy. A self-made real-time sensing system is applied to study arc sound under typical penetration statuses, including partial penetration, unstable penetration, full penetration, and excessive penetration, in metal inert-gas (MIG) flat tailored welding with spray transfer. Arc sound is pretreated using wavelet de-noising and short-time windowing technologies, and its time-domain, frequency-domain, cepstrum-domain, and geometric-domain characteristics, which characterize the weld penetration status, are extracted. Subsequently, a high-dimensional eigenvector is constructed and the feature-level parameters are successfully fused using primary principal component analysis (PCA). Ultimately, the 60-dimensional eigenvector is replaced by a synthesized 8-dimensional vector, which compresses the feature space and provides technical support for future pattern classification of typical penetration statuses from arc sound in MIG welding.
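The PCA compression step (60-dimensional eigenvector down to 8 components) can be sketched with a plain SVD-based PCA. The data here is random stand-in data, not the paper's arc-sound features:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 60-dimensional arc-sound eigenvectors (hypothetical data)
X = rng.normal(size=(200, 60))

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X8 = Xc @ Vt[:8].T  # project onto the top-8 principal components

# Fraction of total variance retained by the 8 components
explained = (S[:8] ** 2).sum() / (S ** 2).sum()
```

The rows of `Vt` are the principal directions; projecting onto the first 8 gives the compressed feature vector used for downstream classification.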
Because the amount of data that an IDS needs to examine is very large, it is necessary to reduce the audit features and discard the redundant ones. Therefore, we investigated the performance of reducing TCP/IP features based on a decision tree rule-based statistical method (DTRS). Its main idea is to create n decision trees from n data subsets, extract the rules, determine the relatively important features according to how frequently different features are used, and demonstrate experimentally that the reduced features perform better than the primary features.
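The frequency-of-use idea can be illustrated with a toy version: pick the best single feature (by information gain, as a stand-in for a full decision tree) on several random subsets and count how often each feature is chosen. Data and subset counts are hypothetical:

```python
import numpy as np
from collections import Counter

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_feature(X, y):
    """Pick the binary feature with the highest information gain."""
    base = entropy(y)
    gains = []
    for j in range(X.shape[1]):
        mask = X[:, j] == 1
        if mask.all() or (~mask).all():
            gains.append(0.0)
            continue
        g = base - (mask.mean() * entropy(y[mask])
                    + (~mask).mean() * entropy(y[~mask]))
        gains.append(g)
    return int(np.argmax(gains))

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 5))
y = X[:, 2]  # feature 2 fully determines the label (synthetic)

# Count how often each feature is chosen across random subsets
votes = Counter()
for _ in range(10):
    idx = rng.choice(len(X), size=100, replace=False)
    votes[best_feature(X[idx], y[idx])] += 1

top = votes.most_common(1)[0][0]  # most frequently used feature
```

Features that are consistently selected across subsets are kept; rarely used ones are treated as redundant.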
Feature reduction is a key process in pattern recognition. This paper deals with feature reduction methods for a time-shift invariant feature, the power spectrum, in Radar Automatic Target Recognition (RATR) using High-Resolution Range Profiles (HRRPs). Several existing feature reduction methods in pattern recognition are analyzed, and a weighted feature reduction method based on Fisher's Discriminant Ratio (FDR) is proposed. According to the characteristics of radar HRRP target recognition, the proposed method searches for the optimal weight vector for the power spectra of HRRPs by means of an iterative algorithm, and thus reduces feature dimensionality. Compared with using raw power spectra and with some existing feature reduction methods, the weighted feature reduction method can not only reduce feature dimensionality but also improve recognition performance with low computational complexity. In recognition experiments based on measured data, the proposed method is robust to different test data and achieves good recognition results.
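Fisher's Discriminant Ratio scores a feature by how far apart the class means are relative to the within-class spread. A minimal per-feature FDR sketch on synthetic two-class data (the paper's iterative weight search is not reproduced here):

```python
import numpy as np

def fdr(x_a, x_b):
    """Fisher's Discriminant Ratio for one feature over two classes:
    (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (x_a.mean() - x_b.mean()) ** 2 / (x_a.var() + x_b.var())

rng = np.random.default_rng(0)
# Hypothetical power-spectrum bins for two target classes
class_a = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
class_b = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
class_b[:, 0] += 5.0  # only bin 0 separates the classes

scores = np.array([fdr(class_a[:, j], class_b[:, j]) for j in range(3)])
best_bin = int(np.argmax(scores))  # the discriminative spectrum bin
```

Bins with high FDR would receive large weights; bins with near-zero FDR contribute little and can be down-weighted or dropped.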
The mineralogical features of the oxidation-reduction zones of the graphite deposit in Pingdu, Shandong Province were studied by field investigation, polarized-light microscopy, X-ray diffraction (XRD), and SEM. The results show that the major rocks of the reduction graphite zone are graphite-quartz anorthosite and gabbro, while the major rocks of the oxidation graphite zone are graphite-bearing marble, biotite granite, and monzogranite. The main minerals of the reduction zone are plagioclase, pyroxene, quartz, and pyrite; its graphite is aphanitic, appearing as dense massive, layered, and spherical aggregates. The main minerals of the oxidation zone are calcite, quartz, K-feldspar, biotite, amphibole, and chlorite; its graphite is flake graphite uniformly dispersed in the loose, strongly eroded rocks. Many rocks in the area have undergone chloritization and regional metamorphism, indicating that the formation of the graphite deposit should be related to gabbro melting. The carbon source in the lower part was brought into the deposit and then experienced regional metamorphism.
In stock market forecasting, the identification of critical features that affect the performance of machine learning (ML) models is crucial to achieving accurate stock price predictions. Several review papers in the literature have focused on the various ML, statistical, and deep learning-based methods used in stock market forecasting; however, no survey has explored feature selection and extraction techniques for this task. This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications. We conducted a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022. We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in these articles, describe how feature analysis techniques are combined with ML methods, and evaluate their performance. Moreover, we present other survey articles, stock market input and output data, and analyses based on various factors. We find that correlation criteria, random forest, principal component analysis, and autoencoders are the most widely used feature selection and extraction techniques, yielding the best prediction accuracy for various stock market applications.
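Correlation-criteria feature selection, the most common technique the survey identifies, simply ranks candidate inputs by their absolute correlation with the target. A minimal sketch on synthetic data (feature names and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 250
# Hypothetical daily indicators; only "volume" drives next-day price here
volume = rng.normal(size=n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
price_next = 0.9 * volume + 0.1 * rng.normal(size=n)

features = {"volume": volume, "noise_a": noise_a, "noise_b": noise_b}

# Rank features by absolute Pearson correlation with the target
corr = {name: abs(np.corrcoef(x, price_next)[0, 1])
        for name, x in features.items()}
ranked = sorted(corr, key=corr.get, reverse=True)
```

The top-ranked features are kept as model inputs; the rest are discarded before training.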
Globally, depression is perceived as the most recurrent and risky disorder among young people and adults under the age of 60. Depression strongly influences word usage, which can be observed in written texts or stories posted on social media. With the help of Natural Language Processing (NLP) and Machine Learning (ML) techniques, the depressive signs expressed by people can be identified at the earliest stage from their social media posts. The proposed work introduces an efficacious depression detection model unifying an exemplary feature extraction scheme and a hybrid Long Short-Term Memory (LSTM) model. The feature extraction process combines a novel feature selection method called Elite Term Score (ETS) with Word2Vec to extract syntactic and semantic information, respectively. First, the ETS method leverages document-level, class-level, and corpus-level probabilities to compute the weight/score of each term. Then, the ideal and pertinent set of features with high ETS scores is selected, and the Word2Vec model is trained to generate the feature vector representation for the selected terms. Finally, the resultant word vector, called EliteVec, is fed to the hybrid LSTM model based on a Honey Badger optimizer with a population reduction technique (PHB), which predicts whether the input text is depressive. The PHB algorithm is integrated to explore and exploit the optimal hyperparameters, strengthening the performance of the LSTM network. Comprehensive experiments are carried out on two different Twitter depression corpora using accuracy and Root Mean Square Error (RMSE) metrics. The results demonstrate that the proposed EliteVec+LSTM+PHB model outperforms state-of-the-art models with 98.1% accuracy and 0.0559 RMSE.
This paper proposes a method of feature selection using Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two binary attributes is determined from the probabilities of their joint values contributing to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, thus reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning repository. The proposed method shows better results in terms of the number of selected features, classification accuracy, and running time than most existing algorithms.
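The dependence test is only paraphrased in the abstract; a crude sketch of the zero-probability idea (this is an illustration of the general principle, not the paper's exact criterion, and the data is synthetic):

```python
import numpy as np

def dependent(a, b):
    """Crude dependence check for two binary attributes: they are
    treated as dependent when the value of one determines the other,
    i.e. the opposing joint combination has zero probability."""
    joint = np.zeros((2, 2))
    for x, y in zip(a, b):
        joint[x, y] += 1
    joint /= joint.sum()
    # each value of `a` maps to a single value of `b`
    return all((row == 0).any() for row in joint)

rng = np.random.default_rng(3)
a = rng.integers(0, 2, size=100)
b = 1 - a                    # perfectly dependent on a
c = rng.integers(0, 2, size=100)

drop_b = dependent(a, b)     # True -> b is redundant, remove it
drop_c = dependent(a, c)     # independent attribute is kept
```

A dependent attribute carries no extra information, so one of the pair can be removed before classification.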
In order to select effective feature subsets for pattern classification, a novel statistical rough set method based on generalized attribute reduction is presented. Unlike classical reduction approaches, the objects in the universe of discourse are signs of the training sample sets, and the attribute values are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that classification performance can be improved by using the selected feature subsets.
A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with resampling is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensembles, the neural network ensemble with the best generalization ability is found by search strategies. Finally, classification is implemented by combining the predictions of the component networks by voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it costs less time and has lower computational complexity, and its classification accuracy is satisfactory.
This paper addresses the feature-selection problem in training AdaBoost classifiers. A working feature subset is generated by a novel feature subset selection method based on partial least squares (PLS) regression, and features are then trained and selected from this subset in boosting. Experiments show that the proposed PLS-based feature-selection method outperforms the current feature ranking method and the random sampling method.
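For a single response, the first PLS component weights each (centered) feature by its covariance with the (centered) response, so ranking by the absolute weights gives a quick PLS-flavoured feature ranking. A minimal sketch with synthetic data (the paper's full PLS regression and boosting pipeline is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 6))
# Hypothetical labels driven mainly by features 1 and 4
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=n)

# First PLS component (univariate y): weights are the normalized
# covariances between each centered feature and the centered response
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)

ranking = np.argsort(-np.abs(w))        # most relevant features first
top2 = set(ranking[:2].tolist())
```

Only the top-ranked features would then be handed to the boosting stage.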
Feature selection is the pretreatment step of data mining, and heuristic search algorithms are often used for it. Many heuristic search algorithms are based on discernibility matrices, which only consider the differences in an information system. Because similar characteristics are not revealed in a discernibility matrix, the result may not be the simplest rule set. Although difference-similitude (DS) methods take both difference and similitude into account, the existing search strategy can cause some important features to be ignored. An improved DS-based algorithm is proposed to solve this problem. An attribute rank function that considers both difference and similitude in feature selection is defined in the improved algorithm. Experiments show that it is an effective algorithm, especially for large-scale databases. The time complexity of the algorithm is O(|C|²|U|²).
Feature selection (FS), also called feature dimensionality reduction or feature optimization, is an essential process in pattern recognition and machine learning because it enhances classification speed and accuracy and reduces system complexity. FS reduces the number of features extracted in the feature extraction phase by removing highly correlated features, retaining features with high information gain, and removing features with no weight in classification. In this work, a filter-type statistical FS method is designed and implemented, utilizing a t-test to decrease the convergence between feature subsets by calculating a quality-of-performance value (QoPV). The approach uses a well-designed fitness function to calculate a strength-of-recognition value (SoRV). The two values are used to rank all features according to a final weight (FW) calculated for each feature subset using a function that prioritizes feature subsets with high SoRV values. An FW is assigned to each feature subset, and subsets with FWs below a predefined threshold are removed from the feature subset domain. Experiments are implemented on three datasets: the Ryerson Audio-Visual Database of Emotional Speech and Song, Berlin, and Surrey Audio-Visual Expressed Emotion. The performance of the F-test and F-score FS methods is compared with that of the proposed method. Tests are also conducted on a system before and after deploying the FS methods. The results demonstrate the comparative efficiency of the proposed method. System complexity is calculated from the time overhead required before and after FS, and the results show that the proposed method can reduce it.
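The t-test filter idea — score each feature by how strongly its class-conditional means differ, then keep only features above a threshold — can be sketched with Welch's t statistic. Data, class names, and the threshold value are all hypothetical:

```python
import numpy as np

def welch_t(x_a, x_b):
    """Welch's t statistic for one feature over two classes."""
    va, vb = x_a.var(ddof=1), x_b.var(ddof=1)
    return (x_a.mean() - x_b.mean()) / np.sqrt(va / len(x_a) + vb / len(x_b))

rng = np.random.default_rng(5)
# Hypothetical acoustic features for two emotion classes
happy = rng.normal(size=(120, 4))
sad = rng.normal(size=(120, 4))
sad[:, 1] += 2.0  # only feature 1 differs between the classes

t_stats = np.array([abs(welch_t(happy[:, j], sad[:, j])) for j in range(4)])
kept = [j for j in range(4) if t_stats[j] > 4.0]  # filter threshold (assumed)
```

Features whose |t| falls below the threshold are dropped before classifier training, which is where the complexity reduction comes from.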
The precision of the kernel independent component analysis (KICA) algorithm depends on the type and parameter values of the kernel function. Therefore, it is of great significance to study how to choose KICA's kernel parameters to improve its feature dimension reduction results. In this paper, a fitness function is first established using the idea of the Fisher discriminant function. Then, the global optimum of the fitness function is searched by the particle swarm optimization (PSO) algorithm, and a multi-state information dimension reduction algorithm based on PSO-KICA is established. Finally, the ability of this algorithm to enhance the precision of feature dimension reduction is demonstrated.
Acquired immunodeficiency syndrome (AIDS) is a fatal disease that highly threatens human health, and human immunodeficiency virus (HIV) is its pathogen. Investigating HIV-1 protease cleavage sites can help researchers find or develop protease inhibitors that restrain the replication of HIV-1 and thus resist AIDS. Feature selection is a new approach to the HIV-1 protease cleavage site prediction task and is the key point of our research. Compared with previous work, our work has several advantages. First, a filter method is used to eliminate redundant features. Second, besides traditional orthogonal encoding (OE), two kinds of newly proposed features, extracted by applying principal component analysis (PCA) and non-linear Fisher transformation (NLF) to the AAindex database, are used; both are shown to perform better than OE. Third, the data set is greatly expanded, to 1922 samples. To further improve prediction performance, we optimize the SVM parameters so that the classifier obtains better prediction capability, and we fuse the three kinds of features to ensure a comprehensive feature representation. To evaluate the prediction performance of our method effectively, five metrics, far more than in previous work, are used for a complete comparison. The experimental results show that our method outperforms the state-of-the-art method. This means that feature selection combined with feature fusion and classifier parameter optimization can effectively improve HIV-1 cleavage site prediction. Moreover, our work can provide useful help for developing HIV-1 protease inhibitors in the future.
The performance and efficiency of a baler deteriorate as a result of gearbox failure. One way to overcome this challenge is to select appropriate fault feature parameters for diagnosing and monitoring gearbox faults. This paper proposes a fault feature selection method using an improved adaptive genetic algorithm for a baler gearbox. The method directly obtains, through attribute reduction, the minimal fault feature parameter set that is most sensitive to the fault features. The main benefit of the improved adaptive genetic algorithm is its excellent attribute reduction efficiency without requiring prior information, making the method suitable for timely diagnosis and monitoring. Experimental validation was performed, and promising findings highlighting the relationship between diagnosis results and faults were obtained. The results indicate that when the improved genetic algorithm reduces 12 fault characteristic parameters to three without a priori information, 100% fault diagnosis accuracy can be achieved from these characteristics, and the time required for fault feature parameter selection is halved compared with traditional methods. The proposed method provides important insights into instant fault diagnosis and monitoring of mechanical devices.
Big data is a vast amount of structured and unstructured data that must be dealt with on a regular basis. Dimensionality reduction is the process of converting a huge data set into one with fewer dimensions so that the same information can be expressed compactly. These techniques are frequently utilized to improve classification or regression in machine learning. To achieve dimensionality reduction for huge data sets, this paper offers a hybrid particle swarm optimization-rough set (PSO-RS) algorithm and a Mayfly algorithm-rough set (MA-RS) algorithm. In particular, a novel hybrid strategy based on the Mayfly algorithm (MA) and rough sets (RS) is proposed. The performance of the novel hybrid MA-RS algorithm is evaluated by solving six different data sets from the literature. The simulation results and a comparison with common reduction methods demonstrate the proposed MA-RS algorithm's capacity to handle a wide range of data sets. Finally, the rough set approach, as well as the hybrid optimization techniques PSO-RS and MA-RS, was applied to the massive data problem. The hybrid MA-RS method beats other classic dimensionality reduction techniques, according to the experimental results and statistical testing studies.
This paper presents an effective image classification algorithm based on superpixels and feature fusion. Differing from classical image classification algorithms that extract feature descriptors directly from the original image, the proposed method first segments the input image into superpixels, and then several different types of features are calculated from these superpixels. To increase classification accuracy, the dimensions of these features are reduced using the principal component analysis (PCA) algorithm, followed by a weighted serial feature fusion strategy. After constructing a coding dictionary using the nonnegative matrix factorization (NMF) algorithm, the input image is recognized by a support vector machine (SVM) model. The effectiveness of the proposed method was tested on the public Scene-15, Caltech-101, and Caltech-256 datasets, and the experimental results demonstrate that the proposed method can effectively improve image classification accuracy.
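Weighted serial feature fusion means reducing each feature block separately, scaling each block by a weight, and concatenating the results into one vector. A minimal sketch; the block names, dimensions, and weights are invented stand-ins, not the paper's values:

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(11)
n = 150
color_feats = rng.normal(size=(n, 20))    # hypothetical superpixel colour features
texture_feats = rng.normal(size=(n, 30))  # hypothetical superpixel texture features

# Weighted serial fusion: reduce each block, scale by a weight, concatenate
w_color, w_texture = 0.6, 0.4             # assumed fusion weights
fused = np.hstack([w_color * pca_reduce(color_feats, 5),
                   w_texture * pca_reduce(texture_feats, 5)])
```

The fused vectors would then be encoded against the NMF dictionary and classified by the SVM.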
In recent times, images and videos have emerged as among the most important information sources depicting real-world scenarios. Digital images now serve as input for many applications, replacing manual methods thanks to their ability to represent a 3D scene in a 2D plane. Digital images combined with machine learning methodologies are showing promising accuracy in many prediction and pattern recognition applications. One application field is the detection of diseases occurring in plants, which can destroy widespread fields. Traditionally, disease detection was done by a domain expert using manual examination and laboratory tests, a tedious and time-consuming process that does not reach sufficient accuracy. This creates room for research into automated methods in which images captured through sensors and cameras are used to detect disease and control its spread. The digital images captured in the field form the dataset that trains the machine learning models to predict the nature of the disease. The accuracy of these models is greatly affected by the amount of noise and artifacts in the input images, the segmentation methodology, the feature vector construction, and the choice of machine learning algorithm. To ensure high performance of the designed system, research is moving towards fine-tuning each stage separately while considering its dependencies on subsequent stages. Therefore, the optimal solution can be obtained by applying image processing methodologies to improve image quality and then applying statistical methods for feature extraction and selection. The training vectors thus developed can capture the relationship between the feature values and the target class. In this article, a highly accurate system model for detecting diseases in citrus fruits using a hybrid feature development approach is proposed, and the overall improvement in accuracy is measured and reported.
Collectively improving the accuracy of breast cancer image-related pattern recognition to an acceptable or desirable level remains challenging. Despite combining multiple schemes to achieve superior ultrasound image pattern recognition by reducing speckle noise, an enhanced technique has not been achieved. The purpose of this study is to introduce a feature-based fusion scheme built on an enhanced uniform Local Binary Pattern (LBP) and filtered noise reduction. To surmount these limitations and achieve the aim of the study, a new descriptor that enhances the LBP features based on a new threshold is proposed. This paper proposes a multi-level fusion scheme for the automatic classification of static ultrasound images of breast cancer, attained in two stages. First, several images were generated from a single image using pre-processing: the median and Wiener filters were utilized to lessen the speckle noise and enhance the ultrasound image texture. This strategy allowed the extraction of a powerful feature by reducing the overlap between the benign and malignant image classes. Second, the fusion mechanism allowed the production of diverse features from the different filtered images. The feasibility of using the LBP-based texture feature to categorize the ultrasound images was demonstrated. The effectiveness of the proposed scheme was tested on 250 ultrasound images comprising 100 benign and 150 malignant images. The proposed method achieved very high accuracy (98%), sensitivity (98%), and specificity (99%). As a result, the fusion process, which helps reach a powerful decision based on the different features produced from the different filtered images, improved the results of the new LBP feature descriptor in terms of accuracy, sensitivity, and specificity.
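The basic LBP operator that the descriptor above builds on compares each pixel with its 8 neighbours and packs the comparisons into an 8-bit code; the histogram of codes is the texture feature. A minimal plain-NumPy sketch (the paper's enhanced uniform-LBP threshold is not reproduced):

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour Local Binary Pattern: each interior pixel becomes
    an 8-bit code, one bit per neighbour >= centre."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            centre = img[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di, j + dj] >= centre:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out

# A flat patch: every neighbour equals the centre, so all 8 bits are set
flat = np.full((5, 5), 7, dtype=np.uint8)
codes = lbp_image(flat)
hist = np.bincount(codes.ravel(), minlength=256)  # the texture descriptor
```

Computing such histograms on each filtered version of the ultrasound image and concatenating them is one way to realize the multi-filter fusion the abstract describes.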
文摘With the popularisation of intelligent power,power devices have different shapes,numbers and specifications.This means that the power data has distributional variability,the model learning process cannot achieve sufficient extraction of data features,which seriously affects the accuracy and performance of anomaly detection.Therefore,this paper proposes a deep learning-based anomaly detection model for power data,which integrates a data alignment enhancement technique based on random sampling and an adaptive feature fusion method leveraging dimension reduction.Aiming at the distribution variability of power data,this paper developed a sliding window-based data adjustment method for this model,which solves the problem of high-dimensional feature noise and low-dimensional missing data.To address the problem of insufficient feature fusion,an adaptive feature fusion method based on feature dimension reduction and dictionary learning is proposed to improve the anomaly data detection accuracy of the model.In order to verify the effectiveness of the proposed method,we conducted effectiveness comparisons through elimination experiments.The experimental results show that compared with the traditional anomaly detection methods,the method proposed in this paper not only has an advantage in model accuracy,but also reduces the amount of parameter calculation of the model in the process of feature matching and improves the detection speed.
基金supported by Harbin Academic Pacesetter Foundation of China (Grant No. RC2012XK006002)Zhegjiang Provincial Natural Science Foundation of China (Grant No. Y1110262)+2 种基金Ningbo Municipal Natural Science Foundation of China (Grant No. 2011A610148)Ningbo Municipal Major Industrial Support Project of China (Grant No.2011B1007)Heilongjiang Provincial Natural Science Foundation of China (Grant No. E2007-01)
文摘Arc sound is well known as the potential and available resource for monitoring and controlling of the weld penetration status,which is very important to the welding process quality control,so any attentions have been paid to the relationships between the arc sound and welding parameters.Some non-linear mapping models correlating the arc sound to welding parameters have been established with the help of neural networks.However,the research of utilizing arc sound to monitor and diagnose welding process is still in its infancy.A self-made real-time sensing system is applied to make a study of arc sound under typical penetration status,including partial penetration,unstable penetration,full penetration and excessive penetration,in metal inert-gas(MIG) flat tailored welding with spray transfer.Arc sound is pretreated by using wavelet de-noising and short-time windowing technologies,and its characteristics,characterizing weld penetration status,of time-domain,frequency-domain,cepstrum-domain and geometric-domain are extracted.Subsequently,high-dimensional eigenvector is constructed and feature-level parameters are successfully fused utilizing the concept of primary principal component analysis(PCA).Ultimately,60-demensional eigenvector is replaced by the synthesis of 8-demensional vector,which achieves compression for feature space and provides technical supports for pattern classification of typical penetration status with the help of arc sound in MIG welding in the future.
Funding: Supported by the Natural Science Foundation of Hebei Province (F2004000133)
Abstract: Because the amount of data that an IDS needs to examine is very large, it is necessary to reduce the audit features and discard the redundant ones. We therefore investigated reducing TCP/IP features with a decision tree rule-based statistical method (DTRS). Its main idea is to build n decision trees on n data subsets, extract the rules, determine the relatively important features according to how frequently each feature is used in those rules, and demonstrate experimentally that the reduced feature set performs better than the original features.
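The frequency-of-use ranking at the heart of DTRS can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the feature names and per-tree rule lists below are made up, standing in for rules extracted from real decision trees.

```python
from collections import Counter

# Hypothetical rules extracted from n = 3 decision trees, each trained on a
# different data subset; each rule list names the TCP/IP features it tests.
rules_per_tree = [
    ["duration", "src_bytes", "flag"],
    ["src_bytes", "protocol", "duration"],
    ["src_bytes", "flag", "count"],
]

# Count how often each feature appears across all extracted rules.
usage = Counter(f for rules in rules_per_tree for f in rules)
ranked = [f for f, _ in usage.most_common()]
keep = ranked[:3]   # retain the 3 most frequently used features
print(keep)
```

Features that no tree finds useful simply never accumulate counts, which is how the redundant ones fall away.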
Funding: Partially supported by the National Natural Science Foundation of China (No. 60302009) and the National Defense Advanced Research Foundation of China (No. 413070501).
Abstract: Feature reduction is a key process in pattern recognition. This paper deals with feature reduction methods for a time-shift invariant feature, the power spectrum, in Radar Automatic Target Recognition (RATR) using High-Resolution Range Profiles (HRRPs). Several existing feature reduction methods in pattern recognition are analyzed, and a weighted feature reduction method based on Fisher's Discriminant Ratio (FDR) is proposed. According to the characteristics of radar HRRP target recognition, the proposed method searches for the optimal weight vector for the power spectra of HRRPs by means of an iterative algorithm, thereby reducing feature dimensionality. Compared with using raw power spectra and with several existing feature reduction methods, the weighted feature reduction method not only reduces feature dimensionality but also improves recognition performance with low computational complexity. In recognition experiments on measured data, the proposed method is robust to different test data and achieves good recognition results.
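A per-feature FDR computation can be sketched as below. Note the simplification: the paper searches for the weight vector iteratively, whereas this minimal version just uses the FDR itself as the weight; the two-class synthetic data is an assumption for illustration.

```python
import numpy as np

def fdr(X1, X2):
    """Fisher's Discriminant Ratio per feature for a two-class problem."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0), X2.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)  # epsilon avoids divide-by-zero

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(100, 5))   # class 1: 100 samples, 5 features
X2 = rng.normal(0.0, 1.0, size=(100, 5))   # class 2
X2[:, 0] += 3.0                            # feature 0 separates the classes

w = fdr(X1, X2)
w /= w.sum()                               # normalise to a weight vector
print(int(np.argmax(w)))                   # feature 0 gets the largest weight
```

Weighting the power spectrum bins by such a vector emphasises discriminative bins before any dimensionality is dropped.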
Abstract: The mineralogical features of the oxidation and reduction zones of the graphite deposit in Pingdu, Shandong Province, were studied by field survey, polarizing microscopy, X-ray diffraction (XRD), and SEM. The results show that the major rocks of the reduction graphite zone are graphite-quartz anorthosite and gabbro, while the major rocks of the oxidation graphite zone are graphite-bearing marble, biotite granite, and monzogranite. The main minerals of the reduction zone are plagioclase, pyroxene, quartz, and pyrite; here the graphite is aphanitic, appearing as dense massive, layered, and spherical aggregates. The main minerals of the oxidation zone are calcite, quartz, K-feldspar, biotite, amphibole, and chlorite; here the graphite is flake graphite uniformly dispersed in loose, strongly eroded rocks. Many rocks in the area have undergone chloritization and regional metamorphism, indicating that the formation of the graphite deposit should be related to gabbro melting: carbon from a deep source was introduced into the deposit, which then experienced regional metamorphism.
Funding: Funded by the University of Groningen and the Prospect Burma organization.
Abstract: In stock market forecasting, the identification of critical features that affect the performance of machine learning (ML) models is crucial to achieving accurate stock price predictions. Several review papers in the literature have focused on various ML, statistical, and deep learning-based methods used in stock market forecasting. However, no survey study has explored feature selection and extraction techniques for stock market forecasting. This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications. We conducted a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022. We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in the articles. We also describe the combinations of feature analysis techniques and ML methods and evaluate their performance. Moreover, we present other survey articles, stock market input and output data, and analyses based on various factors. We find that correlation criteria, random forest, principal component analysis, and autoencoders are the most widely used feature selection and extraction techniques with the best prediction accuracy for various stock market applications.
Abstract: Globally, depression is perceived as the most recurrent and risky disorder among young people and adults under the age of 60. Depression strongly influences word usage, which can be observed in written texts or stories posted on social media. With the help of Natural Language Processing (NLP) and Machine Learning (ML) techniques, the depressive signs expressed by people can be identified at the earliest stage from their social media posts. The proposed work introduces an efficacious depression detection model unifying an exemplary feature extraction scheme and a hybrid Long Short-Term Memory (LSTM) network. The feature extraction process combines a novel feature selection method called Elite Term Score (ETS) and Word2Vec to extract syntactic and semantic information, respectively. First, the ETS method leverages document-level, class-level, and corpus-level probabilities to compute the weightage/score of the terms. Then, the ideal and pertinent set of features with high ETS scores is selected, and the Word2Vec model is trained to generate dense feature vector representations for the selected terms. Finally, the resultant word vector, called EliteVec, is fed to the hybrid LSTM model based on a Honey Badger optimizer with a population reduction technique (PHB), which predicts whether the input text is depressive or not. The PHB algorithm explores and exploits the optimal hyperparameters to strengthen the performance of the LSTM network. Comprehensive experiments were carried out on two different Twitter depression corpora using accuracy and Root Mean Square Error (RMSE) metrics. The results demonstrate that the proposed EliteVec+LSTM+PHB model outperforms state-of-the-art models with 98.1% accuracy and 0.0559 RMSE.
Abstract: This paper proposes a feature selection method based on Bayes' theorem. Its purpose is to reduce computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, thus reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning repository. The proposed method shows better results in terms of the number of selected features, classification accuracy, and running time than most existing algorithms.
Funding: This work was supported by the National Basic Research Program of China (No. 2001CB309403)
Abstract: To select effective feature subsets for pattern classification, a novel statistical rough set method based on generalized attribute reduction is presented. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets, and the attribute values are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that classification performance can be improved by using the selected feature subsets.
Funding: Supported by the National High-Tech Research and Development Plan of China (No. 2007AA04Z224) and the National Natural Science Foundation of China (Nos. 60775047 and 60835004)
Abstract: A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with resampling is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensembles, the neural network ensemble with the best generalization ability is found by search strategies. Finally, classification is implemented by combining the predictions of the component networks through voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it requires less time and lower computational complexity, and the classification accuracy is satisfactory.
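Only the final combination step of the ensemble above lends itself to a compact sketch; the reduct construction and BP network training are not reproduced here, and the label matrix below is a made-up example.

```python
import numpy as np

def vote(predictions):
    """Majority vote over component-classifier outputs.

    predictions: (n_models, n_samples) array of integer class labels.
    """
    P = np.asarray(predictions)
    n_classes = P.max() + 1
    # Count votes per class for each sample (bincount down each column).
    counts = np.apply_along_axis(np.bincount, 0, P, minlength=n_classes)
    return counts.argmax(axis=0)       # winning class per sample

# Three hypothetical component networks, four samples each.
preds = [[0, 1, 1, 2],
         [0, 1, 0, 2],
         [1, 1, 0, 2]]
print(vote(preds).tolist())  # [0, 1, 0, 2]
```

Ties fall to the lower class index here; a weighted vote would be the natural extension when component networks differ in accuracy.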
Funding: Supported by the National Natural Science Foundation of China (60772066)
Abstract: This paper addresses the feature-selection problem in training AdaBoost classifiers. A working feature subset is generated by a novel feature subset selection method based on partial least squares (PLS) regression, and the classifier is then trained on and selects from this subset during Boosting. Experiments show that the proposed PLS-based feature-selection method outperforms the current feature ranking method and the random sampling method.
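A first-component PLS weight vector already yields a usable feature ranking, which is sketched below. This is a simplification under stated assumptions: full PLS extracts multiple deflated components, whereas this toy keeps only the first weight vector (proportional to Xᵀy after centring), and the regression data is synthetic.

```python
import numpy as np

def pls_rank(X, y, k):
    """Rank features by |first PLS weight| and return the top-k indices."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                     # first PLS weight vector (unnormalised)
    return np.argsort(-np.abs(w))[:k]

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)   # y driven by feature 3

subset = pls_rank(X, y, 3)
print(int(subset[0]))                 # feature 3 ranks first
```

The top-k indices form the working subset handed to the Boosting stage.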
Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3)
Abstract: Feature selection is a pretreatment step in data mining, and heuristic search algorithms are often used for it. Many heuristic search algorithms are based on discernibility matrices, which only consider the differences in an information system. Because similar characteristics are not revealed in a discernibility matrix, the result may not be the simplest rule set. Although difference-similitude (DS) methods take both difference and similitude into account, the existing search strategy can cause some important features to be ignored. An improved DS-based algorithm is proposed in this paper to solve this problem. An attribute rank function that considers both difference and similitude in feature selection is defined in the improved algorithm. Experiments show that it is an effective algorithm, especially for large-scale databases. The time complexity of the algorithm is O(|C|²|U|²).
Abstract: Feature selection (FS), also called feature dimensionality reduction or feature optimization, is an essential process in pattern recognition and machine learning because it enhances classification speed and accuracy and reduces system complexity. FS reduces the number of features extracted in the feature extraction phase by removing highly correlated features, retaining features with high information gain, and removing features with no weight in classification. In this work, a filter-type statistical FS method is designed and implemented, utilizing a t-test to decrease the convergence between feature subsets by calculating a quality of performance value (QoPV). The approach utilizes a well-designed fitness function to calculate a strength of recognition value (SoRV). The two values are used to rank all features according to a final weight (FW) calculated for each feature subset using a function that prioritizes feature subsets with high SoRV values. An FW is assigned to each feature subset, and those with FWs below a predefined threshold are removed from the feature subset domain. Experiments are implemented on three datasets: the Ryerson Audio-Visual Database of Emotional Speech and Song, Berlin, and Surrey Audio-Visual Expressed Emotion. The performance of the F-test and F-score FS methods is compared with that of the proposed method, and tests are also conducted on a system before and after deploying the FS methods. The results demonstrate the comparative efficiency of the proposed method. System complexity is measured as the time overhead required before and after FS, and the results show that the proposed method can reduce system complexity.
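The shared core of such filter-type methods, a per-feature t statistic with a cutoff, can be sketched as follows. The paper's QoPV/SoRV/FW weighting is more elaborate and is not reproduced; the two-class data and the threshold value here are assumptions.

```python
import numpy as np

def t_stat(X1, X2):
    """Welch's t statistic (absolute value) for each feature."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0, ddof=1), X2.var(axis=0, ddof=1)
    return np.abs(m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)

rng = np.random.default_rng(3)
X1 = rng.normal(size=(80, 6))          # class 1: 80 samples, 6 features
X2 = rng.normal(size=(80, 6))          # class 2
X2[:, 2] += 2.0                        # feature 2 is discriminative

t = t_stat(X1, X2)
keep = np.where(t > 2.0)[0]            # keep features above the cutoff
print(keep.tolist())
```

Features whose statistic falls below the threshold are dropped, mirroring how subsets with low FWs are removed from the feature subset domain.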
Abstract: The precision of the kernel independent component analysis (KICA) algorithm depends on the type and parameter values of the kernel function. It is therefore of great significance to study how to choose KICA's kernel parameters to improve its feature dimension reduction results. In this paper, a fitness function is first established using the idea of the Fisher discriminant function. The global optimum of the fitness function is then searched by a particle swarm optimization (PSO) algorithm, and a multi-state information dimension reduction algorithm based on PSO-KICA is established. Finally, the ability of this algorithm to enhance the precision of feature dimension reduction is demonstrated.
Abstract: Acquired immunodeficiency syndrome (AIDS) is a fatal disease that severely threatens human health; human immunodeficiency virus (HIV) is its cause. Investigating HIV-1 protease cleavage sites can help researchers find or develop protease inhibitors that restrain the replication of HIV-1 and thereby resist AIDS. Feature selection is a new approach to the HIV-1 protease cleavage site prediction task and is the key point of our research. Compared with previous work, our work has several advantages. First, a filter method is used to eliminate redundant features. Second, besides traditional orthogonal encoding (OE), two kinds of newly proposed features, extracted by applying principal component analysis (PCA) and a non-linear Fisher transformation (NLF) to the AAindex database, are used; both are shown to perform better than OE. Third, the data set is greatly expanded to 1922 samples. To further improve prediction performance, we optimize the SVM parameters so that the classifier obtains better prediction capability, and we fuse the three kinds of features to ensure a comprehensive feature representation. To evaluate prediction performance thoroughly, five metrics, more than in previous work, are used for comparison. The experimental results show that our method performs better than the state-of-the-art method, which means that feature selection combined with feature fusion and classifier parameter optimization can effectively improve HIV-1 cleavage site prediction. Moreover, our work can provide useful help for developing HIV-1 protease inhibitors in the future.
Funding: National Key R&D Program of China (2016YFd01304) and the Postgraduate Innovation Support Project of Shijiazhuang Tiedao University (YC20035).
Abstract: The performance and efficiency of a baler deteriorate as a result of gearbox failure. One way to overcome this challenge is to select appropriate fault feature parameters for fault diagnosis and gearbox monitoring. This paper proposes a fault feature selection method for a baler gearbox using an improved adaptive genetic algorithm. The method directly obtains, through attribute reduction, the minimum set of fault feature parameters that is most sensitive to fault features. The main benefit of the improved adaptive genetic algorithm is its excellent attribute reduction efficiency without requiring prior information, so the method is capable of timely diagnosis and monitoring. Experimental validation was performed, and promising findings highlighting the relationship between diagnosis results and faults were obtained. The results indicate that when the improved genetic algorithm reduces 12 fault characteristic parameters to three without a priori information, 100% fault diagnosis accuracy can be achieved based on these fault characteristics, and the time required for fault feature parameter selection is halved compared with traditional methods. The proposed method provides important insights into instant fault diagnosis and fault monitoring of mechanical devices.
Abstract: Big data refers to the vast amounts of structured and unstructured data that must be dealt with on a regular basis. Dimensionality reduction is the process of converting a huge data set into a low-dimensional one that conveys the same information, and such techniques are frequently utilized to improve classification or regression in machine learning. To achieve dimensionality reduction for huge data sets, this paper offers a hybrid particle swarm optimization-rough set (PSO-RS) method and a Mayfly algorithm-rough set (MA-RS) method. In particular, a novel hybrid strategy based on the Mayfly algorithm (MA) and rough sets (RS) is proposed. The performance of the novel hybrid MA-RS algorithm is evaluated on six different data sets from the literature. The simulation results and comparisons with common reduction methods demonstrate the proposed MA-RS algorithm's capacity to handle a wide range of data sets. Finally, the rough set approach, as well as the hybrid optimization techniques PSO-RS and MA-RS, were applied to the massive data problem. According to the experimental results and statistical tests, the hybrid MA-RS method beats other classic dimensionality reduction techniques.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018AAA0103203.
Abstract: This paper presents an effective image classification algorithm based on superpixels and feature fusion. Differing from classical image classification algorithms that extract feature descriptors directly from the original image, the proposed method first segments the input image into superpixels, and then several different types of features are calculated from these superpixels. To increase classification accuracy, the dimensions of these features are reduced using the principal component analysis (PCA) algorithm followed by a weighted serial feature fusion strategy. After constructing a coding dictionary using the nonnegative matrix factorization (NMF) algorithm, the input image is recognized by a support vector machine (SVM) model. The effectiveness of the proposed method was tested on the public Scene-15, Caltech-101, and Caltech-256 datasets, and the experimental results demonstrate that the proposed method can effectively improve image classification accuracy.
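The PCA-then-weighted-serial-fusion step can be sketched as follows. The feature types, their dimensions, and the 0.6/0.4 weights are assumptions for illustration; the NMF dictionary and SVM stages are not reproduced here.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(5)
# Two hypothetical per-superpixel feature types with different dimensions.
color_feats = rng.normal(size=(50, 96))     # e.g. colour statistics
texture_feats = rng.normal(size=(50, 128))  # e.g. texture descriptors

# Reduce each type separately, then concatenate with serial-fusion weights.
fused = np.hstack([0.6 * pca(color_feats, 10),
                   0.4 * pca(texture_feats, 10)])
print(fused.shape)  # (50, 20)
```

The weights let one feature type dominate the fused representation without discarding the other, which is the point of weighted serial fusion over plain concatenation.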
Funding: This work was supported by the Taif University Researchers Supporting Project (TURSP) under number TURSP-2020/73, Taif University, Taif, Saudi Arabia.
Abstract: In recent times, images and videos have emerged as one of the most important sources of information depicting real-time scenarios. Digital images now serve as input for many applications, replacing manual methods thanks to their ability to represent a 3D scene in a 2D plane. The capabilities of digital images, combined with machine learning methodologies, are showing promising accuracy in many prediction and pattern recognition applications. One application field is the detection of plant diseases, which are destroying widespread fields. Traditionally, disease detection was done by a domain expert using manual examination and laboratory tests, a tedious and time-consuming process that does not achieve sufficient accuracy. This creates room for research into automation-based methods in which images captured through sensors and cameras are used to detect disease and control its spread. The digital images captured in the field form the dataset that trains the machine learning models to predict the nature of the disease. The accuracy of these models is greatly affected by the amount of noise and artifacts present in the input images, the segmentation methodology, the feature vector development, and the choice of machine learning algorithm. To ensure high performance of the designed system, research is moving toward fine-tuning each stage separately while considering its dependencies on subsequent stages. The most suitable solution can therefore be obtained by applying image processing methodologies to improve image quality and then applying statistical methods for feature extraction and selection. The training vector thus developed captures the relationship between the feature values and the target class. In this article, a highly accurate system model for detecting diseases occurring in citrus fruits using a hybrid feature development approach is proposed, and the overall improvement in accuracy is measured and reported.
Funding: This research received funding from Duhok Polytechnic University.
Abstract: Collectively improving breast cancer image pattern recognition to an acceptable or desirable accuracy level using various schemes remains challenging. Despite combining multiple schemes to achieve superior ultrasound image pattern recognition by reducing speckle noise, an enhanced technique has not been achieved. The purpose of this study is to introduce a feature-based fusion scheme based on an enhanced uniform Local Binary Pattern (LBP) and filtered noise reduction. To surmount the above limitations and achieve the aim of the study, a new descriptor that enhances LBP features based on a new threshold is proposed. This paper proposes a multi-level fusion scheme for the automatic classification of static ultrasound images of breast cancer, attained in two stages. First, several images were generated from a single image using a pre-processing method: the median and Wiener filters were utilized to lessen the speckle noise and enhance the ultrasound image texture. This strategy allows the extraction of a powerful feature by reducing the overlap between the benign and malignant image classes. Second, the fusion mechanism allows the production of diverse features from different filtered images. The feasibility of using the LBP-based texture feature to categorize the ultrasound images was demonstrated. The effectiveness of the proposed scheme was tested on 250 ultrasound images comprising 100 benign and 150 malignant images. The proposed method achieved very high accuracy (98%), sensitivity (98%), and specificity (99%). As a result, the fusion process, which helps reach a robust decision based on features produced from different filtered images, improved the results of the new LBP descriptor in terms of accuracy, sensitivity, and specificity.
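The classic 8-neighbour LBP that the enhanced descriptor builds on can be sketched as below. This is the standard formulation, not the paper's new-threshold variant, and the 3×3 test image is a made-up example.

```python
import numpy as np

def lbp(img):
    """Classic 3x3 local binary pattern over an integer grayscale image."""
    # offsets of the 8 neighbours, clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # set this neighbour's bit wherever it is >= the centre pixel
        out |= (neigh >= centre).astype(np.uint8) << bit
    return out

img = np.array([[10, 10, 10],
                [10, 50, 10],
                [10, 10, 90]], dtype=np.int32)
codes = lbp(img)
print(int(codes[0, 0]))  # only the bottom-right neighbour exceeds 50 -> bit 4 -> 16
```

The paper's enhancement replaces the plain `>=` centre comparison with a new threshold; histogramming the resulting codes per filtered image then yields the features that get fused.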