The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while utilizing those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines so that its features are carried by the coefficients of the spline basis. After that, a feature weighting approach based on statistical principles is introduced that jointly considers the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between the weights of features. The new feature weighting approach can adaptively enhance noteworthy local features while mitigating the impact of confusing features. Algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the new approach can be well integrated into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in more accurate classification. The performance of the mean-variance-based classifiers is evaluated by simulation studies and real data. The results show that the new feature weighting approach significantly improves the classification accuracy for complex functional data.
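As a rough illustration of weighting spline coefficients by between-class difference and within-class variation, the sketch below scores each coefficient with a ratio of the two and raises it to a scaling exponent. The ratio itself, the function names `mv_style_weights` and `weighted_distance`, and the parameter `gamma` are assumptions made for this example; the abstract does not state the actual MV formula.

```python
import math
from statistics import mean, pvariance


def mv_style_weights(X, y, gamma=1.0):
    """Illustrative weights: (between-class spread / within-class spread) ** gamma.

    The ratio is an assumption for this sketch, not the paper's formula.
    X is a list of coefficient vectors, y the matching class labels.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        # Values of coefficient j, grouped by class label.
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        between = pvariance([mean(g) for g in groups])  # spread of the class means
        within = mean([pvariance(g) for g in groups])   # average in-class variance
        scores.append((between / (within + 1e-12)) ** gamma)
    total = sum(scores)
    if total == 0:
        # No coefficient separates the classes: fall back to uniform weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_distance(a, b, w):
    """Weighted Euclidean distance, usable inside a K-nearest-neighbor rule."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))
```

In this sketch a larger `gamma` widens the gap between the weights of strongly and weakly discriminating coefficients, which is the role the abstract assigns to its scaling parameter.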
In the modern electromagnetic environment, radar emitter signal recognition is an important research topic. On the basis of multi-resolution wavelet analysis, an adaptive radar emitter signal recognition method based on multi-scale wavelet entropy feature extraction and feature weighting was proposed. With only prior knowledge of the signal-to-noise ratio (SNR), the extraction of multi-scale wavelet entropy features from the wavelet coefficients of different received signals was combined with the calculation of an unevenness weight factor and a stability weight factor for the extracted multi-dimensional features. Radar emitter signals with different modulation types and different modulation parameters were recognized through feature weighting and feature fusion. Theoretical analysis and simulation results show that the presented algorithm has a high recognition rate. Additionally, when the SNR is greater than -4 dB, the correct recognition rate is higher than 93%. Hence, the proposed algorithm has great application value.
This paper introduces a cost-sensitive feature weighting strategy and its application in intrusion detection. Cost factors and a cost matrix are proposed to express the misclassification cost for an intrusion detection system (IDS). The paper mainly discusses, in detail, how to obtain the minimal overall risk. Experiments show that although decision-cost-based weight learning misclassifies some attacks, it achieves relatively low misclassification costs while keeping a relatively high recognition precision.
Key words: decision cost; feature weighting; intrusion detection
CLC number: TP 393.08
Foundation item: Supported by the National Natural Science Foundation Key Research Plan of China (90104030) and "20 Century Education Development Plan"
Biography: QIAN Quan (1972-), male, Ph.D.; research direction: computer network, network security and artificial intelligence
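To make the idea of a cost matrix and minimal risk concrete, here is a minimal sketch of the generic minimum-expected-cost decision rule. It is a textbook rule, not the paper's weight-learning procedure; the function name `min_risk_decision`, the two-class setup and the cost values are hypothetical.

```python
def min_risk_decision(posteriors, cost):
    """Pick the predicted class with the lowest expected misclassification cost.

    posteriors[t] is the estimated probability of true class t.
    cost[t][p] is the cost of predicting class p when the true class is t.
    """
    n = len(cost)
    # Expected cost (risk) of each possible prediction p.
    risks = [sum(posteriors[t] * cost[t][p] for t in range(n)) for p in range(n)]
    best = min(range(n), key=risks.__getitem__)
    return best, risks


# Hypothetical example: class 0 = normal traffic, class 1 = attack.
# Missing an attack (true 1, predicted 0) is assumed ten times as costly
# as a false alarm (true 0, predicted 1).
cost_matrix = [[0.0, 1.0],
               [10.0, 0.0]]
decision, risks = min_risk_decision([0.8, 0.2], cost_matrix)
```

With these assumed costs the rule flags the sample as an attack even though "normal" is the more probable class, which is the trade-off a cost-sensitive detector accepts: some extra misclassifications in exchange for a lower total cost.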
Purpose: The aim of this paper is to explore the value preference space associated with the optimization and generalization performance of GEFeWSML.
Design/methodology/approach: In this paper, the authors modified the evaluation function utilized by GEFeWSML such that the weights assigned to each objective (i.e. error reduction and feature reduction) were varied. For each set of weights, GEFeWSML was used to evolve FMs for the face, periocular, and face+periocular templates. The best performing FMs on the training set (FMtss) and the best performing FMs on the validation set (FM*s) were then applied to the test set in order to evaluate how well they generalized to the unseen subjects.
Findings: By varying the weights assigned to each of the objectives, the authors were able to suggest values that would result in the best optimization and generalization performances for facial, periocular, and face+periocular recognition. GEFeWSML using these suggested values outperformed the previously reported GEFeWSML results, using significantly fewer features while achieving statistically equivalent recognition accuracies.
Originality/value: In this paper, the authors investigate the relative weighting of each objective using a value preference structure and suggest the best weights to be used for each biometric modality tested.
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions, generated by shadowed-sets-based clustering, have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The selection of the weight exponent is crucial for good results, and the weights are updated iteratively with each partition of clusters. The convergence of the weighted algorithms is given, and feasible cluster validity indices from data mining applications are utilized. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm is better than the unweighted algorithms.
Online reviews and comments are important information resources for people. A new model, called the Sentiment Vector Space Model (SVSM), for feature selection and weighting is proposed to predict the sentiment orientation of comments and reviews, e.g., sorting out positive reviews from negative ones. Unlike topic-oriented classification, feature selection for sentiment orientation prediction focuses on language characteristics. Unlike traditional algorithms for sentiment classification, this model integrates grammatical knowledge and takes topic correlations into account. Features are extracted, and the similarity between these features and the topic is also computed. The feature similarity is taken as a factor when evaluating the polarity of opinions. The experimental results show that the proposed model is more effective in identifying sentiment orientation than most of the traditional techniques.
In the domain of medical imaging, the accurate detection and classification of brain tumors is very important. This study introduces an advanced method for identifying camouflaged brain tumors within images. Our proposed model consists of three steps: feature extraction, feature fusion, and classification. The core of this model is a feature extraction framework that combines color-transformed images with deep learning techniques, using the ResNet50 Convolutional Neural Network (CNN) architecture. The focus is on extracting robust features from MRI images, particularly emphasizing weighted average features extracted from the first convolutional layer, which are renowned for their discriminative power. To enhance model robustness, we introduced a novel feature fusion technique based on the Marine Predator Algorithm (MPA), which is inspired by the hunting behavior of marine predators and has shown promise in optimizing complex problems. The proposed methodology can accurately classify and detect brain tumors in camouflage images by combining the power of color transformations, deep learning, and feature fusion via MPA. It achieved an accuracy of 98.72% on a more complex dataset, surpassing the existing state-of-the-art methods and highlighting the effectiveness of the proposed model. The importance of this research lies in its potential to advance the field of medical image analysis, particularly in brain tumor diagnosis, where early diagnosis and accurate classification are critical for improved patient outcomes.
With increasing intelligence and integration, a great number of two-valued variables (generally stored in the form of 0 or 1) often exist in large-scale industrial processes. However, these variables cannot be effectively handled by traditional monitoring methods such as linear discriminant analysis (LDA), principal component analysis (PCA) and partial least squares (PLS) analysis. Recently, a mixed hidden naive Bayesian model (MHNBM) was developed for the first time to utilize both two-valued and continuous variables for abnormality monitoring. Although the MHNBM is effective, it still has some shortcomings. In the MHNBM, the variables with greater correlation to other variables have greater weights, which cannot guarantee that greater weights are assigned to the more discriminating variables. In addition, the conditional probability P(x_j | x_j′, y = k) must be computed based on historical data. When the training data is scarce, the conditional probability between continuous variables tends to be uniformly distributed, which affects the performance of the MHNBM. Here a novel feature weighted mixed naive Bayes model (FWMNBM) is developed to overcome the above shortcomings. In the FWMNBM, the variables that are more correlated with the class have greater weights, which makes the more discriminating variables contribute more to the model. At the same time, the FWMNBM does not have to calculate the conditional probability between variables, so it is less restricted by the number of training data samples. Compared with the MHNBM, the FWMNBM has better performance, and its effectiveness is validated through numerical cases of a simulation example and a practical case of the Zhoushan thermal power plant (ZTPP), China.
A wind farm power prediction method is proposed based on an adaptive feature weight entropy fuzzy clustering algorithm. Using the fuzzy clustering method, a large amount of historical data from a wind farm in Inner Mongolia is analyzed and classified. A model of adaptive entropy weights for clustering is built, followed by a wind power prediction model based on adaptive entropy fuzzy clustering feature weights. Simulation results show that the proposed method can distinguish abnormal data, forecast more accurately, and compute faster.
Peer-to-Peer (P2P) technology is one of the most popular techniques nowadays, and it brings some security issues, so the recognition and management of P2P applications on the internet is becoming much more important. The selection of protocol features is significant for the problem of P2P traffic identification. To overcome the shortcomings of current methods, a new P2P traffic identification algorithm is proposed in this paper. First, detailed statistics of traffic flows on the internet are calculated. Second, the best feature subset is chosen by binary particle swarm optimization. Finally, every feature in the subset is given a proper weight. In this paper, TCP flows and UDP flows each have their own feature space, which is advantageous for traffic identification. The experimental results show that this algorithm can choose the best feature subset effectively, and that the identification accuracy is improved by feature weighting.
With the continuous expansion of software scale, software update and maintenance have become more and more important. However, frequent code updates make the software more likely to introduce new defects, so predicting defects quickly and accurately at the level of software changes has become an important problem for software developers. Current defect prediction methods often cannot reflect the feature information of a defect comprehensively, and their detection performance is not ideal. Therefore, we propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer naive Bayes algorithm, which mainly considers the following two aspects: (1) Considering that the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) Considering that each feature dimension has a different effect on defect prediction, we construct the formula for training data weights based on feature dimension weight and data gravity, and then calculate the prior probability and the conditional probability of the training data from the weight information, so as to construct the weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, we use six datasets from large open source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL. We compare the ITNB model with the transfer naive Bayes (TNB) model. The experimental results show that our ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
As society has developed, increasing amounts of data have been generated by various industries. The random forest algorithm, as a classification algorithm, is widely used because of its superior performance. However, the random forest algorithm uses simple random sampling to select features when generating feature subspaces, which cannot distinguish redundant features; this affects its classification accuracy and results in low computational efficiency in stand-alone mode. In response to these problems, related optimization research was conducted with Spark in the present paper. The improved random forest algorithm performs feature extraction according to the calculated feature importance to form a feature subspace. When generating a random forest model, it selects decision trees based on the similarity and classification accuracy of different decision trees. Experimental results reveal that, compared with the original random forest algorithm, the improved algorithm proposed in the present paper exhibits a higher classification accuracy and can classify data effectively.
We propose a novel Laplacian-based algorithm that simplifies triangle surface meshes and can provide different preservation ratios of geometric features. Our efficient and fast algorithm takes a 3D mesh model as input and initially detects geometric features by using a Laplacian-based shape descriptor (L-descriptor). The algorithm then applies an optimized clustering approach that combines a Laplacian operator with the K-means clustering algorithm to perform vertex classification. Moreover, we introduce a Laplacian weighted cost function based on the L-descriptor to perform feature weighting and error statistics comparison, which are further used to change the deletion order of the model elements and preserve the salient features. Our algorithm may be extended to handle arbitrary mesh topologies. Our experiments on a variety of 3D surface meshes demonstrate the advantages of our algorithm in terms of improving accuracy and applicability, and preserving salient geometric features.
Because the hydraulic directional valve usually works in a harsh environment and is disturbed by multi-factor noise, traditional single-sensor monitoring technology struggles to diagnose it accurately. Therefore, a fault diagnosis method based on multi-sensor information fusion is proposed in this paper to reduce the inaccuracy and uncertainty of traditional single-sensor diagnosis and to realize accurate monitoring for the location or diagnosis of early faults in such valves in noisy environments. First, the statistical features of signals collected by the multiple sensors are extracted and the depth features are obtained by a convolutional neural network (CNN) to form a complete and stable multi-dimensional feature set. Second, to obtain a weighted multi-dimensional feature set, the multi-dimensional feature sets of similar sensors are combined, and the entropy weight method is used to weight these features to reduce the interference of insensitive features. Finally, an attention mechanism is introduced to improve the dual-channel CNN, which is used to adaptively fuse the weighted multi-dimensional feature sets of heterogeneous sensors and to flexibly select heterogeneous sensor information so as to achieve an accurate diagnosis. Experimental results show that the weighted multi-dimensional feature set obtained by the proposed method has a high fault-representation ability and low information redundancy. It can simultaneously diagnose internal wear faults of the hydraulic directional valve and electromagnetic faults of actuators, which are difficult to diagnose by traditional methods. The proposed method can achieve high fault-diagnosis accuracy under severe working conditions.
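The entropy weight method mentioned above can be sketched in its common textbook form: features whose values are spread unevenly across samples have lower entropy and receive larger weights. The min-max scaling step and the function name `entropy_weights` are assumptions for this example; the abstract does not describe the paper's exact preprocessing.

```python
import math


def entropy_weights(X):
    """Textbook entropy weight method over the columns of X (rows = samples).

    Assumes at least two samples. Returns one weight per feature, summing to 1.
    """
    n, m = len(X), len(X[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max scale to [0, 1]; a constant column is treated as uniform.
        scaled = [(v - lo) / span if span else 1.0 for v in col]
        total = sum(scaled)
        p = [s / total for s in scaled]
        # Normalised Shannon entropy in [0, 1]; terms with p = 0 contribute 0.
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        divergences.append(max(0.0, 1.0 - e))
    total_div = sum(divergences)
    if total_div == 0:
        # Every feature is uninformative: fall back to uniform weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]
```

A feature that takes the same value for every sample gets a weight of (almost exactly) zero here, which matches the stated goal of reducing the interference of insensitive features.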
Software defect prediction aims to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated on a k-nearest neighbor (KNN) model and measured by the area under the curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or comparably to the compared feature selection approaches in terms of classification performance.
Recent finance and debt crises have made credit risk management one of the most important issues in financial research. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in the fields of machine learning and statistics. In this paper, a novel feature-weighted support vector machine (SVM) credit scoring model is presented for credit risk assessment, in which an F-score is adopted for feature importance ranking. Considering the mutual interaction among modeling features, random forest is further introduced for relative feature importance measurement. These two feature-weighted versions of SVM are tested against the traditional SVM on two real-world datasets, and the results support the validity of the proposed method.
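For reference, the F-score commonly used for feature ranking in binary classification can be computed as below, together with one possible way of feeding the scores into an SVM through a feature-weighted RBF kernel. The kernel form, the 0/1 label encoding, and the names `f_scores` and `weighted_rbf` are assumptions for this sketch; the abstract does not say how the weights enter the SVM.

```python
import math
from statistics import mean, variance


def f_scores(X, y):
    """F-score per feature for binary labels (1 = positive, 0 = negative).

    F_j = ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2)
          / (sample_var_pos + sample_var_neg)
    Each class needs at least two samples for the sample variance.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        all_j = [row[j] for row in X]
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        num = (mean(pos_j) - mean(all_j)) ** 2 + (mean(neg_j) - mean(all_j)) ** 2
        den = variance(pos_j) + variance(neg_j)
        scores.append(num / den if den else 0.0)
    return scores


def weighted_rbf(a, b, w, gamma=1.0):
    """RBF kernel with per-feature weights w on the squared differences."""
    return math.exp(-gamma * sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))
```

A larger F-score means the two class means sit further apart relative to the within-class spread, so that feature dominates the weighted kernel distance.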
This paper presents a method that uses a support vector machine with polyspectral kernels to classify individual transmitters. A weighted feature set based on neighborhood rough sets is then proposed. Experiments with the two algorithms indicate that they are consistent with each other, which motivates a new weighted kernel. The experiments show that a better classification rate can be achieved.
Mechanical performance prediction is key to the transformation and upgrading of steel enterprises towards intelligent manufacturing. Because manufacturing data vary over time, the traditional prediction model for the mechanical properties of hot-rolled strip may degrade or even fail in use. An MDA-JITL model was therefore proposed to handle the modeling problem of complex time-varying data. Relevant parameters were first chosen and normalized. Then, a distance measurement method combining the importance of data attributes and time characteristics was designed to select the most suitable samples for on-line local modeling. After that, using the chosen dataset, a linear local model was created to predict the target sample. Finally, an uncertainty evaluation method was designed to evaluate the uncertainty of the prediction results. Furthermore, an appropriate dataset partition and off-line simulation experiment scheme were created based on the peculiarities of hot-rolling production. The suggested model performs much better than the classic global model when applied to actual production data from a steel plant. The stability of its prediction accuracy is demonstrated in a simulated prediction run of up to five months. Moreover, there is a strong correlation between the uncertainty evaluation metrics and the prediction error of the model, which reduced the field sampling rate by 30% in industrial applications over the latest year.
Solder bump technology has been widely used in electronic packaging. With the development of solder bumps towards higher density and finer pitch, it is more difficult to inspect the defects of solder bumps as they are hidden in the package. A nondestructive method using transient active thermography has been proposed to inspect the defects of a solder bump, and we aim at developing an intelligent diagnosis system to eliminate the influence of emissivity unevenness and non-uniform heating on defect recognition in active infrared testing. An improved fuzzy c-means (FCM) algorithm based on entropy weights is investigated in this paper. The captured thermograms are preprocessed to enhance the thermal contrast between the defective and good bumps. Hot spots corresponding to 16 solder bumps are segmented from the thermal images. The statistical features are calculated and selected appropriately to characterize the status of solder bumps in FCM clustering. The missing bump is identified in the FCM result, which is also validated by principal component analysis. The intelligent diagnosis system using the FCM algorithm with entropy weights is effective for defect recognition in electronic packages.
Legal documents are generally large and complex because of their specific vocabulary, semantics and structure. One of the major challenges in legal processing systems is generating summaries of legal judgements. To date, in most legal systems, summaries of judgements are produced manually by legal experts and are then used by lawyers, judges and other legal professionals. The manual process of summarization is very inefficient and time-consuming. Automatic text summarization (ATS) is the process of reducing the content of a textual document while retaining the core description of the text through the use of an appropriate tool. The present work proposes a novel Fuzzy Analytical Hierarchical Process (FAHP) based feature weighting scheme which helps in producing an efficient and effective summary of a legal judgement. The model is applied to a number of legal judgements taken from the Indian IT Act. Validation of the model is done using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) tool with recall, precision, and F-measure as performance measures. The generated summaries are further assessed by legal experts and are found to be more promising than the summaries generated by traditional approaches.
Funding (functional data classification paper): Supported by the National Social Science Foundation of China (Grant No. 22BTJ035).
Funding (radar emitter signal recognition paper): Project (61301095) supported by the National Natural Science Foundation of China; Project (QC2012C070) supported by the Heilongjiang Provincial Natural Science Foundation for the Youth, China; Projects (HEUCF130807, HEUCFZ1129) supported by the Fundamental Research Funds for the Central Universities of China.
Funding (GEFeWSML paper): Supported by the Office of the Director of National Intelligence (ODNI) Center for Academic Excellence (CAE) for the multi-university Center for Advanced Studies in Identity Sciences (CASIS), and by the National Science Foundation (NSF) Science & Technology Center: Bio/computational Evolution in Action CONsortium (BEACON).
Funding: Supported by the National Natural Science Foundation of China (61139002).
Abstract: Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions, generated by shadowed sets-based clustering, have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The selection of the weight exponent is crucial for good results, and the weights are updated iteratively with each partition of clusters. The convergence of the weighted algorithms is given, and feasible cluster validity indices from data mining applications are utilized. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm performs better than the unweighted algorithms.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60703032, and the Science and Technology Development Center of the Ministry of Education, China.
Abstract: Online reviews and comments are important information resources for people. A new model, called the Sentiment Vector Space Model (SVSM), for feature selection and weighting is proposed to predict the sentiment orientation of comments and reviews, e.g., sorting out positive reviews from negative ones. Unlike topic-oriented classification, feature selection for sentiment orientation prediction focuses on language characteristics. Unlike traditional algorithms for sentiment classification, this model integrates grammatical knowledge and takes topic correlations into account. Features are extracted, and the similarity between these features and the topic is also computed. The feature similarity is taken as a factor when evaluating the polarity of opinions. The experimental results show that the proposed model is more effective in identifying sentiment orientation than most of the traditional techniques.
Funding: Funding from Prince Sattam bin Abdulaziz University through the Project Number (PSAU/2023/01/24607).
Abstract: In the domain of medical imaging, the accurate detection and classification of brain tumors is very important. This study introduces an advanced method for identifying camouflaged brain tumors within images. Our proposed model consists of three steps: feature extraction, feature fusion, and classification. The core of this model is a feature extraction framework that combines color-transformed images with deep learning techniques, using the ResNet50 Convolutional Neural Network (CNN) architecture. The focus is on extracting robust features from MRI images, particularly emphasizing weighted average features extracted from the first convolutional layer, which are renowned for their discriminative power. To enhance model robustness, we introduce a novel feature fusion technique based on the Marine Predator Algorithm (MPA), which is inspired by the hunting behavior of marine predators and has shown promise in optimizing complex problems. The proposed methodology can accurately classify and detect brain tumors in camouflage images by combining the power of color transformations, deep learning, and feature fusion via MPA. It achieved an accuracy of 98.72% on a more complex dataset, surpassing the existing state-of-the-art methods and highlighting the effectiveness of the proposed model. The importance of this research lies in its potential to advance the field of medical image analysis, particularly in brain tumor diagnosis, where early diagnosis and accurate classification are critical for improved patient outcomes.
Funding: Supported by the National Natural Science Foundation of China (62033008, 61873143).
Abstract: With increasing intelligence and integration, a great number of two-valued variables (generally stored in the form of 0 or 1) often exist in large-scale industrial processes. However, these variables cannot be effectively handled by traditional monitoring methods such as linear discriminant analysis (LDA), principal component analysis (PCA) and partial least squares (PLS) analysis. Recently, a mixed hidden naive Bayesian model (MHNBM) was developed for the first time to utilize both two-valued and continuous variables for abnormality monitoring. Although the MHNBM is effective, it still has some shortcomings. In the MHNBM, the variables with greater correlation to other variables have greater weights, which cannot guarantee that greater weights are assigned to the more discriminating variables. In addition, the conditional probability P(x_j | x_j′, y = k) must be computed based on historical data. When the training data is scarce, the conditional probability between continuous variables tends to be uniformly distributed, which affects the performance of the MHNBM. Here a novel feature weighted mixed naive Bayes model (FWMNBM) is developed to overcome the above shortcomings. In the FWMNBM, the variables that are more correlated to the class have greater weights, which makes the more discriminating variables contribute more to the model. At the same time, the FWMNBM does not have to calculate the conditional probability between variables, so it is less restricted by the number of training data samples. Compared with the MHNBM, the FWMNBM has better performance, and its effectiveness is validated through numerical cases of a simulation example and a practical case of the Zhoushan thermal power plant (ZTPP), China.
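The idea of letting more discriminating variables contribute more to a naive Bayes score can be sketched as follows. This is a generic feature-weighted naive Bayes for two-valued variables only, with the weights supplied by hand; it is not the FWMNBM itself, whose weighting rule and treatment of continuous variables are not given in the abstract. The function names and the toy data are hypothetical.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and P(x_j = 1 | class) with Laplace smoothing."""
    prior, cond = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        cond[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return prior, cond


def predict_weighted(x, prior, cond, weights):
    """Return the class maximizing log P(c) + sum_j w_j * log P(x_j | c)."""
    best_class, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for j, xj in enumerate(x):
            p = cond[c][j] if xj == 1 else 1.0 - cond[c][j]
            score += weights[j] * math.log(p)  # weight scales each feature's evidence
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (assumed): variable 0 tracks the class, variable 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
prior, cond = fit_bernoulli_nb(X, y)
weights = [1.0, 0.0]  # hand-set here; a real method would learn these from data
print(predict_weighted([1, 0], prior, cond, weights))
```

Setting a weight to zero removes that variable from the score, while a weight of one recovers the ordinary naive Bayes term.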
Funding: Supported by the Natural Science Foundation of China under contract (61233007).
Abstract: A wind farm power prediction method based on an adaptive feature-weight entropy fuzzy clustering algorithm is proposed. Using the fuzzy clustering method, a large amount of historical data from a wind farm in Inner Mongolia is analyzed and classified. An adaptive entropy weight model for clustering is built, followed by a wind power prediction model based on adaptive entropy fuzzy clustering feature weights. Simulation results show that the proposed method can distinguish abnormal data, forecast more accurately, and compute quickly.
Funding: Supported in part by the National Basic Research Program of China ("973 program") under contract No. 2007CB311106, and by the Special Plan Program of National Information Security ("242 program") under contract No. (242) 2009A82.
Abstract: Peer-to-Peer (P2P) technology is one of the most popular techniques today, but it brings security issues, so the recognition and management of P2P applications on the internet is becoming much more important. The selection of protocol features is significant for P2P traffic identification. To overcome the shortcomings of current methods, a new P2P traffic identification algorithm is proposed in this paper. First, detailed statistics of traffic flows on the internet are calculated. Second, the best feature subset is chosen by binary particle swarm optimization. Finally, every feature in the subset is given a proper weight. In this paper, TCP flows and UDP flows each have their own feature space, which is advantageous for traffic identification. The experimental results show that this algorithm can choose the best feature subset effectively, and that feature weighting improves the identification accuracy.
Funding: This work is supported in part by the National Science Foundation of China (Nos. 61672392, 61373038) and in part by the National Key Research and Development Program of China (No. 2016YFC1202204).
Abstract: With the continuous expansion of software scale, software update and maintenance have become more and more important. However, frequent code updates make the software more likely to introduce new defects, so predicting defects in software changes quickly and accurately has become an important problem for software developers. Current defect prediction methods often cannot reflect the feature information of the defect comprehensively, and their detection performance is not ideal. Therefore, we propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes) based on an improved transfer Naive Bayesian algorithm, which mainly considers the following two aspects: (1) Considering that the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) Considering that each feature dimension has a different effect on defect prediction, we construct the calculation formula of training data weight based on feature dimension weight and data gravity, and then calculate the prior probability and the conditional probability of training data from the weight information, so as to construct the weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, we use six datasets from large open source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL. We compare the ITNB model with the transfer Naive Bayesian (TNB) model. The experimental results show that our ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
Funding: This paper is partially supported by the Social Science Foundation of Hebei Province (No. HB19JL007) and the Education technology Foundation of the Ministry of Education (No. 2017A01020).
Abstract: As society has developed, increasing amounts of data have been generated by various industries. The random forest algorithm, as a classification algorithm, is widely used because of its superior performance. However, when generating feature subspaces, the random forest algorithm uses a simple random sampling feature selection method that cannot distinguish redundant features, which affects its classification accuracy, and it has low computational efficiency in stand-alone mode. In response to these problems, related optimization research was conducted with Spark in the present paper. The improved random forest algorithm performs feature extraction according to the calculated feature importance to form a feature subspace. When generating a random forest model, it selects decision trees based on the similarity and classification accuracy of different decision trees. Experimental results reveal that, compared with the original random forest algorithm, the improved algorithm proposed in the present paper exhibits a higher classification accuracy rate and can effectively classify data.
Funding: This work has been financially supported by the National High Technology Research and Development Program of China (863 Program) (www.nsfc.gov.cn, No. 2015AA016403) and the National Natural Science Foundation of China (www.nsfc.gov.cn, No. 61602223).
Abstract: We propose a novel Laplacian-based algorithm that simplifies triangle surface meshes and can provide different preservation ratios of geometric features. Our efficient and fast algorithm uses a 3D mesh model as input and initially detects geometric features by using a Laplacian-based shape descriptor (L-descriptor). The algorithm further performs an optimized clustering approach that combines a Laplacian operator with the K-means clustering algorithm to perform vertex classification. Moreover, we introduce a Laplacian weighted cost function based on the L-descriptor to perform feature weighting and error statistics comparison, which are further used to change the deletion order of the model elements and preserve the salient features. Our algorithm can provide different preservation ratios of geometric features and may be extended to handle arbitrary mesh topologies. Our experiments on a variety of 3D surface meshes demonstrate the advantages of our algorithm in terms of improving accuracy and applicability, and preserving salient geometric features.
Funding: Supported by the National Natural Science Foundation of China (Nos. 51805376 and U1709208) and the Zhejiang Provincial Natural Science Foundation of China (Nos. LY20E050028 and LD21E050001).
Abstract: Because the hydraulic directional valve usually works in a harsh environment and is disturbed by multi-factor noise, traditional single-sensor monitoring technology struggles to diagnose it accurately. Therefore, a fault diagnosis method based on multi-sensor information fusion is proposed in this paper to reduce the inaccuracy and uncertainty of traditional single-sensor diagnosis technology and to realize accurate monitoring for the location or diagnosis of early faults in such valves in noisy environments. Firstly, the statistical features of signals collected by the multiple sensors are extracted and the depth features are obtained by a convolutional neural network (CNN) to form a complete and stable multi-dimensional feature set. Secondly, to obtain a weighted multi-dimensional feature set, the multi-dimensional feature sets of similar sensors are combined, and the entropy weight method is used to weight these features to reduce the interference of insensitive features. Finally, the attention mechanism is introduced to improve the dual-channel CNN, which is used to adaptively fuse the weighted multi-dimensional feature sets of heterogeneous sensors and to flexibly select heterogeneous sensor information so as to achieve an accurate diagnosis. Experimental results show that the weighted multi-dimensional feature set obtained by the proposed method has a high fault-representation ability and low information redundancy. It can simultaneously diagnose internal wear faults of the hydraulic directional valve and electromagnetic faults of actuators that are difficult to diagnose by traditional methods. The proposed method can achieve high fault-diagnosis accuracy under severe working conditions.
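The entropy weight method mentioned in the second step can be sketched in its common textbook form. The abstract does not state the exact normalization used by the authors, so the version below (column shares, normalized Shannon entropy, weights proportional to one minus entropy) is an assumption, as are the function name `entropy_weights` and the sample data.

```python
import math


def entropy_weights(X):
    """Entropy weight method for a table of non-negative features.

    X is a list of n >= 2 samples, each a list of m feature values.
    Returns m weights that sum to 1. Features whose values are spread
    unevenly across samples (lower entropy) receive larger weights.
    """
    n, m = len(X), len(X[0])
    divergence = []
    for j in range(m):
        column = [row[j] for row in X]
        total = sum(column)
        # Share of each sample in feature j; an all-zero column is treated as uniform.
        shares = [v / total for v in column] if total > 0 else [1.0 / n] * n
        # Normalized Shannon entropy in [0, 1]; zero shares contribute nothing.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergence.append(max(0.0, 1.0 - entropy))
    norm = sum(divergence)
    if norm == 0:  # every feature is uniform: fall back to equal weights
        return [1.0 / m] * m
    return [d / norm for d in divergence]


# Hypothetical data: feature 0 is constant, feature 1 varies across samples.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0], [1.0, 5.0]])
print(weights)
```

A constant feature carries no information for distinguishing samples, so under this scheme nearly all of the weight goes to the varying feature.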
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61673384 and 61502497), the Guangxi Key Laboratory of Trusted Software (No. kx201530), the China Postdoctoral Science Foundation (No. 2015M581887), and the Scientific Research Innovation Project for Graduate Students of Jiangsu Province, China (No. KYLX15 1443)
Abstract: Software defect prediction aims to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated on a k-nearest neighbor (KNN) model and measured by an area under curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or is comparable to the compared feature selection approaches in terms of classification performance.
Funding: Project supported by the National Basic Research Program (973) of China (No. 2011CB706506), the National Natural Science Foundation of China (No. 50905159), the Natural Science Foundation of Jiangsu Province (No. BK2010261), and the Fundamental Research Funds for the Central Universities (No. 2011XZZX005), China
Abstract: Recent finance and debt crises have made credit risk management one of the most important issues in financial research. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in the fields of machine learning and statistics. In this paper, a novel feature-weighted support vector machine (SVM) credit scoring model is presented for credit risk assessment, in which an F-score is adopted for feature importance ranking. Considering the mutual interaction among modeling features, random forest is further introduced for relative feature importance measurement. These two feature-weighted versions of SVM are tested against the traditional SVM on two real-world datasets, and the results demonstrate the validity of the proposed method.
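F-score feature weighting can be sketched as below. The abstract does not give the formula, so this sketch assumes the widely used two-class F-score (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances). Scaling each feature by its normalized score before it reaches the SVM kernel is likewise an assumption about how the weights enter the model; the function names and data are hypothetical.

```python
from statistics import mean, variance


def f_scores(X, y, positive=1):
    """Two-class F-score per feature; each class needs at least 2 samples."""
    pos = [row for row, label in zip(X, y) if label == positive]
    neg = [row for row, label in zip(X, y) if label != positive]
    scores = []
    for j in range(len(X[0])):
        all_j = [row[j] for row in X]
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        # Between-class separation relative to the overall mean.
        between = (mean(pos_j) - mean(all_j)) ** 2 + (mean(neg_j) - mean(all_j)) ** 2
        # Within-class spread, using sample variances (n - 1 denominator).
        within = variance(pos_j) + variance(neg_j)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def scale_by_weights(X, weights):
    """Multiply every feature by its weight before passing it to a classifier."""
    return [[w * v for w, v in zip(weights, row)] for row in X]


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
scores = f_scores(X, y)
weights = [s / sum(scores) for s in scores]
X_weighted = scale_by_weights(X, weights)
print(scores, weights)
```

The weighted matrix can then be fed to any SVM implementation; a feature with a score near zero is effectively removed from distance and kernel computations.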
Funding: This work was supported by the National High Technology Research and Development Program of China (Grant No. 2009AA01Z430), the Natural Science Foundation of Beijing (No. 9092009), and the National Science and Technology Major Program (2009ZX03004-003-03).
Abstract: This paper presents a method that uses a support vector machine with polyspectral kernels to classify individual transmitters. A weighted feature set based on neighborhood rough sets is then proposed. Experiments with these algorithms indicate that they are consistent, which motivates a new weighted kernel. Experiments show that a better classification rate can be achieved.
Funding: This work was supported by the National Natural Science Foundation of China (No. 52004029) and the Fundamental Research Funds for the Central Universities (FRF-TT-20-06).
Abstract: Mechanical performance prediction is the key to the transformation and upgrading of steel enterprises to intelligent manufacturing. Because manufacturing data vary over time, the traditional prediction model of the mechanical properties of hot-rolled strip may degrade or even fail in use. An MDA-JITL model was thus proposed to handle the modeling problem of complex time-varying data. Relevant parameters were first chosen and normalized. Then, a distance measurement method combining the importance of data attributes and time characteristics was designed to select the most suitable samples for on-line local modeling. After that, using the chosen dataset, a linear local model was created to predict the target sample. Finally, an uncertainty evaluation method was designed to evaluate the uncertainty of prediction results. Furthermore, an appropriate dataset partition and off-line simulation experiment scheme were created based on the peculiarities of hot-rolling production. The suggested model performs much better than the classic global model when applied to actual production data from a steel plant. The stability of its prediction accuracy is demonstrated in a simulation prediction for up to five months. Moreover, there is a strong correlation between the uncertainty evaluation metrics and the prediction error of the model, which reduced the field sampling rate by 30% in industrial applications over the most recent year.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51305179 & 51305177) and the Natural Science Foundation of Jiangsu Higher Education Institutions (Grant No. 13KJB510009)
Abstract: Solder bump technology has been widely used in electronic packaging. With the development of solder bumps towards higher density and finer pitch, it is more difficult to inspect the defects of solder bumps as they are hidden in the package. A nondestructive method using transient active thermography has been proposed to inspect the defects of a solder bump, and we aim at developing an intelligent diagnosis system to eliminate the influence of emissivity unevenness and non-uniform heating on defect recognition in active infrared testing. An improved fuzzy c-means (FCM) algorithm based on entropy weights is investigated in this paper. The captured thermograms are preprocessed to enhance the thermal contrast between the defective and good bumps. Hot spots corresponding to 16 solder bumps are segmented from the thermal images. The statistical features are calculated and selected appropriately to characterize the status of solder bumps in FCM clustering. The missing bump is identified in the FCM result, which is also validated by principal component analysis. The intelligent diagnosis system using the FCM algorithm with entropy weights is effective for defect recognition in electronic packages.
Abstract: Legal documents are generally large and complex because of their specific vocabulary, semantics and structure. One of the major challenges in legal processing systems is to generate summaries of legal judgements. To date, in most legal systems, summaries of judgements are produced manually by legal experts and are then used by lawyers, judges and other legal professionals. The manual process of summarization is very inefficient and time-consuming. Automatic text summarization (ATS) is the process of reducing the content of a textual document while retaining the core description of the text through the use of an appropriate tool. The present work proposes a novel Fuzzy Analytical Hierarchical Process (FAHP) based feature weighting scheme which helps in producing an efficient and effective summary of a legal judgement. The model is applied to a number of legal judgements taken from the Indian IT Act. The model is validated using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) tool with recall, precision, and f-measure as performance measures. The generated summaries are further assessed by legal experts and are found to be more promising than the summaries generated by traditional approaches.