1,999 articles found
Curve Classification Based on Mean-Variance Feature Weighting and Its Application
1
Authors: Zewen Zhang, Sheng Zhou, Chunzheng Cao. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 2465-2480 (16 pages)
The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while utilizing those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines to transfer features to the coefficients of the spline basis. After that, a feature weighting approach based on statistical principles is introduced by comprehensively considering the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between the weights of features. The new feature weighting approach can adaptively enhance noteworthy local features while mitigating the impact of confusing features. The algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the new approach can be well integrated into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in a more accurate classification. The performance of the mean-variance-based classifiers is evaluated by simulation studies and real data. The results show that the new feature weighting approach significantly improves the classification accuracy for complex functional data.
Keywords: functional data analysis; classification; feature weighting; B-splines
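The abstract above describes weighting spline coefficients by between-class differences and within-class variations, with a scaling parameter that adjusts the gap between weights. The paper's exact formula is not given in this listing, so the following is only a minimal sketch that assumes a Fisher-score-style ratio; the function names and the exponent `gamma` are hypothetical, not the authors' notation.

```python
from statistics import mean, pvariance


def mv_feature_weights(X, y, gamma=1.0, eps=1e-12):
    """Weight each feature by between-class spread over within-class variance.

    X: list of samples, each a list of feature values (e.g. spline coefficients).
    y: list of class labels, one per sample.
    gamma: assumed scaling exponent; gamma > 1 widens the gap between weights,
           gamma < 1 narrows it. This is an illustration, not the paper's formula.
    """
    classes = sorted(set(y))
    n_samples = len(X)
    raw = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            share = len(values) / n_samples
            between += share * (mean(values) - overall) ** 2
            within += share * pvariance(values)
        raw.append((between / (within + eps)) ** gamma)
    total = sum(raw)
    if total == 0:
        # No feature separates the classes: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance, usable inside a K-nearest-neighbor search."""
    return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)) ** 0.5
```

In this sketch a coefficient whose class means differ strongly relative to its within-class variance receives a large weight, so it dominates the distance used by a nearest-neighbor classifier.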
Radar emitter signal recognition based on multi-scale wavelet entropy and feature weighting (Cited by: 16)
2
Authors: 李一兵, 葛娟, 林云, 叶方. Journal of Central South University (SCIE, EI, CAS), 2014, Issue 11, pp. 4254-4260 (7 pages)
In the modern electromagnetic environment, radar emitter signal recognition is an important research topic. On the basis of multi-resolution wavelet analysis, an adaptive radar emitter signal recognition method based on multi-scale wavelet entropy feature extraction and feature weighting was proposed. With the signal-to-noise ratio (SNR) as the only prior knowledge, the method combines the extraction of multi-scale wavelet entropy features from the wavelet coefficients of different received signals with the calculation of the uneven weight factor and stability weight factor of the extracted multi-dimensional characteristics. Radar emitter signals of different modulation types and different modulation parameters were recognized through feature weighting and feature fusion. Theoretical analysis and simulation results show that the presented algorithm has a high recognition rate. Additionally, when the SNR is greater than -4 dB, the correct recognition rate is higher than 93%. Hence, the proposed algorithm has great application value.
Keywords: emitter recognition; multi-scale wavelet entropy; feature weighting; uneven weight factor; stability weight factor
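To make the idea of a wavelet entropy feature concrete, here is a rough sketch that assumes a Haar wavelet and defines the entropy as the Shannon entropy of the relative energy in each decomposition band. The abstract does not state which wavelet or entropy definition the authors use, so both choices, and the function names, are assumptions for illustration only.

```python
from math import log, sqrt


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in pairs]
    return approx, detail


def wavelet_entropy(signal, levels):
    """Shannon entropy of the energy distribution across wavelet bands.

    Assumes len(signal) is at least 2**levels and the signal is not all zeros.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * log(p) for p in probs)
```

A signal whose energy sits in a single band gives an entropy of zero, while a signal that spreads energy over several scales gives a larger value, which is what makes the quantity usable as a feature for telling modulation types apart.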
Attentive Neighborhood Feature Augmentation for Semi-supervised Learning
3
Authors: Qi Liu, Jing Li, Xianmin Wang, Wenpeng Zhao. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 8, pp. 1753-1771 (19 pages)
Recent state-of-the-art semi-supervised learning (SSL) methods usually use data augmentations as core components. Such methods, however, are limited to simple transformations such as the augmentations under the instance's naive representations or the augmentations under the instance's semantic representations. To tackle this problem, we offer a unique insight into data augmentations and propose a novel data-augmentation-based semi-supervised learning method, called Attentive Neighborhood Feature Augmentation (ANFA). The motivation of our method lies in the observation that the relationship between the given feature and its neighborhood may contribute to constructing more reliable transformations for the data, and further help the classifier to distinguish the ambiguous features from the low-density regions. Specifically, we first project the labeled and unlabeled data points into an embedding space and then construct a neighbor graph that serves as a similarity measure based on the similar representations in the embedding space. Then, we employ an attention mechanism to transform the target features into augmented ones based on the neighbor graph. Finally, we formulate a novel semi-supervised loss by encouraging the predictions of the interpolations of augmented features to be consistent with the corresponding interpolations of the predictions of the target features. We carried out experiments on the SVHN and CIFAR-10 benchmark datasets, and the experimental results demonstrate that our method outperforms the state-of-the-art methods when the number of labeled examples is limited.
Keywords: semi-supervised learning; attention mechanism; feature augmentation; consistency regularization
Decision Cost Feature Weighting and Its Application in Intrusion Detection
4
Authors: QIAN Quan, GENG Huan-tong, WANG Xu-fa. Wuhan University Journal of Natural Sciences (CAS), 2004, Issue 5, pp. 765-769 (5 pages)
This paper introduces the cost-sensitive feature weighting strategy and its application in intrusion detection. Cost factors and a cost matrix are proposed to represent the misclassification cost for IDS. How to obtain the overall minimal risk is discussed in detail. Experiments show that although decision-cost-based weight learning incurs some attack misclassification, it can achieve relatively low misclassification costs while keeping a relatively high recognition precision.
CLC number: TP 393.08
Foundation item: Supported by the National Natural Science Foundation Key Research Plan of China (90104030) and the "20 Century Education Development Plan"
Biography: QIAN Quan (1972-), male, Ph.D.; research directions: computer network, network security and artificial intelligence
Keywords: decision cost; feature weighting; intrusion detection
Semi-Supervised Clustering Algorithm Based on Deep Feature Mapping
5
Authors: Xiong Xu, Chun Zhou, Chenggang Wang, Xiaoyan Zhang, Hua Meng. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 7, pp. 815-831 (17 pages)
Clustering analysis is one of the main concerns in data mining. A common approach to the clustering process is to bring together points that are close to each other and separate points that are far from each other. Therefore, measuring the distance between sample points is crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples by these features is a common supervised learning method to reconstruct the distance metric. However, in many application scenarios, it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high data dimensionality, a novel semi-supervised clustering algorithm is proposed by designing an improved prototype network that attempts to reconstruct the distance metric in the sample space with a small amount of pairwise supervised information, such as Must-Link and Cannot-Link, and then cluster the data in the new metric space. The core idea is to make similar samples closer and dissimilar ones further apart through embedding mapping. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm. Average clustering metrics on various datasets improved by 8% compared to the comparison algorithm.
Keywords: metric learning; semi-supervised clustering; prototypical network; feature mapping
Evaluation of Feature Subset Selection, Feature Weighting, and Prototype Selection for Biomedical Applications
6
Authors: Suzanne LITTLE, Sara COLANTONIO, Ovidio SALVETTI, Petra PERNER. Journal of Software Engineering and Applications, 2010, Issue 1, pp. 39-49 (11 pages)
Many medical diagnosis applications are characterized by datasets that contain under-represented classes because the disease is much rarer than the normal case. In such a situation, classifiers such as decision trees and Naïve Bayesian that generalize over the data are not the proper choice as classification methods. Case-based classifiers that can work on the samples seen so far are more appropriate for such a task. We propose to calculate the contingency table and class-specific evaluation measures, rather than relying on overall accuracy alone, when evaluating classifiers for these specific data characteristics. We evaluate the different options of our case-based classifier and compare the performance to decision trees and Naïve Bayesian. Finally, we give an outlook for further work.
Keywords: feature subset selection; feature weighting; prototype selection; evaluation of methods; prototype-based classification; methodology for prototype-based classification; CBR in health
NEW SHADOWED C-MEANS CLUSTERING WITH FEATURE WEIGHTS (Cited by: 2)
7
Authors: 王丽娜, 王建东, 姜坚. Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2012, Issue 3, pp. 273-283 (11 pages)
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions, generated by shadowed sets-based clustering, have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced to the clustering algorithm. The selection of the weight exponent is crucial for good results, and the weights are updated iteratively with each partition of clusters. The convergence of the weighted algorithms is given, and the feasible cluster validity indices of data mining applications are utilized. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm is better than the other unweighted algorithms.
Keywords: fuzzy C-means; shadowed sets; shadowed C-means; feature weights; cluster validity index
Automatic Extraction Method of 3D Feature Guidelines for Complex Cultural Relic Surfaces Based on Point Cloud (Cited by: 1)
8
Authors: GENG Yuxin, ZHONG Ruofei, HUANG Yuqin, SUN Haili. Journal of Geodesy and Geoinformation Science (CSCD), 2024, Issue 1, pp. 16-41 (26 pages)
Cultural relic line graphics serve as a crucial form of traditional artifact information documentation; they are a simple and intuitive product with a low display cost compared with 3D models. Dimensionality reduction is undoubtedly necessary for line drawings. However, most existing methods for artifact drawing rely on the principles of orthographic projection, which cannot avoid angle occlusion and data overlapping when the surface of cultural relics is complex. Therefore, conformal mapping was introduced as a dimensionality reduction method to compensate for the limitation of orthographic projection. Based on the given criteria for assessing surface complexity, this paper proposed a three-dimensional feature guideline extraction method for complex cultural relic surfaces. A combined 2D and 3D factor that measures the importance of points in describing surface features, the vertex weight, was designed. Then the selection threshold for feature guideline extraction was determined based on the differences between vertex weight and shape index distributions. The feasibility and stability were verified through experiments conducted on real cultural relic surface data. Results demonstrated the ability of the method to address the challenges associated with the automatic generation of line drawings for complex surfaces. The extraction method and the obtained results will be useful for line graphic drawing, display and promotion of cultural relics.
Keywords: point cloud; conformal parameterization; vertex weight; surface mesh; cultural relics; feature extraction
Exploring evolutionary features of directed weighted hazard network in the subway construction (Cited by: 3)
9
Authors: Gong-Yu Hou, Cong Jin, Zhe-Dong Xu, Ping Yu, Yi-Yi Cao. Chinese Physics B (SCIE, EI, CAS, CSCD), 2019, Issue 3, pp. 399-407 (9 pages)
A better understanding of previous accidents is an effective way to reduce the occurrence of similar accidents in the future. In this paper, a complex network approach is adopted to construct a directed weighted hazard network (DWHN) to analyze topological features and the evolution of accidents in subway construction. The nodes are hazards and accidents, the edges are the multiple relationships between these nodes, and the edge weights are the numbers of occurrences of repeated relationships. The results indicate that the DWHN has the small-world property, with a small average path length and a large clustering coefficient, indicating that hazards have better connectivity and will spread widely and quickly in the network. Moreover, the DWHN has the scale-free property, since the cumulative degree distribution follows a power-law distribution. This makes the DWHN more vulnerable to targeted attacks. Controlling key nodes with higher degree, strength and betweenness centrality will destroy the connectivity of the DWHN and mitigate the spreading of accidents in the network. This study is helpful for discovering inner relationships and evolutionary features of hazards and accidents in subway construction.
Keywords: accident analysis; directed weighted network; complex network; evolutionary features
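The construction rule in the abstract above (nodes are hazards and accidents, edge weights count repeated relationships) can be sketched in a few lines. The record format and the helper names below are hypothetical; the paper's actual data and tooling are not described in this listing.

```python
from collections import Counter, defaultdict


def build_weighted_digraph(relations):
    """Collapse repeated (source, target) relations into weighted directed edges.

    relations: iterable of (source, target) pairs, one per reported occurrence.
    Returns a Counter mapping each directed edge to its occurrence count.
    """
    return Counter(relations)


def node_strengths(edges):
    """Return (out_strength, in_strength): summed edge weights per node."""
    out_strength = defaultdict(int)
    in_strength = defaultdict(int)
    for (source, target), weight in edges.items():
        out_strength[source] += weight
        in_strength[target] += weight
    return dict(out_strength), dict(in_strength)


# Hypothetical accident records, for illustration only.
records = [
    ("water_inrush", "collapse"),
    ("water_inrush", "collapse"),
    ("poor_support", "collapse"),
    ("collapse", "casualty"),
]
edges = build_weighted_digraph(records)
out_s, in_s = node_strengths(edges)
```

Nodes with a high strength are the kind of key nodes the abstract suggests controlling first; degree and betweenness centrality would need a fuller graph library and are left out of this sketch.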
A Multi-model Approach for Soft Sensor Development Based on Feature Extraction Using Weighted Kernel Fisher Criterion (Cited by: 7)
10
Authors: 吕业, 杨慧中. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2014, Issue 2, pp. 146-152 (7 pages)
The multi-model approach can significantly improve the prediction performance of soft sensors in processes with multiple operational conditions. However, traditional clustering algorithms may produce overlapping subclasses, so that edge classes and outliers cannot be effectively dealt with and the modeling result is not satisfactory. To solve these problems, a new feature extraction method based on the weighted kernel Fisher criterion is presented to improve the clustering accuracy, in which feature mapping is adopted to bring the edge classes and outliers closer to other normal subclasses. Furthermore, the classified data are used to develop a multiple model based on support vector machines. The proposed method is applied to a bisphenol A production process for prediction of the quality index. The simulation results demonstrate its ability to improve the data classification and the prediction performance of the soft sensor.
Keywords: feature extraction; weighted kernel Fisher criterion; classification; soft sensor
A Feature Weighted Mixed Naive Bayes Model for Monitoring Anomalies in the Fan System of a Thermal Power Plant (Cited by: 3)
11
Authors: Min Wang, Li Sheng, Donghua Zhou, Maoyin Chen. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, Issue 4, pp. 719-727 (9 pages)
With increasing intelligence and integration, a great number of two-valued variables (generally stored in the form of 0 or 1) often exist in large-scale industrial processes. However, these variables cannot be effectively handled by traditional monitoring methods such as linear discriminant analysis (LDA), principal component analysis (PCA) and partial least squares (PLS) analysis. Recently, a mixed hidden naive Bayesian model (MHNBM) was developed for the first time to utilize both two-valued and continuous variables for abnormality monitoring. Although the MHNBM is effective, it still has some shortcomings that need to be addressed. For the MHNBM, the variables with greater correlation to other variables have greater weights, which cannot guarantee that greater weights are assigned to the more discriminating variables. In addition, the conditional probability $P(x_j \mid x_{j'}, y = k)$ must be computed based on historical data. When the training data is scarce, the conditional probability between continuous variables tends to be uniformly distributed, which affects the performance of the MHNBM. Here a novel feature weighted mixed naive Bayes model (FWMNBM) is developed to overcome the above shortcomings. For the FWMNBM, the variables that are more correlated to the class have greater weights, which makes the more discriminating variables contribute more to the model. At the same time, the FWMNBM does not have to calculate the conditional probability between variables, thus it is less restricted by the number of training data samples. Compared with the MHNBM, the FWMNBM has better performance, and its effectiveness is validated through numerical cases of a simulation example and a practical case of the Zhoushan thermal power plant (ZTPP), China.
Keywords: abnormality monitoring; continuous variables; feature weighted mixed naive Bayes model (FWMNBM); two-valued variables; thermal power plant
Model-Free Ultra-High-Dimensional Feature Screening for Multi-Classified Response Data Based on Weighted Jensen-Shannon Divergence
12
Authors: Qingqing Jiang, Guangming Deng. Open Journal of Statistics, 2023, Issue 6, pp. 822-849 (28 pages)
In ultra-high-dimensional data, it is common for the response variable to be multi-classified. Therefore, this paper proposes a model-free screening method for variables whose response variable is multi-classified, introducing the Jensen-Shannon divergence to measure the importance of covariates. The idea of the method is to calculate the Jensen-Shannon divergence between the conditional probability distribution of the covariates given a value of the response variable and the unconditional probability distribution of the covariates, and then use the probabilities of the response variable as weights to calculate the weighted Jensen-Shannon divergence, where a larger weighted Jensen-Shannon divergence means that the covariates are more important. Additionally, we investigated an adapted version of the method, which measures the relationship between the covariates and the response variable using the weighted Jensen-Shannon divergence adjusted by the logarithmic factor of the number of categories when the number of categories in each covariate varies. Then, through both theoretical analysis and simulation experiments, it was demonstrated that the proposed methods have the sure screening and ranking consistency properties. Finally, the results from simulation and real-dataset experiments show that in feature screening, the proposed methods are robust in performance and faster in computational speed compared with an existing method.
Keywords: ultra-high-dimensional; multi-classified; weighted Jensen-Shannon divergence; model-free; feature screening
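The screening statistic described above can be illustrated for a single categorical covariate: compute the Jensen-Shannon divergence between the covariate's distribution within each response class and its overall distribution, then average with the class probabilities as weights. This is a sketch based on the abstract's wording only; the function names are hypothetical, and the adjusted version with the logarithmic factor is omitted because its exact form is not given here.

```python
from collections import Counter
from math import log


def distribution(values):
    """Empirical probability of each category in a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {key: count / n for key, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    total = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            total += 0.5 * pk * log(pk / mk)
        if qk > 0:
            total += 0.5 * qk * log(qk / mk)
    return total


def weighted_js_score(x, y):
    """Class-probability-weighted JS divergence between P(x | y = r) and P(x)."""
    marginal = distribution(x)
    score = 0.0
    for label, count in Counter(y).items():
        conditional = distribution([xi for xi, yi in zip(x, y) if yi == label])
        score += (count / len(y)) * js_divergence(conditional, marginal)
    return score
```

Ranking covariates by this score and keeping the top ones gives a simple screening procedure; a covariate that is independent of the response scores close to zero.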
Weighted Clustering Coefficients Based Feature Extraction and Selection for Collaboration Relation Prediction
13
Authors: Jiehua Wu. 国际计算机前沿大会会议论文集 (proceedings of the International Computer Frontier Conference), 2018, Issue 1, p. 12 (1 page)
Speech emotion recognition using semi-supervised discriminant analysis
14
Authors: 徐新洲, 黄程韦, 金赟, 吴尘, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2014, Issue 1, pp. 7-12 (6 pages)
Semi-supervised discriminant analysis (SDA), which uses a combination of multiple embedding graphs, and kernel SDA (KSDA) are adopted in supervised speech emotion recognition. When the emotional factors of speech signal samples are preprocessed, different categories of features, including pitch, zero-cross rate, energy, duration, formant and Mel frequency cepstrum coefficient (MFCC), as well as their statistical parameters, are extracted from the utterances of samples. In the dimensionality reduction stage, before the feature vectors are sent into classifiers, parameter-optimized SDA and KSDA are performed to reduce dimensionality. Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP) and marginal Fisher analysis (MFA), when multi-class support vector machine (SVM) classifiers are used. Additionally, KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods, including SDA.
Keywords: speech emotion recognition; speech emotion feature; semi-supervised discriminant analysis; dimensionality reduction
Feature Representation Based on Sentimental Orientation Classification (Cited by: 5)
15
Authors: 刘功申, 何文垒, 朱杰, 来火尧. China Communications (SCIE, CSCD), 2011, Issue 3, pp. 90-98 (9 pages)
Online reviews and comments are important information resources for people. A new model for feature selection and weighting, called the Sentiment Vector Space Model (SVSM), is proposed to predict the sentiment orientation of comments and reviews, e.g., sorting out positive reviews from negative ones. Unlike topic-oriented classification, feature selection for sentiment orientation prediction focuses on language characteristics. Unlike traditional algorithms for sentiment classification, this model integrates grammatical knowledge and takes topic correlations into account. Features are extracted, and the similarity between these features and the topic is also computed. The feature similarity is taken as a factor when evaluating the polarity of opinions. The experimental results show that the proposed model is more effective in identifying sentiment orientation than most of the traditional techniques.
Keywords: sentiment orientation; emotional processing; feature selection; feature weighting
An Embedded Feature Selection Method for Imbalanced Data Classification (Cited by: 15)
16
Authors: Haoyue Liu, MengChu Zhou, Qing Liu. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, Issue 3, pp. 703-715 (13 pages)
Imbalanced data is one type of dataset frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of dataset, improving the accuracy of identifying the minority class is a critically important issue. Feature selection is one method to address this issue. An effective feature selection method can choose a subset of features that favor the accurate determination of the minority class. A decision tree is a classifier that can be built up by using different splitting criteria. Its advantage is the ease of detecting which feature is used as a splitting node. Thus, it is possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index (WGI) is proposed. Comparison with the Chi2, F-statistic and Gini index feature selection methods shows that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under a receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high even if only a few features are selected and used, and changes only slightly as more and more features are selected. However, F-measure achieves excellent performance only if 20% or more of the features are chosen. The results are helpful for practitioners to select a proper feature selection method when facing a practical problem.
Keywords: classification and regression tree; feature selection; imbalanced data; weighted Gini index (WGI)
Improved method for the feature extraction of laser scanner using genetic clustering (Cited by: 6)
17
Authors: Yu Jinxia, Cai Zixing, Duan Zhuohua. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, Issue 2, pp. 280-285 (6 pages)
Feature extraction from range images provided by a ranging sensor is a key issue in pattern recognition. To automatically extract the environmental features sensed by a 2D ranging sensor (a laser scanner), an improved method based on genetic clustering (VGA-clustering) is presented. By integrating the spatial neighbouring information of range data into a fuzzy clustering algorithm, a weighted fuzzy clustering algorithm (WFCA) is introduced in place of the standard clustering algorithm to realize feature extraction for the laser scanner. Because the number of clusters is unknown in advance, several validation index functions are used to estimate the validity of different clustering algorithms, and one validation index is selected as the fitness function of the genetic algorithm so as to determine the accurate number of clusters automatically. At the same time, an improved genetic algorithm (IVGA), based on VGA, is proposed to solve the local optimum problem of the clustering algorithm; it increases the population diversity and improves the genetic operators of the elitist rule to enhance the local search capacity and to quicken the convergence speed. Comparison with other algorithms demonstrates the effectiveness of the proposed algorithm.
Keywords: laser scanner; feature extraction; weighted fuzzy clustering; validation index; genetic algorithm
OPTIMIZED MEANSHIFT TARGET REFERENCE MODEL BASED ON IMPROVED PIXEL WEIGHTING IN VISUAL TRACKING (Cited by: 4)
18
Authors: Chen Ken, Song Kangkang, Kyoungho Choi, Guo Yunyan. Journal of Electronics (China), 2013, Issue 3, pp. 283-289 (7 pages)
The generic Meanshift is susceptible to interference from background pixels mixed with the target pixels in the kernel of the reference model, which compromises the tracking performance. In this paper, we enhance the target color feature by attenuating the background color within the kernel through enlarging the pixel weightings which map to the pixels on the target. This way, the background pixel interference is largely suppressed in the color histogram in the course of constructing the target reference model. In addition, the proposed method also reduces the number of Meanshift iterations, which speeds up the algorithmic convergence. Two tests validate the proposed approach, showing improved tracking robustness on real-world video sequences.
Keywords: visual tracking; Meanshift; color feature histogram; pixel weighting; tracking robustness
Feature selection for co-training (Cited by: 2)
19
Authors: 李国正, 刘天羽. Journal of Shanghai University (English Edition) (CAS), 2008, Issue 1, pp. 47-51 (5 pages)
Co-training is a semi-supervised learning method which employs two complementary learners to label the unlabeled data for each other and to predict the test sample together. Previous studies show that redundant information can help improve the ratio of prediction accuracy between semi-supervised learning methods and supervised learning methods. In practice, however, redundant information often hurts the performance of learning machines. This paper investigates what effect redundant features have on semi-supervised learning methods, e.g., co-training, and how to remove the redundant features as well as the irrelevant features. Here, FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training with feature selection. Experimental results on artificial and real-world data sets show that FESCOT helps to remove irrelevant and redundant features that hurt the performance of the co-training method.
Keywords: feature selection; semi-supervised learning; co-training
An improved algorithm for weighting keywords in web documents (Cited by: 1)
20
Authors: 孙双, 贺樑, 杨静, 顾君忠. Journal of Shanghai University (English Edition) (CAS), 2008, Issue 3, pp. 235-239 (5 pages)
In this paper, an improved algorithm, the web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to the improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of keyword weights and semantic similarity is improved.
Keywords: improved vector space model (IVSM); representation feature; feature item; keyword weight; semantic similarity
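For orientation, the TF*IDF weighting that WKWA is said to build on can be sketched as below. This is the plain baseline only, not WKWA itself: the abstract does not say how the web-specific representation features or the TFC and ITC variants are combined, and the function name is hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Plain TF*IDF weights for a list of tokenized, non-empty documents.

    documents: list of token lists.
    Returns one {term: weight} dict per document, using relative term
    frequency and idf = log(N / document_frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document receives a weight of zero under this baseline, which is one reason refinements for web documents are of interest.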