798 articles found
Novel cued search strategy based on information gain for phased array radar (cited by: 5)
1
Authors: Lu Jianbin, Hu Weidong, Xiao Hui, Yu Wenxian. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, No. 2, pp. 292-297 (6 pages)
A search strategy based on the maximal information gain principle is presented for the cued search of phased array radars. First, the determination of the cued search region, the arrangement of beam positions, and the calculation of the prior probability distribution of each beam position are discussed. Then, two search algorithms based on information gain are proposed, using Shannon entropy and Kullback-Leibler entropy, respectively. With the proposed strategy, the information gain of each beam position is predicted before the radar detection, and the observation is made in the beam position with the maximal information gain. Compared with the conventional methods of sequential search and confirm search, simulation results show that the proposed search strategy can distinctly improve search performance and save radar time resources for the same given detection probability.
Keywords: phased array radar; search strategy; cued search; beam position; information gain
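The selection rule described in this abstract (predict the information gain of each beam position, then observe the one with the largest gain) can be illustrated with a minimal Python sketch of a Shannon-entropy variant. Everything specific here is an assumption made for illustration and is not taken from the paper: the single-target model, the per-beam detection probability `pd` and false-alarm probability `pfa`, and the function names `expected_gain` and `best_beam`.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_gain(prior, i, pd=0.9, pfa=0.01):
    """Expected entropy reduction from observing beam position i.

    Assumed model (hypothetical): exactly one target, located in beam j
    with probability prior[j]; a look at beam i reports a detection with
    probability pd if the target is there and pfa otherwise.
    """
    gain = entropy(prior)
    for z in (0, 1):  # z = 1: detection reported, z = 0: no detection
        like = [
            (pd if z else 1 - pd) if j == i else (pfa if z else 1 - pfa)
            for j in range(len(prior))
        ]
        pz = sum(l * p for l, p in zip(like, prior))
        if pz > 0:
            posterior = [l * p / pz for l, p in zip(like, prior)]
            gain -= pz * entropy(posterior)
    return gain


def best_beam(prior, pd=0.9, pfa=0.01):
    """Index of the beam position with the largest predicted gain."""
    return max(range(len(prior)), key=lambda i: expected_gain(prior, i, pd, pfa))


if __name__ == "__main__":
    prior = [0.1, 0.5, 0.2, 0.2]
    print(best_beam(prior), [round(expected_gain(prior, i), 3) for i in range(4)])
```

In this toy model the predicted gain is the mutual information between the target location and the next detection outcome, so it is never negative and never exceeds one bit per look.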
Information gain based sensor search scheduling for low-earth orbit constellation estimation (cited by: 3)
2
Authors: Bo Wang, Jun Li, Wei An, Yiyu Zhou. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2011, No. 6, pp. 926-932 (7 pages)
This paper addresses the problem of sensor search scheduling in the complicated space environment faced by the low-earth orbit constellation. Several search scheduling methods based on the commonly used information gain are first compared via simulations. Then a novel search scheduling method for scenarios with uncertain observations is proposed, based on the global Shannon information gain and a beta-density-based uncertainty model. Simulation results indicate that the beta density model is a good option for solving the problem of target acquisition in complicated space environments.
Keywords: low-earth orbit constellation; sensor network; scheduling algorithm; information gain; acquisition
Sensor management based on Fisher information gain (cited by: 2)
3
Authors: Tian Kangsheng, Zhu Guangxi. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, No. 3, pp. 531-534 (4 pages)
Multi-sensor systems are becoming increasingly important in a variety of military and civilian applications. In general, a single-sensor system can only provide partial information about the environment, while a multi-sensor system provides a synergistic effect that improves the quality and availability of information. Data fusion techniques can effectively combine this environmental information from similar and/or dissimilar sensors. Sensor management, which aims to improve data fusion performance by controlling sensor behavior, plays an important role in the data fusion process. This paper presents a method that uses a Fisher information gain based sensor effectiveness metric for sensor assignment in multi-sensor, multi-target tracking applications. The Fisher information gain is computed for every sensor-target pairing on each scan. The advantage of this metric over others is that the Fisher information gain for a target obtained by multiple sensors equals the sum of the gains obtained by the individual sensors, so a standard transportation problem formulation can be used without introducing the concept of a pseudo sensor. The simulation results show the effectiveness of the method.
Keywords: data fusion; sensor management; Fisher information gain; linear programming
Application of Information Gain to Estimating the Seismic Tendency (cited by: 2)
4
Authors: Shen Ping, Shen Jing, and Feng Guozheng (Institute of Geophysics, SSB, Beijing 100081, China). Earthquake Research in China, 1997, No. 2, pp. 44-50 (7 pages)
Treating two seismic parameters, the energy and the frequency of earthquakes, as a whole under the entropy-based definition of information gain, we study the information gain of M≥6.0 earthquakes in the world earthquake catalogue for 1900-1992. The results show that the information gain decreases before strong earthquakes. Our study of the recent seismic tendency of large earthquakes shows that the probability of earthquakes with M≥8.5 around the world is low for the near future. The information gain technique provides a new approach to tracing and predicting earthquakes from data on moderate and small earthquakes.
Keywords: Application of Information Gain to Estimating the Seismic Tendency
Assessment of Sentiment Analysis Using Information Gain Based Feature Selection Approach
5
Authors: R. Madhumathi, A. Meena Kowshalya, R. Shruthi. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 11, pp. 849-860 (12 pages)
Sentiment analysis is the process of determining the intention or emotion behind an article. Subjective information from the context is analyzed through sentiment analysis of people's opinions. The analyzed data quantifies reactions or sentiments and reveals the contextual polarity of the information. In social behavior, sentiment can be thought of as a latent variable. Measuring and comprehending this behavior could help us better understand social issues. Because sentiments are domain specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is done at the sentence, document, and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm for selecting highly correlated features and eliminating irrelevant and redundant ones. Extensive textual sentiment analysis at the sentence, document, and feature levels is performed using the proposed algorithm. The analysis is based on datasets from the Cornell and Kaggle repositories. Compared with existing baseline classifiers, the proposed Information Gain based classifier achieves increased accuracies of 96% at the document level, 97.4% at the sentence level, and 98.5% at the feature level. The proposed method is also tested on the IMDB, Yelp 2013, and Yelp 2014 datasets. On these high-dimensional datasets, the proposed classifier achieves increased accuracies of 95%, 96%, and 98% at the document, sentence, and feature levels, respectively, compared with existing baseline classifiers.
Keywords: sentiment analysis; sentence level; document level; feature level; information gain
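As a generic illustration of information gain based feature selection (not the IGbFS algorithm of this paper, whose details the abstract does not give), the following Python sketch scores discrete features by how much they reduce the entropy of the class label and ranks them. The function names and the toy term-presence data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy of labels given the feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Feature names sorted from most to least informative.

    columns maps a feature name to its list of per-sample values.
    """
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 means the term occurs in the review.
    labels = ["pos", "pos", "neg", "neg"]
    columns = {"movie": [1, 0, 1, 0], "great": [1, 1, 0, 0]}
    print(rank_features(columns, labels))
```

Here "great" separates the two classes perfectly and gets a gain of one bit, while "movie" is equally common in both classes and gets zero, so a selector that keeps the top-ranked terms would drop "movie".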
Intelligent Biometric Information Management
6
Authors: Harry Wechsler. Intelligent Information Management, 2010, No. 9, pp. 499-511 (13 pages)
We advance here a novel methodology for robust intelligent biometric information management, with inferences and predictions made using randomness and complexity concepts. Intelligence refers to learning, adaptation, and functionality, and robustness refers to the ability to handle incomplete and/or corrupt adversarial information, on one side, and image and/or device variability, on the other side. The proposed methodology is model-free and non-parametric. It draws support from discriminative methods using likelihood ratios to link, at the conceptual level, biometrics and forensics. It further links, at the modeling and implementation level, the Bayesian framework, statistical learning theory (SLT) using transduction and semi-supervised learning, and information theory (IT) using mutual information. The key concepts supporting the proposed methodology are a) local estimation to facilitate learning and prediction using both labeled and unlabeled data; b) similarity metrics using regularity of patterns, randomness deficiency, and Kolmogorov complexity (similar to MDL) using strangeness/typicality and ranking p-values; and c) the Cover-Hart theorem on the asymptotic performance of k-nearest neighbors approaching the optimal Bayes error.
Several topics on biometric inference and prediction related to 1) multi-level and multi-layer data fusion, including quality and multi-modal biometrics; 2) score normalization and revision theory; 3) face selection and tracking; and 4) identity management are described here using an integrated approach that includes transduction and boosting for ranking and sequential fusion/aggregation, respectively, on one side, and active learning and change/outlier/intrusion detection realized using information gain and martingale, respectively, on the other side. The proposed methodology can be mapped to additional types of information beyond biometrics.
Keywords: Authentication; Biometrics; Boosting; Change Detection; Complexity; Cross-Matching; Data Fusion; Ensemble Methods; Forensics; Identity Management; Imposters; Inference; Intelligent Information Management; Margin Gain; MDL; Multi-Sensory Integration; Outlier Detection; P-Values; Quality; Randomness; Ranking; Score Normalization; Semi-Supervised Learning; Spectral Clustering; Strangeness; Surveillance; Tracking; Typicality; Transduction
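IIGAIN Algorithm for Selecting Sensitive Features of Phishing Websites (cited by: 5)
7
Authors: 王燕, 王兴芬, 任俊玲. 《计算机应用与软件》 (CSCD), 2016, No. 4, pp. 297-301 (5 pages)
Traditional phishing website detection techniques mainly select sensitive features at random or by experience, which cannot guarantee detection accuracy. To address this, an improved information gain algorithm for selecting sensitive features of phishing websites, IIGAIN (Improved Information Gain Algorithm), is proposed. The algorithm takes the intra-class dispersion of features into account: the differences in intra-class dispersion of each feature are processed, and the result is used as a penalty term to improve the information gain algorithm. Experimental results show that phishing website detection using features selected by IIGAIN is clearly more accurate than detection using randomly selected features.
Keywords: phishing website detection; sensitive features; information gain; intra-class dispersion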
Application of artificial neural network and information theory to detection of insulators
8
Authors: 李卫东, 唐丽艳, 宋家骅, 柳焯. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2000, No. 3, pp. 32-36 (5 pages)
Information theory is used to obtain the information gain of each identification feature, and this gain is used as the weight factor of the feature to stress the role of effective features. The ART model based on artificial neural network theory is then used for identification, forming a detection system for poor insulators. Experiments and calculations show that this approach is correct and feasible.
Keywords: information gain; artificial neural network; electrical power system; detection of insulators
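A Feature Selection Method Combining CHI and IG (cited by: 22)
9
Authors: 王光, 邱云飞, 史庆伟. 《计算机应用研究》 (CSCD, PKU Core), 2012, No. 7, pp. 2454-2456 (3 pages)
By analyzing the correlation between feature terms and categories, two parameters are introduced on top of the original chi-square and information gain feature selection methods, so that the selected feature terms are concentrated in one particular class and occur in that class as often as possible. Finally, the CHI and IG algorithms are combined to obtain a combined feature selection method (CCIF). In experiments comparing traditional chi-square feature selection, information gain, and CCIF, the CCIF method clearly improves the micro-averaged precision.
Keywords: text classification; feature selection; chi-square statistic; information gain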
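Hybrid Prediction of Urban Air Quality Index Based on the IG-LASSO Model (cited by: 12)
10
Authors: 刘炳春, 郑红梅, 张斌. 《环境科学与技术》 (CAS, CSCD, PKU Core), 2017, No. 11, pp. 144-148 (5 pages)
The air quality index (AQI) is a numerical representation of air pollution in each region and can be used by governments to control urban air pollution. Using air quality and meteorological data for Tianjin from January 1, 2014 to April 30, 2016, this paper builds a hybrid AQI prediction model based on IG (information gain) and LASSO (least absolute shrinkage and selection operator) to predict the AQI for the following day. The experiment consists of three parts: prediction model selection, feature variable selection, and hybrid prediction. The results show that the hybrid IG-LASSO model predicts more accurately than the LASSO model alone, with an error rate of 4.75%, and that it also effectively reduces the number of input variables and the complexity of the model. The study further concludes that the prediction accuracy of Tianjin's AQI is strongly affected by the concentrations of four air pollutants, PM10, PM2.5, NO2, and SO2, and is only weakly related to wind direction, weather phenomena, and wind force.
Keywords: air quality index; prediction; information gain; LASSO model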
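Safety Grade Identification of Hydraulic Steel Gates Based on IG-SFPA-BP (cited by: 2)
11
Authors: 刘畅, 赵华东, 吴优. 《南水北调与水利科技(中英文)》 (CAS, PKU Core), 2021, No. 2, pp. 409-416 (8 pages)
To address the poor ability of BP neural networks to identify the safety grade of hydraulic steel gates, a BP neural network model based on information gain (IG) and the self-adaptive flower pollination algorithm (SFPA) is proposed. IG theory is used to prune the features used in safety grade evaluation of hydraulic steel gates, reducing the influence of redundant features and shortening the training time of the network model. SFPA is used to optimize the initial weights and thresholds of the BP neural network, which speeds up convergence, prevents the model from falling into local optima, and improves its ability to classify the safety grade of hydraulic steel gates. Based on the results of multiple independent runs of the IG-SFPA-BP, standard BP, IG-BP, IG-FPA-BP, IG-PSO-BP, and IG-GA-BP neural network models on a hydraulic steel gate safety grade dataset, the applicability of the IG-SFPA-BP model to safety grade identification of hydraulic steel gates is verified in terms of recognition accuracy, running time, and model mean squared error. The IG-SFPA-BP model increases the practical value of neural network models in safety evaluation of hydraulic steel gates and provides a new reference model for similar projects.
Keywords: hydraulic steel gate; information gain (IG); self-adaptive flower pollination algorithm (SFPA); BP neural network; safety evaluation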
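An Optimized IG Feature Selection Method Based on Term Frequency Distribution Information (cited by: 9)
12
Authors: 刘海峰, 刘守生, 宋阿羚. 《计算机工程与应用》 (CSCD, PKU Core), 2017, No. 4, pp. 113-117, 122 (6 pages)
Text feature selection is a core technique in text classification. To address the shortcomings of the information gain model, and based on how term frequencies are distributed at different levels within the texts, the IG model is improved step by step with respect to the document-based intra-class distribution of a feature, its term-frequency-based intra-class distribution, and the inter-class distribution of its term frequency, yielding an optimized IG feature selection method based on term frequency distribution information. Subsequent text classification experiments verify the effectiveness of the proposed optimized IG model.
Keywords: information gain; feature selection; intra-class distribution; inter-class distribution; text classification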
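Dynamic Diagnostic Strategy Generation Based on an Improved RIG Algorithm (cited by: 5)
13
Authors: 李登, 万福, 尹亚兰, 周红波. 《电子测量与仪器学报》 (CSCD), 2014, No. 2, pp. 159-163 (5 pages)
Because the static diagnostic strategies generated by TEAMS cannot meet the needs of on-site diagnosis, a dynamic diagnostic strategy generation method based on an improved rollout information heuristic (RIG) algorithm is proposed. First, the original RIG algorithm is modified to suit dynamic diagnosis. Then, a local D-matrix is obtained according to the specific situations that arise during on-site diagnosis. On this basis, the improved RIG algorithm selects the best next test, so that the fault diagnosis strategy is generated dynamically. A simulation example shows that the static diagnostic strategy based on the RIG algorithm requires four types of tests, whereas the dynamic method based on the improved RIG algorithm needs only one test step to diagnose the system fault. The method supports interactive diagnosis by maintenance personnel and effectively improves the practicality of fault diagnosis strategies.
Keywords: diagnostic strategy; D-matrix; RIG algorithm; fault diagnosis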
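Credit Risk Prediction for Supply Chain Financing Enterprises Based on the IG-SVM Model (cited by: 13)
14
Authors: 潘永明, 王雅杰, 来明昭. 《南京理工大学学报》 (EI, CAS, CSCD, PKU Core), 2020, No. 1, pp. 117-126 (10 pages)
To improve the accuracy of credit risk prediction for small and medium-sized enterprises (SMEs) in supply chain financing, a combined model that integrates machine learning algorithms is built on the basis of research on SME credit risk evaluation. The model uses a support vector machine (SVM) to build a credit risk classification model for supply chain SMEs and introduces information gain (IG) to extract the feature variables that contribute significantly to the prediction, optimizing the model's feature inputs. In comparison experiments with other models, the IG-SVM model achieves 97.62% accuracy on the test samples, 8.97% higher than the SVM model alone. Feature optimization with IG further improves the predictive ability of the SVM model.
Keywords: supply chain financing; information gain; support vector machine; credit risk; classification prediction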
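Traffic Clustering Based on the GainRatio Dimensionality Reduction Algorithm (cited by: 2)
15
Authors: 高锐, 刘北水, 李丹, 刘杰, 尤博. 《电子产品可靠性与环境试验》, 2020, No. S02, pp. 51-55 (5 pages)
With the rapid growth of network traffic, efficient traffic classification techniques are needed for network management, traffic control, and security detection. Traditional port-based and payload-based traffic classification methods have low accuracy, while unsupervised learning methods often apply only a single clustering algorithm and rarely study how the data itself is processed. To solve these problems, a method is proposed that first reduces the dimensionality of the raw data using the GainRatio (information gain ratio) method and then clusters the reduced data. Experimental results show that the proposed method not only improves running efficiency but also, as the number of clusters increases, clearly speeds up convergence to high accuracy.
Keywords: machine learning; traffic clustering; network security; dimensionality reduction; information gain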
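The gain ratio used for dimensionality reduction here is commonly defined as the information gain of an attribute divided by the entropy of the attribute's own value distribution (its split information), which penalizes attributes with many distinct values. The Python sketch below is an illustration of that standard definition only. It assumes discrete attribute values and known class labels for scoring, and the names `gain_ratio` and `select_top_k` as well as the toy flow attributes are hypothetical; the paper's actual pipeline is not described in enough detail in the abstract to reproduce.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on the feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain divided by split information; 0 for constant features."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0
    return information_gain(feature_values, labels) / split_info


def select_top_k(columns, labels, k):
    """Names of the k attributes with the highest gain ratio."""
    ranked = sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy flow records (hypothetical attributes and labels).
    labels = ["web", "web", "p2p", "p2p"]
    columns = {
        "proto": [0, 0, 1, 1],    # perfectly aligned with the class
        "flow_id": [0, 1, 2, 3],  # unique per record, so heavily penalized
        "const": [7, 7, 7, 7],    # carries no information
    }
    print(select_top_k(columns, labels, 2))
```

The reduced attribute set returned by `select_top_k` would then be passed to whatever clustering algorithm is in use.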
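Public Opinion Analysis of E-commerce Product Quality Using IG-RS-SVM (cited by: 3)
16
Authors: 叶佳骏, 冯俊, 任欢, 周杭霞. 《中国计量学院学报》, 2015, No. 3, pp. 285-290 (6 pages)
Review information on e-commerce products is highly valuable for monitoring public opinion on e-commerce product quality. To address the loss of classification accuracy of ensemble learning algorithms in high-dimensional settings, an IG-RS-SVM (Information Gain-Random Subspace-Support Vector Machine) algorithm is proposed. It builds on the Random Subspace ensemble learning algorithm, uses the support vector machine as the base learner, and introduces information gain feature selection. Features in the feature space are ranked by their information gain values and worthless features are removed, which reduces the dimensionality of the feature subspaces generated by the RS ensemble algorithm and thereby improves the efficiency of the SVM classifier. Experimental results show that the improved algorithm effectively increases the classification accuracy of review content.
Keywords: product reviews; information gain; random subspace; support vector machine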
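A Structure Evaluation Method for Multi-class SVM Based on the IG Ratio of Categorical Attributes (cited by: 2)
17
Authors: 李君娣, 张正军, 庄立纯, 张乃今. 《计算机工程与科学》 (CSCD, PKU Core), 2019, No. 4, pp. 719-726 (8 pages)
Multi-class SVMs built by combining classes in a binary tree structure require few binary SVMs and avoid inseparable and rejected regions. Because binary-tree-based class combination methods lack a concrete criterion for evaluating class combinations, a structure evaluation method for multi-class SVM based on the information gain (IG) ratio of categorical attributes is proposed. The IG ratio based on categorical attributes is defined, the classes are divided into a left and a right class combination, and for each possible combination the categorical-attribute IG ratio with respect to the variables is computed, with its maximum taken as the measure of the quality of that combination. An empirical analysis on datasets from the UCI repository shows that the multi-class SVM built from the class combination with the largest evaluation index achieves a high recognition rate.
Keywords: binary tree; multi-class classification; support vector machine; information gain ratio; categorical attribute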
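Safety Grade Identification of Hydraulic Steel Gates Based on IG-CPSO-BP (cited by: 2)
18
Authors: 周伦钢, 赵松波, 仝戈, 许亮. 《人民黄河》 (CAS, PKU Core), 2023, No. 7, pp. 130-133, 162 (5 pages)
To improve the speed and accuracy with which BP neural networks identify the safety grade of hydraulic steel gates, a safety grade evaluation model is built in which the BP neural network is optimized using information gain (IG) and the chaotic particle swarm optimization (CPSO) algorithm. The model uses IG to prune the feature indicators for safety grade evaluation, avoiding interference from redundant variables and speeding up training, and uses CPSO to optimize the initial weights of the BP neural network, improving convergence and the ability to classify safety grades. Validation shows that the evaluation results of the IG-CPSO-BP model largely agree with the actual safety grades of hydraulic steel gates, and its recognition accuracy is clearly better than that of the IG-BP, IG-GA-BP, and IG-PSO-BP models.
Keywords: information gain; chaotic particle swarm optimization algorithm; BP neural network; safety grade identification; hydraulic steel gate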
Attribute Weighted Naïve Bayes Classifier (cited by: 1)
19
Authors: Lee-Kien Foo, Sook-Ling Chua, Neveen Ibrahim. Computers, Materials & Continua (SCIE, EI), 2022, No. 4, pp. 1945-1957 (13 pages)
The naïve Bayes classifier is one of the commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption in the naïve Bayes classifier makes it a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate violations of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some do so at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights in the conditional probability. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate the performance of our method, we used public benchmark datasets and compared our method with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves classification performance over both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the chi-square test of independence.
Keywords: attribute weighting; naïve Bayes; Kullback-Leibler; information gain; classification
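A Key Feature Selection Method for Microblog Information Based on BIG-WFCHI
20
Authors: 殷仕刚, 安洋, 蔡欣华, 屈小娥. 《计算机系统应用》, 2021, No. 2, pp. 188-193 (6 pages)
Feature selection is a key step in using machine learning to improve the accuracy and efficiency of repost prediction, and it presupposes feature extraction. Commonly used feature selection methods include information gain (IG), mutual information, and the chi-square test (CHI). In traditional feature selection methods, low-frequency words cause problems such as negative correlation and interference in the computation of information gain and the chi-square test, which lowers classification accuracy. This paper first studies these problems and introduces a balance factor and a word frequency factor, respectively, to improve the accuracy of the algorithms. Second, based on the characteristics of microblog information propagation, the improved IG and CHI algorithms are combined into a feature selection method based on BIG-WFCHI (Balance Information Gain-Word Frequency CHI-square test). In the experiments, five classifiers (maximum entropy model, support vector machine, naive Bayes classifier, KNN, and multilayer perceptron) are tested on two heterogeneous datasets. The results show that the proposed method effectively eliminates irrelevant and redundant features, improves classification accuracy, and reduces running time.
Keywords: microblog information; feature selection; machine learning; information gain; chi-square test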