Feature selection method for text based on linear combination

Times cited: 4

Abstract: Common feature selection algorithms for text classification mainly use an evaluation function to score how well each individual feature discriminates between categories. Because they consider only the relevance between a feature and the categories and ignore the correlation among the features themselves, the selected feature set contains redundancy. To address this problem, this paper proposes a new feature selection algorithm for text classification that helps select features with strong category-discriminating ability and weak correlation with one another. Experiments show that the algorithm outperforms traditional feature selection algorithms.
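The abstract describes selecting features that separate categories well while being weakly correlated with each other, but it does not give the paper's scoring formula. As a minimal sketch only, assuming a linear combination of the form relevance(f) − alpha × mean redundancy(f, already selected) (the mRMR-style difference criterion), and substituting mutual information for both terms in place of the fuzzy-correlation measure suggested by the keywords, greedy selection could look like the following. The functions `mutual_information`, `rank_by_relevance` and `select_features`, the weight `alpha`, and the toy terms `goal`, `match` and `team` are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint: Dict[tuple, int] = {}
    px: Dict[int, int] = {}
    py: Dict[int, int] = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def rank_by_relevance(features: Dict[str, List[int]], labels: List[int], k: int) -> List[str]:
    """Baseline filter: score each term alone against the labels and keep the top k."""
    return sorted(features, key=lambda f: (-mutual_information(features[f], labels), f))[:k]


def select_features(features: Dict[str, List[int]], labels: List[int],
                    k: int, alpha: float = 1.0) -> List[str]:
    """Greedy selection; score = relevance - alpha * mean redundancy (hypothetical weighting)."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted so that ties break by term name
    while remaining and len(selected) < k:
        def score(f: str) -> float:
            if not selected:
                return relevance[f]
            # Average redundancy of candidate f with the terms chosen so far.
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - alpha * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): binary term presence in 8 documents, two classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0],
    "match": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "goal"
    "team":  [0, 0, 0, 1, 0, 0, 0, 0],  # covers the positive document "goal" misses
}
print(rank_by_relevance(features, labels, 2))  # ['goal', 'match']
print(select_features(features, labels, 2))    # ['goal', 'team']
```

On this toy data, ranking terms independently keeps the redundant pair `goal`/`match`, whereas the redundancy penalty replaces `match` with the weaker but complementary `team`, which is the kind of redundancy problem the abstract describes. A larger `alpha` penalizes redundancy more heavily; `alpha = 0` reduces to the independent ranking.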

Authors: Qiu Yunfei (邱云飞), Wang Jiankun (王建坤), Li Xue (李雪), Shao Liangshan (邵良杉)

Institution: School of Software, Liaoning Technical University

Source: Application Research of Computers, 2011, no. 6, pp. 2099-2101 (3 pages). Indexed in CSCD and the Peking University core journal list.

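Funding: National Natural Science Foundation of China (70971059); Innovation Team Project of Liaoning Province (2009T045); Liaoning Provincial Key Science and Technology Project (2007308003)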

Keywords: text classification; feature selection; fuzzy correlation; redundancy

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
