
Dimensionality Reduction by Mutual Information for Text Classification (Cited by: 2)

Abstract: The framework of a text classification system was presented, and the problem of high dimensionality in the feature space for text classification was studied. Mutual information is a widely used information-theoretic measure of the stochastic dependency between discrete random variables. This measure was used as a criterion to reduce the high dimensionality of feature vectors in Web text classification. Feature selection and feature transformation were performed by maximizing mutual information, covering both linear and non-linear feature transformations. Entropy was used and extended to find suitable features for pattern recognition systems. This lays a favorable foundation for text classification mining.
Source: Journal of Beijing Institute of Technology (English Edition), indexed in EI and CAS, 2005, Issue 1, pp. 32-36 (5 pages).
Funding: the National "973" Program Projects (G1998030414).
Keywords: text classification; mutual information; dimensionality reduction
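The mutual-information criterion the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration of scoring each term by its mutual information with the class label and keeping the highest-scoring terms; the toy documents and function names are assumptions for illustration, not the authors' implementation.

```python
# Sketch of mutual-information feature selection for text classification
# (illustrative only; toy data and names are hypothetical, not from the paper).
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """I(term; class) over binary term presence, in nats."""
    n = len(docs)
    joint = Counter((term in doc, lab) for doc, lab in zip(docs, labels))
    p_term = Counter(term in doc for doc in docs)   # marginal over presence
    p_class = Counter(labels)                        # marginal over classes
    mi = 0.0
    for (present, cls), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log(p_tc / ((p_term[present] / n) * (p_class[cls] / n)))
    return mi

def select_features(docs, labels, k):
    """Rank every vocabulary term by MI with the label; keep the top k."""
    vocab = {word for doc in docs for word in doc}
    return sorted(vocab,
                  key=lambda t: mutual_information(docs, labels, t),
                  reverse=True)[:k]

# Toy corpus: documents as sets of terms, with class labels.
docs = [{"ball", "goal", "team"}, {"goal", "match"},
        {"vote", "law"}, {"law", "court", "vote"}]
labels = ["sport", "sport", "politics", "politics"]
top_terms = select_features(docs, labels, 3)
```

Terms that occur only in one class (here "goal", "vote", "law") receive the highest mutual information and are retained, while weakly discriminative terms are dropped, which is the dimensionality-reduction effect the abstract refers to.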


Co-cited references: 17

Citing articles: 2

Second-level citing articles: 8
