
Text Classification Based on Weight of Feature Words
Cited by: 1
Abstract: In text classification, only a few researchers use feature-word weights to build the vector representation of a text, and the feature selection algorithms they use do not consider the sign or the range of those weights. Building on the CHI statistic, this paper therefore proposes a new method for computing the class relevance of feature words, and it chooses among different methods for computing the class relevance of a text according to the number of feature words in each class's feature set. To determine a text's category, the method uses only the number of feature words the text contains and their class relevance, so it can also classify texts that contain few feature words. Experiments show that the method is effective and feasible.
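Source: Computer and Modernization (《计算机与现代化》), 2012, No. 10, pp. 8-13 (6 pages)
Funding: National Natural Science Foundation of China (61173146); National Social Science Foundation of China (12CTQ042); Natural Science Foundation of Jiangxi Province (2010GZS0067); Key Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ09650)
Keywords: text classification; feature selection; correlation score between feature words and classes; correlation score between text and classes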
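The abstract builds on the CHI (chi-square) statistic but does not state the paper's new relevance formula. As a minimal sketch of the baseline only, the code below computes the standard CHI score of a term for a class from a 2x2 document-count table. The function names are hypothetical, and `signed_chi_square` is an illustrative assumption about how a sign could be attached to the score; it is not the method proposed in the paper.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard CHI statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty, so the score is undefined;
        # returning 0.0 treats the term as uninformative.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def signed_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Assumption for illustration: give the CHI score the sign of (a*d - b*c).

    A positive value means the term occurs more often inside the class than
    outside it; a negative value means the opposite.
    """
    score = chi_square(a, b, c, d)
    return score if a * d >= b * c else -score
```

The plain CHI score is never negative, so it cannot tell a term that indicates a class from a term that indicates its absence. The signed variant is one simple way to keep that distinction, which is the kind of issue the abstract raises about the sign of feature-word weights.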

References (14)

  • 1. Feng Shuxiao, Xu Xin, Yang Chunmei. New progress in research on Chinese word segmentation technology in China [J]. Journal of Intelligence (情报杂志), 2002, 21(11): 29-30. Cited by: 25
  • 2. Luhn H P. Auto-encoding of Documents for Information Retrieval Systems [M]//Modern Trends in Documentation. London: Pergamon Press, 1959: 45-58.
  • 3. Peng F, Schuurmans D. Combining naive Bayes and n-gram language models for text classification [C]//Lecture Notes in Computer Science, 2003, 2633: 335-350.
  • 4. Wei Zhihua, Miao Duoqian, Chauchat J H, et al. Feature selection on Chinese text classification using character N-Grams [C]//Proc. of the 3rd International Conference on Rough Sets and Knowledge Technology. Chengdu, China, 2008: 500-507.
  • 5. Liu Rui, Jiang Minghu. Chinese text classification based on the BVB model [C]//Proc. of the 4th International Conference on Semantics, Knowledge and Grid. Washington DC, USA, 2008: 376-379.
  • 6. Ikonomakis M, Kotsiantis S, Tampakas V. Text classification: A recent overview [C]//Proc. of the 9th World Scientific and Engineering Academy and Society International Conference on Computers. Athens, Greece, 2005: 1-6.
  • 7. Zhang Wen, Yoshida T, Tang Xijin. Text classification based on multi-word with support vector machine [J]. Knowledge-Based Systems, 2008, 21(8): 879-886.
  • 8. Baker L D, McCallum A K. Distributional clustering of words for text classification [C]//Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, 1998: 96-103.
  • 9. Batal I, Hauskrecht M. Boosting KNN text classification accuracy by using supervised term weighting schemes [C]//Proc. of the 18th ACM Conference on Information and Knowledge Management. Hong Kong, China, 2009: 2041-2044.
  • 10. Ko Y, Seo J. Text classification from unlabeled documents with bootstrapping and feature projection techniques [J]. Information Processing and Management, 2009, 45(1): 70-83.

Second-level references (18)

Co-citing documents (24)

Co-cited documents (14)

  • 1. Su Jinshu, Zhang Bofeng, Xu Xin. Advances in machine learning based text categorization [J]. Journal of Software (软件学报), 2006, 17(9): 1848-1859. Cited by: 381
  • 2. Lu Shinghua, Chiang Ding'an, Keh Huanchao, et al. Chinese text classification by the Naive Bayes classifier and the associative classifier with multiple confidence threshold values [J]. Knowledge-Based Systems, 2010, 23(6): 598-604.
  • 3. Xu Qinan, Liu Zhijing. Automatic Chinese text classification based on NSVMDT-KNN [C]//Proc. of the 5th International Conference on Fuzzy Systems and Knowledge Discovery. Shandong, China, 2008: 410-414.
  • 4. Liu Reylong. Dynamic category profiling for text filtering and classification [J]. Information Processing & Management, 2007, 43(1): 154-168.
  • 5. Wang J H, Xu Y, You J. Sparse residue for occluded face image reconstruction and classification [C]. Pattern Recognition (ICPR), 2012 21st International Conference, 2012, 11: 1707-1710.
  • 6. Yin Jun, Liu Zhonghua, et al. Kernel sparse representation based classification [J]. Neurocomputing, 2012, 77(22): 120-128.
  • 7. Huang J S, Zheng C H. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data [J]. Bioinformatics, 2006, 22(15): 1855-1862.
  • 8. Wright J, et al. Robust face recognition via sparse representation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210-227.
  • 9. Yang Meng, Zhang Lei, Yang Jian, Zhang David. Robust sparse coding for face recognition [C]. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Colorado Springs, 2011: 625-632.
  • 10. Yang Linbo, Wang Shitong. A fast text classification method based on boundary credibility similarity [J]. Computer Engineering and Applications (计算机工程与应用), 2009, 45(4): 156-158. Cited by: 3

Citing documents (1)
