
Research on Effect of Adding Internal Semantic Relationship into Text Categorization (cited by: 3)
Abstract To improve text categorization under the vector space model without adding external semantic knowledge, this paper presents a feature matrix-based categorization framework. Word-to-word and text-to-text relationships latent in the corpus are mined and merged into the original word-text matrix in several different ways. Two common algorithms, SVM and KNN, are then used in comparative text categorization experiments on a strongly domain-specific legal corpus and a broad-domain news corpus. The experimental results show that adding word-to-word and text-to-text relationships usually improves categorization performance, but the effect varies with the classification method and the domain characteristics of the corpus, so the way the relationships are added should be chosen case by case in practice.
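The abstract does not say how the word-to-word and text-to-text relationships are computed or how they are merged into the word-text matrix. The following is only a minimal sketch under assumed choices: relationships are taken to be cosine similarities between rows (words) or columns (texts) of the matrix, and merging is taken to be a matrix product. The function names `add_word_relations` and `add_text_relations` and the toy data are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def similarity_matrix(rows):
    """Pairwise cosine similarity between the given row vectors."""
    return [[cosine(r, s) for s in rows] for r in rows]

def add_word_relations(x):
    """Assumed fusion: W @ x, where W holds word-to-word similarities.
    x is a word-text matrix (rows = words, columns = texts)."""
    return matmul(similarity_matrix(x), x)

def add_text_relations(x):
    """Assumed fusion: x @ T, where T holds text-to-text similarities."""
    return matmul(x, similarity_matrix(transpose(x)))

# Toy word-text matrix (hypothetical counts): 4 words x 4 texts.
# Texts 0 and 1 are "legal", texts 2 and 3 are "sports".
words = ["court", "judge", "match", "goal"]
x = [
    [2, 0, 0, 0],  # court
    [1, 1, 0, 0],  # judge
    [0, 0, 2, 1],  # match
    [0, 0, 0, 2],  # goal
]

x_word = add_word_relations(x)
x_text = add_text_relations(x)

# Text 1 never contains "court", but it contains "judge", which co-occurs
# with "court" in text 0, so the enriched matrices give it a non-zero weight.
print(round(x_word[0][1], 3), round(x_text[0][1], 3))
```

In a setup like the one the abstract describes, the columns of the enriched matrix would then be passed to a classifier such as SVM or KNN in place of the original columns. Whether the enrichment helps would have to be checked per classifier and per corpus, as the abstract itself cautions.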
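Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 9, pp. 82-86 (5 pages)
Funding Supported by the National Natural Science Foundation of China (71271209), the Beijing Natural Science Foundation (4132067), the Humanities and Social Sciences Youth Fund of the Ministry of Education (11YJC630268), and the Natural Science Foundation of Hebei Province (A2013410011)
Keywords Vector space model, Text categorization, Semantic mining, Feature matrix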
