
Feature Extraction Method Based on Concept-Words
Abstract: The Vector Space Model (VSM) describes documents inaccurately because it ignores the semantic associations between words. To address this, the semantic similarity between words is computed using the HowNet ontology. A list of concept-words is then generated by identifying complete subgraphs in the similarity relation, and words with close semantic associations are replaced by the corresponding concept-word. Experiments show that the method improves document feature extraction while markedly reducing the dimensionality of the vector space. Compared with feature extraction without concept-word processing, the method also improves classification accuracy to some extent.
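The abstract does not specify the HowNet similarity formula, the similarity threshold, or how each concept-word is named, so the following is only a minimal sketch under stated assumptions. It takes word-to-word similarities as already computed (the toy values are invented), links two words when their similarity reaches a hypothetical threshold of 0.75, finds complete subgraphs with Bron-Kerbosch maximal-clique enumeration (our choice of algorithm, not necessarily the authors'), greedily maps each clique to its alphabetically first member as a stand-in concept-word, and builds term-frequency VSM vectors before and after replacement. The function names `build_graph`, `maximal_cliques`, `concept_mapping` and `to_vsm` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_graph(words, sim, threshold):
    """Undirected graph: an edge joins two words whose similarity >= threshold.
    `sim` maps word pairs to precomputed (e.g. HowNet-based) similarities;
    the threshold value is an assumption, not taken from the paper."""
    adj = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if sim.get((a, b), sim.get((b, a), 0.0)) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: yields every maximal complete subgraph."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def concept_mapping(cliques):
    """Map each word to a concept-word. Assumption: larger cliques are
    processed first, a word joins only its first clique, and the
    alphabetically first remaining member names the concept."""
    mapping = {}
    for clique in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
        members = sorted(w for w in clique if w not in mapping)
        if len(members) >= 2:  # any subset of a clique is still a clique
            for w in members:
                mapping[w] = members[0]
    return mapping


def to_vsm(docs, mapping):
    """Replace words by their concept-words, then build term-frequency vectors."""
    replaced = [[mapping.get(t, t) for t in doc] for doc in docs]
    vocab = sorted({t for doc in replaced for t in doc})
    counts = [Counter(doc) for doc in replaced]
    return vocab, [[c[t] for t in vocab] for c in counts]


if __name__ == "__main__":
    words = ["car", "automobile", "vehicle", "bank", "river", "shore"]
    sim = {("car", "automobile"): 0.95, ("car", "vehicle"): 0.85,
           ("automobile", "vehicle"): 0.82, ("river", "shore"): 0.78,
           ("bank", "shore"): 0.60, ("bank", "river"): 0.55}  # toy values
    docs = [["car", "vehicle", "river"], ["automobile", "bank", "shore", "car"]]
    mapping = concept_mapping(maximal_cliques(build_graph(words, sim, 0.75)))
    print(to_vsm(docs, {})[0])     # 6 dimensions without concept-words
    print(to_vsm(docs, mapping))   # 3 dimensions with concept-words
```

With these toy inputs the six distinct words collapse to three dimensions (`automobile`, `bank`, `river`), which illustrates the kind of dimensionality reduction the abstract reports. Clique enumeration is exponential in the worst case, so a real vocabulary would need a sparse similarity graph or a different grouping strategy.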
Source: World Sci-Tech R&D (《世界科技研究与发展》), CSCD, 2012, No. 1, pp. 119-122, 147 (5 pages)
Funding: Supported by the National Key Technology R&D Program project "Chongqing Convenience e-Station Service Platform" (2007BAH08B04)
Keywords: concept-word; HowNet; Vector Space Model; feature extraction