Study on the Application of Complex Network Theory in Chinese Text Feature Selection (复杂网络理论在中文文本特征选择中的应用研究)

Cited by: 17
Abstract: This paper proposes a feature selection method based on complex networks. A weighted complex network is built from the text to represent the semantic relations between words and the structural information of the text. The weighted degree, weighted clustering coefficient, and betweenness of each node are considered together to compute the node's characteristics, and the keywords that reflect the topic of the text are extracted as its feature words according to these synthetic node characteristics. A Chinese text feature selection algorithm based on complex networks is given and verified experimentally. The results show that the proposed feature selection method improves text classification performance compared with traditional methods.
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2012, No. 9, pp. 23-28 (6 pages)
Keywords: Complex network; Semantic relevance relation; Synthetic characteristics of nodes; Feature selection
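
The abstract names three node measures (weighted degree, weighted clustering coefficient, betweenness) but does not give their formulas or how they are combined. The following is a minimal sketch under stated assumptions, not the authors' algorithm: words are linked when they co-occur within a sliding window, the clustering coefficient follows the Barrat-style weighted definition, betweenness is computed with Brandes' algorithm on hop-count shortest paths, and the three measures are max-normalized and combined with equal weights. All function names, the `window` size, and the `weights` tuple are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_network(tokens, window=2):
    """Undirected co-occurrence network; an edge weight counts how often
    two distinct words appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def weighted_degree(graph, node):
    """Sum of the weights of all edges incident to `node`."""
    return sum(graph[node].values())


def weighted_clustering(graph, node):
    """Barrat-style weighted clustering coefficient (an assumed choice)."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(
        (graph[node][a] + graph[node][b]) / 2
        for a, b in combinations(nbrs, 2)
        if b in graph[a]  # only neighbor pairs that are themselves linked
    )
    return 2 * closed / (weighted_degree(graph, node) * (k - 1))


def betweenness(graph):
    """Brandes' algorithm on hop-count shortest paths (edge weights ignored)."""
    score = dict.fromkeys(graph, 0.0)
    for src in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[src] = 1
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != src:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve.
    return {v: b / 2 for v, b in score.items()}


def select_features(tokens, top_k=3, window=2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank words by an equally weighted sum of the max-normalized measures."""
    graph = build_network(tokens, window)
    bc = betweenness(graph)
    raw = {
        v: (weighted_degree(graph, v), weighted_clustering(graph, v), bc[v])
        for v in graph
    }
    # Guard against division by zero when a measure is 0 for every node.
    maxima = [max(col) or 1.0 for col in zip(*raw.values())]
    score = {
        v: sum(w * x / m for w, x, m in zip(weights, vals, maxima))
        for v, vals in raw.items()
    }
    return sorted(score, key=score.get, reverse=True)[:top_k]
```

In use, `tokens` would be the output of a Chinese word segmenter after stop-word removal, and `select_features(tokens, top_k=...)` would return the highest-ranked words as candidate feature words. Equal weighting is only a placeholder; a different weighting of the three measures changes the ranking.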