
Improvement to Weighting Terms in Text Classification (文本分类中词语权重计算的改进) Cited by: 2
Abstract: The formal representation of text has long been a fundamental problem in information retrieval fields such as text retrieval, automatic summarization, and search engines. The TF.IDF text representation in the Vector Space Model is a widely used and effective method in this area. Differences in how a term is distributed across the categories of a text collection are one of the important factors determining how well the term expresses the content of a text, but the existing TF.IDF method cannot capture this factor. To address this shortcoming, the paper introduces the information gain formula into the text collection and proposes the TF.IDF.IG text representation method, compares and analyzes its advantages over the traditional TF.IDF formula, and verifies its feasibility and effectiveness with experiments.
Authors: ZHANG Qing (张青), XIONG Qianxing (熊前兴) (Department of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2011, No. 1, pp. 204-206 (3 pages)
Keywords: text representation; vector space model; weight of words; information gain
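
The abstract describes combining TF.IDF with information gain but does not give the TF.IDF.IG formula itself. The following is a minimal sketch under a stated assumption: that a term's weight is the product of its term frequency, its inverse document frequency, and its information gain with respect to the class labels. This is one plausible reading, not necessarily the authors' exact formula, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Information gain of the binary feature 'term present / absent'."""
    classes = sorted(set(labels))

    def class_counts(subset):
        counter = Counter(subset)
        return [counter[c] for c in classes]

    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(docs)
    conditional = (len(present) / n) * entropy(class_counts(present)) \
        + (len(absent) / n) * entropy(class_counts(absent))
    return entropy(class_counts(labels)) - conditional


def tf_idf_ig(docs, labels):
    """ASSUMED weighting: tf * idf * IG (not confirmed by the abstract)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    ig = {t: information_gain(t, docs, labels) for t in df}
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({
            t: (tf[t] / len(d)) * math.log(n / df[t]) * ig[t] for t in tf
        })
    return weights


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["ball", "team", "win"],
    ["ball", "score"],
    ["stock", "market", "win"],
    ["stock", "price"],
]
labels = ["sports", "sports", "finance", "finance"]
weights = tf_idf_ig(docs, labels)
```

In this toy corpus, "win" appears equally often in both classes, so its information gain is zero and its weight vanishes even though plain TF.IDF would give it a positive score. "ball" occurs only in the sports class and ends up weighted above the rarer "team", which illustrates the category-distribution factor the abstract says plain TF.IDF cannot capture.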

