Feature Selection and Weighting Scheme Based on Text Set Density

Abstract  In the vector space model of information retrieval, a text is formally represented as a vector of term weights. How to make this vector represent the content of a text as accurately and efficiently as possible is therefore a fundamental problem for the model. In this paper we propose a feature selection and weighting scheme based on text set density, which measures the value of a term by its contribution to the density of the text set. With this method we can find the smallest set of feature terms that loses no useful information about the texts, and construct a more reasonable weighting scheme. The paper also applies a new criterion for judging the quality of weighting schemes, meta-scoring. Under this criterion, the approach is shown to help.
Source  Journal of Chinese Information Processing (中文信息学报), CSCD, Peking University Core Journal, 2004, No. 1, pp. 42-47 (6 pages)
Funding  Shandong Provincial Department of Education project (J00F04)
Keywords  computer application; Chinese information processing; information retrieval; text set density; weighting scheme; meta-scoring
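
The idea in the abstract of valuing a term by its contribution to the density of the text set can be illustrated with a minimal sketch. The abstract does not give the paper's definition of density, so the code below assumes mean pairwise cosine similarity over raw term-frequency vectors, in the spirit of the classic term discrimination value. All function names are hypothetical and this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vectors(docs, vocabulary):
    """Represent each tokenized document as a raw term-frequency vector."""
    counts = [Counter(doc) for doc in docs]
    return [[c[term] for term in vocabulary] for c in counts]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(vectors):
    """Assumed definition: mean pairwise cosine similarity of the text set."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def density_contribution(docs):
    """Score each term by the change in density when the term is removed.

    A positive score means the texts look more alike without the term,
    so the term helps tell them apart. A negative score means the term
    mostly makes the texts look alike.
    """
    vocabulary = sorted({term for doc in docs for term in doc})
    base = density(tf_vectors(docs, vocabulary))
    scores = {}
    for term in vocabulary:
        reduced = [t for t in vocabulary if t != term]
        scores[term] = density(tf_vectors(docs, reduced)) - base
    return scores


if __name__ == "__main__":
    toy_docs = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
    for term, score in sorted(density_contribution(toy_docs).items()):
        print(term, round(score, 4))
```

On this toy set the shared word "the" receives a negative score and the distinguishing words receive positive scores. Under these assumptions, a feature selection step could keep only the terms with positive scores, and the scores could feed into term weights.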

References (6)

  • [1] Chien Chin Chen, Meng Chang Chen, Yeali Sun. PVA: A Self-Adaptive Personal View Agent [J]. Journal of Intelligent Information Systems, 18(2/3): 173-194, 2002.
  • [2] Anandeep S. Pannu and Katia Sycara. Learning Text Filtering Preferences [J].
  • [3] C. Buckley, A. Singhal, and M. Mitra. New retrieval approaches using SMART [C]. In: D. K. Harman, editor, Proceedings of the Fourth Text Retrieval Conference (TREC-4), Gaithersburg, 1996.
  • [4] S. E. Robertson and S. Walker. Okapi/Keenbow at TREC-8 [C]. In: E. M. Voorhees and D. K. Harman, editors, Proceedings of the Eighth Text Retrieval Conference (TREC-8), Gaithersburg, 2000.
  • [5] Kjersti Aas and Line Eikvil. Text Categorization: A Survey [Z]. 1999.
  • [6] Rong Jin, Christos Faloutsos, and Alex G. Hauptmann. Meta-scoring: Automatically Evaluating Term Weighting Schemes in IR without Precision-Recall [C]. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 83-89. ACM Press, 2001.