一种基于概念层次的文本特征权重计算方法 (cited by: 1)

An Approach for Text Feature Weighting Computation Based on Concept Hierarchy
Abstract: Feature weighting is key to text representation, and the quality of the weighting method directly affects the accuracy of text classification and clustering. Weighting methods based on word form and term-frequency statistics are too approximate and coarse. They fail to highlight important features with strong category-discriminating ability and have difficulty separating important features from unimportant ones. The result is a high-dimensional, sparse representation and unsatisfactory classification performance, which is the main obstacle in feature weighting. This paper proposes a feature weighting method based on a concept hierarchy: the word space is mapped to a concept space, and at the concept level two parameters, feature support and categorical intensity, are introduced to adjust the feature weights. Experiments show that the new method achieves better classification performance than the traditional approach, with clear improvements in vector-space dimensionality and computational efficiency.
Authors: 毛林 (Mao Lin), 杨学兵 (Yang Xuebing)
Source: Journal of Anhui University of Technology (Natural Science) (《安徽工业大学学报(自然科学版)》), CAS, 2008, No. 3, pp. 329-333 (5 pages)
Funding: Key Project of the Natural Science Foundation of the Anhui Provincial Department of Education (2007kJ051A)
Keywords: concept space; feature weighting; concept hierarchy; feature support; categorical intensity
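
The abstract's central step, moving from word space to concept space before weighting, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `WORD_TO_CONCEPT` dictionary, the toy documents, and the use of plain TF-IDF as the base weight. The abstract does not define feature support or categorical intensity, so those two adjustments are deliberately left out rather than guessed.

```python
import math
from collections import Counter

# Hypothetical word-to-concept dictionary (an assumption for illustration;
# the paper's actual concept hierarchy is not given in the abstract).
WORD_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "physician": "doctor",
    "doctor": "doctor",
    "surgeon": "doctor",
}


def to_concepts(tokens):
    """Map each token to its concept; unknown tokens stand for themselves."""
    return [WORD_TO_CONCEPT.get(t, t) for t in tokens]


def tfidf(docs):
    """Plain TF-IDF over tokenized, non-empty documents: tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy corpus (hypothetical data, not from the paper).
docs = [
    ["car", "truck", "physician"],
    ["automobile", "surgeon", "surgeon"],
    ["doctor", "hospital"],
]
concept_docs = [to_concepts(d) for d in docs]

word_vocab = {t for d in docs for t in d}
concept_vocab = {t for d in concept_docs for t in d}

word_weights = tfidf(docs)
concept_weights = tfidf(concept_docs)

print(len(word_vocab), "word features ->", len(concept_vocab), "concept features")
print(concept_weights[0])
```

On this toy corpus the synonyms collapse into shared concepts, so the feature space shrinks and documents that shared no surface words now overlap. It also shows why a further adjustment is plausible: the concept `doctor` occurs in every document, so plain IDF gives it a weight of zero regardless of how well it might separate categories.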