
Research of Automatic Text Categorization Based on Sentence Category VSM (cited by: 6)
Abstract: The vector space model (VSM) is a mature text representation model in automatic text categorization. Words or phrases are usually used as feature items, but they typically provide only limited, local semantic information. To achieve content-based text categorization, this paper uses sentence categories from HNC theory, which carry more semantic information, as feature items. It reduces the dimensionality of the sentence category vector space with techniques such as decomposing mixed sentence categories, computes feature weights with the tfc algorithm, and classifies with the KNN algorithm. The classifier's average precision and recall are acceptable, and it places no requirement on the abstraction level of the categories: categories of higher and lower abstraction can be classified together. Performance could be improved further with better machine learning algorithms and additional HNC language understanding techniques.
Authors: 张运良 (Zhang Yunliang), 张全 (Zhang Quan)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2007, No. 22, pp. 45-47 (3 pages)
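Funding: National "973" Program of China project "Research on an Interactive Engine for Natural Language Understanding" (2004CB318104); Knowledge Innovation Program of the Institute of Acoustics, Chinese Academy of Sciences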
Keywords: text classification; sentence category; vector space model (VSM); HNC theory
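
The pipeline named in the abstract (tfc weighting followed by KNN classification) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature labels `SC_A` ... `SC_E` are hypothetical placeholders standing in for HNC sentence categories, the category names and toy documents are invented for the example, and the similarity-weighted vote is one common KNN variant chosen here for illustration. The tfc weight used is the standard form, tf times log(N / df), with each document vector then scaled to unit length.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute log(N / df) for every feature seen in the training documents."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each feature once per document
    return {feature: math.log(n_docs / count) for feature, count in df.items()}


def tfc_vector(doc, idf):
    """Return the tfc-weighted, unit-length sparse vector of one document.

    Features that never occurred in training (absent from idf) are ignored.
    """
    tf = Counter(f for f in doc if f in idf)
    raw = {f: tf[f] * idf[f] for f in tf}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return {}
    return {f: w / norm for f, w in raw.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * v.get(f, 0.0) for f, w in u.items())


def knn_classify(query, train_vectors, train_labels, k=3):
    """Pick the label with the largest summed similarity among the k nearest."""
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in scored:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each document is a list of sentence-category labels.
train_docs = [
    ["SC_A", "SC_A", "SC_B"],
    ["SC_A", "SC_C"],
    ["SC_D", "SC_D", "SC_E"],
    ["SC_D", "SC_E", "SC_B"],
]
train_labels = ["sports", "sports", "finance", "finance"]

idf = fit_idf(train_docs)
train_vectors = [tfc_vector(doc, idf) for doc in train_docs]

query = tfc_vector(["SC_A", "SC_C", "SC_A"], idf)
print(knn_classify(query, train_vectors, train_labels, k=3))
```

Because every vector is normalized by the tfc step, cosine similarity reduces to a sparse dot product, which keeps the neighbor search simple. How sentence categories are extracted from text, and how mixed sentence categories are decomposed, is specific to HNC and is not covered by this sketch.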