
Dimension Reduction for Text Expression Based on Semantic Similarity

Cited by: 4
Abstract: Dimension reduction is an indispensable step in text representation. An effective dimension reduction method not only reduces the amount of computation but also helps improve the accuracy of text processing. Unlike traditional methods, which reduce dimensionality using statistical information, this paper proposes a dimension reduction method for text representation based on the semantic similarity of words. The method draws on natural language processing knowledge and takes into account both the semantic information and the part-of-speech (POS) information of feature terms during dimension reduction. Experimental results show that the method effectively reduces the dimensionality of the text representation and achieves relatively high text processing accuracy in the reduced space, which suggests that semantic-similarity-based dimension reduction is well suited to text processing.
Source: Journal of Henan University of Science and Technology (Natural Science), CAS, 2008, No. 5, pp. 36-39 (4 pages)
Funding: Henan Provincial Department of Education fund project (200510464031)
Keywords: semantic similarity; HowNet; feature selection
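
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a semantic-similarity-based reduction could work: feature terms that share a part of speech and whose similarity exceeds a threshold are merged into a single dimension. The lookup table `TOY_SIM`, the function names, and the threshold of 0.8 are all hypothetical stand-ins; in particular, `toy_similarity` replaces a real HowNet-based similarity measure, and the greedy grouping is an assumption rather than the authors' procedure.

```python
from collections import Counter

# Hypothetical similarity scores standing in for a HowNet-based measure.
TOY_SIM = {
    frozenset(("computer", "laptop")): 0.9,
    frozenset(("car", "automobile")): 0.95,
    frozenset(("run", "sprint")): 0.85,
}


def toy_similarity(a, b):
    """Return a similarity in [0, 1]; unknown pairs are treated as unrelated."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def cluster_terms(terms, similarity, threshold=0.8):
    """Greedily group (word, pos) feature terms.

    A term joins the first existing group whose representative has the same
    POS tag and a similarity of at least `threshold`; otherwise it starts a
    new group. For simplicity, each word is assumed to carry one POS tag.
    Returns (word -> group index, list of group representatives).
    """
    representatives = []
    assignment = {}
    for word, pos in terms:
        for idx, (rep_word, rep_pos) in enumerate(representatives):
            if pos == rep_pos and similarity(word, rep_word) >= threshold:
                assignment[word] = idx
                break
        else:
            # No compatible group found: this term opens a new dimension.
            assignment[word] = len(representatives)
            representatives.append((word, pos))
    return assignment, representatives


def reduce_document(tokens, assignment, n_groups):
    """Map a token list to a term-frequency vector over the merged groups."""
    vector = [0] * n_groups
    for token, count in Counter(tokens).items():
        if token in assignment:  # tokens outside the feature set are ignored
            vector[assignment[token]] += count
    return vector


if __name__ == "__main__":
    terms = [("computer", "n"), ("laptop", "n"), ("car", "n"),
             ("automobile", "n"), ("run", "v"), ("sprint", "v")]
    assignment, reps = cluster_terms(terms, toy_similarity)
    doc = ["laptop", "computer", "car", "sprint", "unknown"]
    print(len(terms), "terms reduced to", len(reps), "dimensions")
    print(reduce_document(doc, assignment, len(reps)))
```

In this toy setup six feature terms collapse into three dimensions. The POS check keeps a noun and a verb apart even when a similarity measure would score them as close, which mirrors the abstract's point that both semantic and POS information are considered.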

