Application of latent semantic indexing to the research of text classification (cited by: 3)
Abstract: Traditional text classification relies on the surface features of documents. The most common approach is based on the vector space model (VSM), which represents each document as a vector and determines its category by comparing similarity scores. To overcome the term-independence assumption of the VSM, this paper proposes a text classification model based on latent semantic indexing (LSI). The model extracts the contextual-usage meaning of words through statistical analysis of a large text corpus, and uses singular value decomposition (SVD) to reduce the dimensionality of the vector space and to remove the influence of synonymy and polysemy, thereby improving the accuracy of text classification.
Source: Computer and Information Technology (《电脑与信息技术》), 2006, Issue 5, pp. 32-34, 38 (4 pages)
Keywords: latent semantic indexing (LSI); text classification; singular value decomposition (SVD)
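
The pipeline summarized in the abstract (term-document vectors, truncated SVD, similarity comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the labels, the choice of k = 2, the nearest-neighbour decision rule, and all function names below are assumptions made for illustration, and raw term counts stand in for whatever term weighting the paper uses.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: one row per term, one column per document."""
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in row:  # out-of-vocabulary tokens are ignored
                A[row[tok], j] += 1.0
    return A

def lsi_fit(A, k):
    """Truncated SVD A ~ U_k diag(s_k) V_k^T; rows of V_k are document coordinates."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(q, U_k, s_k):
    """Project a new term vector into the latent space: q^T U_k diag(s_k)^-1."""
    return (q @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus (assumption, not data from the paper).
train_docs = [
    "car engine road",
    "automobile engine road",
    "car automobile wheel",
    "apple banana fruit",
    "banana fruit juice",
    "apple juice fruit",
]
train_labels = ["vehicle", "vehicle", "vehicle", "food", "food", "food"]

vocab = sorted({tok for d in train_docs for tok in d.split()})
A = term_doc_matrix(train_docs, vocab)
U_k, s_k, doc_vecs = lsi_fit(A, k=2)

def classify(text):
    """Label a document with the class of its most similar training document."""
    q = term_doc_matrix([text], vocab)[:, 0]
    q_hat = fold_in(q, U_k, s_k)
    sims = [cosine(q_hat, d) for d in doc_vecs]
    return train_labels[int(np.argmax(sims))]

predicted = classify("automobile wheel")
print(predicted)
```

In this toy setup the query "automobile wheel" shares no term with the first training document "car engine road", so their cosine similarity in the original term space is zero. After projection into the two-dimensional latent space the two are closely aligned, because "car" and "automobile" co-occur with the same context words. That is the synonymy effect the abstract refers to, shown here only on made-up data.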