
The Theory of Latent Semantic Analysis and its Application
Cited by: 37
Abstract: Why can people acquire so much knowledge from the scant information available to them? There are various answers to this Plato's problem. Latent Semantic Analysis (LSA) uses the linear-algebraic method of singular value decomposition to show that reducing dimensionality helps reveal latent semantic relations. This paper gives two illustrative examples. The first analyzes the titles of nine articles covering two topics, human-computer interaction and mathematical graph theory: after processing, two originally unrelated words turn out to be highly correlated (.90). The second analyzes the relationships among errors made by Chinese learners of English: reducing dimensionality better explains the developmental trends in spelling errors, word-choice errors, and syntactic constructions among learners at five different levels. LSA has a wide range of applications in text processing.

The "Plato's problem" -- how do people know as much as they do with as little information as they get? -- also known as "the poverty of the stimulus", "negative evidence", or "the logical problem of language acquisition", has aroused the interest of many philosophers, psychologists, linguists, and computational scientists. Nativism is the answer provided by Chomsky, but psychologists like MacWhinney and computational linguists like Sampson offer different explanations. Quine calls the problem "the scandal of induction", whereas Shepard maintains that a general theory of generalization and similarity is as necessary to psychology as Newton's laws are to physics. However, accepting the hereditary nature of the language propensity does not solve the general problem of generalization and similarity -- the problem of categorization. Many models have been proposed to find a mechanism by which a set of stimuli, words, or concepts comes to be treated as similar. They attempt to postulate constraints that narrow the solution space of the problem to be solved by induction. Latent semantic analysis (LSA), put forth by Landauer et al., is "a high-dimensional linear associative model that embodies no human knowledge beyond its general learning mechanism, to analyze a large corpus of natural text and generate a representation that captures the similarity of words and text passages." The model employs a statistical technique of linear algebra known as singular value decomposition (SVD). The input to LSA is a matrix {A} whose rows represent unitary event types and whose columns represent the contexts in which instances of those event types appear.
SVD then decomposes the matrix into three matrices, {A} = {U}{w}{V}^T, and dimensionality is reduced in the reconstruction of the original matrix. To illustrate the power of this reduction, two examples are given. In the example given by Landauer, the input text consists of the titles of nine technical articles, five about human-computer interaction and four about mathematical graph theory. LSA shows how, in the two-dimensionally reconstructed matrix, two words that were totally uncorrelated in the original become quite strongly correlated (r = .90) in the reconstructed approximation. The other example is the use of SVD in a preliminary study of the relationships among errors made by Chinese learners of English. Reduction of dimensionality offers a better explanation of the developmental trends in spelling errors, misuse of words, and syntactic constructions among five different types of learners. LSA has a wide range of applications in text processing.
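The decomposition and truncated reconstruction described above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not the paper's actual analysis: the toy term-by-document matrix below is invented for the sketch and is not the nine-titles data Landauer used.

```python
import numpy as np

# Toy term-by-document count matrix: rows are terms, columns are contexts
# (documents). Values are invented for illustration only.
A = np.array([
    [1, 1, 0, 0, 0],   # "human"
    [1, 0, 1, 0, 0],   # "computer"
    [0, 1, 1, 0, 0],   # "interface"
    [0, 0, 0, 1, 1],   # "graph"
    [0, 0, 0, 1, 0],   # "tree"
], dtype=float)

# SVD: A = U @ diag(w) @ Vt, with singular values w in descending order.
U, w, Vt = np.linalg.svd(A, full_matrices=False)

# Reduce dimensionality: keep only the k largest singular values,
# then reconstruct a rank-k approximation of the original matrix.
k = 2
A_k = U[:, :k] @ np.diag(w[:k]) @ Vt[:k, :]

def row_corr(X, i, j):
    """Pearson correlation between two term rows of a matrix."""
    return np.corrcoef(X[i], X[j])[0, 1]

# Term-term similarity is read off the rows; after reduction, terms that
# share contexts (directly or indirectly) move closer together.
print(row_corr(A, 0, 2), row_corr(A_k, 0, 2))
```

The reconstructed matrix has rank at most k, so each term row is forced onto a low-dimensional subspace; this smoothing over shared contexts is what lets LSA correlate words that never co-occur directly.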
Author: Gui Shichun (桂诗春)
Source: Modern Foreign Languages (《现代外语》), CSSCI, Peking University Core, 2003, No. 1, pp. 76-84 (9 pages)
Keywords: Plato's problem, similarity, induction, latent semantic analysis, singular value decomposition

References (25)

  • 1 Kintsch, W., D. Steinhart, G. Stahl & LSA Research Group. 2000. Developing summarization skills through the use of LSA-based feedback [J]. Interactive Learning Environments 8(2): 87-109.
  • 2 Berry, M., S. Dumais & G. O'Brien. 1994. Using linear algebra for intelligent information retrieval [M]. Boston: Houghton Mifflin Company.
  • 3 Carroll, J., et al. 1971. Word Frequency Book [M]. Houghton Mifflin Company & American Heritage Publishing Co., Inc.
  • 4 Chomsky, N. 1965. Aspects of the Theory of Syntax [M]. Cambridge, MA: MIT Press.
  • 5 Chomsky, N. 1986. Knowledge of Language: Its Nature, Origin, and Use [M]. Westport: Greenwood Publishing Group.
  • 6 Chomsky, N. 2000. New Horizons in the Study of Language and Mind [M]. Cambridge: Cambridge University Press.
  • 7 Deerwester, S., S. Dumais, G. Furnas, T. Landauer & R. Harshman. 1990. Indexing by latent semantic analysis [J]. Journal of the American Society for Information Science 41: 391-407.
  • 8 Dumais, S., et al. 1982. Using semantic analysis to improve access to textual information [J]. Machine Studies 17: 87-107.
  • 9 Foltz, P. W., W. Kintsch & T. K. Landauer. 1993 (Jan). An analysis of textual coherence using Latent Semantic Indexing [A]. Paper presented at the meeting of the Society for Text and Discourse, Jackson, WY.
  • 10 Sampson, G. 2001. Empirical Linguistics [M]. London: Continuum.
