Abstract
Why do people know as much as they do on the basis of the scant information they receive? This Plato's problem has received a variety of answers. Latent Semantic Analysis (LSA) uses the linear-algebra technique of singular value decomposition to show that reduction of dimensionality helps reveal latent semantic relations. Two examples illustrate this. The first analyzes the titles of nine articles covering two topics, human-computer interaction and mathematical graph theory: two words that were originally unrelated become highly correlated (.90) after processing. The second analyzes the relationships among errors made by Chinese learners of English: after dimension reduction, the developmental trends in spelling errors, misuse of words, and syntactic construction among learners at five proficiency levels are better explained. LSA has a wide range of applications in text processing.
The “Plato's problem” -- how do people know as much as they do with as little information as they get? -- also known as “the poverty of the stimulus”, “negative evidence”, or “the logical problem of language acquisition”, has aroused the interest of many philosophers, psychologists, linguists, and computational scientists. Nativism is the answer provided by Chomsky, but psychologists like MacWhinney and computational linguists like Sampson offer different explanations. Quine calls the problem “the scandal of induction”, whereas Shepard maintains that a general theory of generalization and similarity is as necessary to psychology as Newton's laws are to physics. However, accepting the hereditary nature of the language propensity does not solve the general problem of generalization and similarity -- the problem of categorization. Many models have been suggested to find a mechanism by which a set of stimuli, words, or concepts come to be treated as similar. They attempt to postulate constraints that can narrow the solution space of the problem that is to be solved by induction. Latent semantic analysis (LSA), put forth by Landauer et al., is “a high-dimensional linear associative model that embodies no human knowledge beyond its general learning mechanism, to analyze a large corpus of natural text and generate a representation that captures the similarity of words and text passages.” The model employs a statistical technique of linear algebra known as singular value decomposition (SVD). The input to LSA is a matrix A whose rows represent unitary event types and whose columns represent the contexts in which instances of those event types appear. SVD decomposes this matrix into the product of three matrices, A = U w V^T, and reduction of dimensionality is carried out when the original matrix is reconstructed from only the largest singular values. To illustrate the power of reduction of dimensionality, two examples are given.
In the example given by Landauer, the text input is the titles of nine technical articles, five about human-computer interaction and four about mathematical graph theory. LSA shows how, in the two-dimensionally reconstructed matrix, two words that were totally uncorrelated in the original are quite strongly correlated (r = .90) in the reconstructed approximation. The other example is the use of SVD in a preliminary study of the relationships among errors made by Chinese learners of English. Reduction of dimensionality offers a better explanation of the developmental trends in spelling errors, misuse of words, and syntactic construction among five different types of learners. LSA has a wide range of applications in text processing.
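The dimension-reduction step described above can be sketched in a few lines of code. This is a minimal illustration, not Landauer's actual data: the toy term-by-context count matrix, the word labels, and the choice of k = 2 are assumptions made for the example.

```python
import numpy as np

# Toy term-by-context count matrix A: rows are words (event types),
# columns are contexts in which the words appear. Illustrative only.
A = np.array([
    [1, 1, 0, 0, 0],   # "human"
    [1, 0, 1, 0, 0],   # "computer"
    [0, 1, 1, 0, 0],   # "interface"
    [0, 0, 0, 1, 1],   # "graph"
    [0, 0, 1, 1, 0],   # "tree"
], dtype=float)

# Singular value decomposition: A = U @ diag(w) @ Vt.
U, w, Vt = np.linalg.svd(A, full_matrices=False)

# Reduction of dimensionality: keep only the k largest singular values
# and reconstruct a rank-k approximation of the original matrix.
k = 2
A_k = U[:, :k] @ np.diag(w[:k]) @ Vt[:k, :]

# Word-word similarity before and after reduction: correlate the row
# vectors of two words across contexts.
r_before = np.corrcoef(A[0], A[2])[0, 1]
r_after = np.corrcoef(A_k[0], A_k[2])[0, 1]
print(r_before, r_after)
```

In the reconstructed matrix, each word's row vector is smoothed by the latent dimensions it shares with other words, which is how two words that never co-occur can nevertheless end up correlated.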
Source
《现代外语》
CSSCI
Peking University Core Journals
2003, No. 1, pp. 76-84 (9 pages)
Modern Foreign Languages
Keywords
Plato's problem
similarity
induction
latent semantic analysis
singular value decomposition
Plato's problem, similarity, induction, latent semantic analysis, singular value decomposition