A Computational Model of Concept Generalization in Cross-Modal Reference

Times cited: 1

Abstract: Cross-modal interactions between visual understanding and linguistic processing contribute substantially to the remarkable robustness of human language processing. We argue that the formation of cross-modal referential links is a prerequisite for cross-modal interactions between vision and language to occur. In this paper, we examine a computational model of cross-modal reference formation with respect to its robustness against conceptual underspecification in the visual modality. This investigation is motivated by the fact that natural systems are well able to establish cross-modal reference between modalities with different degrees of conceptual specification. In the investigated model, conceptually underspecified context information continues to drive the syntactic disambiguation of verb-centered syntactic ambiguities as long as the visual context contains the situation arity information of the visual scene.

Authors: Patrick McCrae, Wolfgang Menzel, Maosong Sun

Affiliations: CINACS Graduate Research Group; Natural Language Systems Group, Department of Computer Science

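Source: Tsinghua Science and Technology (English edition of the Journal of Tsinghua University, Natural Science Edition), 2011, No. 2, pp. 113-120 (8 pages). Indexed in SCIE, EI, and CAS.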

Funding: Supported by the German Research Foundation (No. GRK 1247/1)

Keywords: vision-language interaction; cross-modal reference; syntactic disambiguation

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]

References (19)

[1] Cooper R M. The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 1974, 6: 84-107.
[2] Tanenhaus M K, Spivey-Knowlton M J, Eberhard K M, et al. Integration of visual and linguistic information in spoken language comprehension. Science, 1995, 268: 1632-1634.
[3] Knoeferle P S. The role of visual scenes in spoken language comprehension: Evidence from eye-tracking [Dissertation]. Universität des Saarlandes, Saarbrücken, Germany, 2005.
[4] Barwise J, Perry J. Situations and Attitudes. Cambridge, MA: MIT Press, 1983.
[5] Jackendoff R. Semantics and Cognition. Cambridge, MA: MIT Press, 1983.
[6] Jackendoff R. The architecture of the linguistic-spatial interface. In: Bloom P, Peterson M A, Nadel L, et al., eds. Language and Space, Chapter 1. Cambridge, MA: MIT Press, 1996: 1-30.
[7] Brown M K, Buntschuh B M, Wilpon J G. Sam: A perceptive spoken language understanding robot. IEEE Transactions on Systems, Man, and Cybernetics, 1992, 22: 1390-1402.
[8] Srihari R K, Burhans D T. Visual semantics: Extracting visual information from text accompanying pictures. In: Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94). Seattle, USA, 1994: 793-798.
[9] Socher G. Qualitative scene descriptions from images for integrated speech and image understanding [Dissertation]. Technical Faculty, University of Bielefeld, Germany, 1997.
[10] Socher G, Sagerer G, Kummert F, et al. Talking about 3D scenes: Integration of image and speech understanding in a hybrid distributed system. In: Proceedings of the International Conference on Image Processing (ICIP-96). Lausanne, Switzerland, 1996: 18A2.