视觉知识:跨媒体智能进化的新支点 (cited by: 4)

The review of visual knowledge: a new pivot for cross-media intelligence evolution


Abstract (translated from the Chinese): This paper reviews the development of cross-media intelligence, analyzes its new trends and practical bottlenecks, and looks ahead to its future prospects. Cross-media intelligence aims to fuse multi-source, multi-modal data and to exploit the relationships among data from different media for high-level semantic understanding and logical reasoning. Existing cross-media algorithms mainly follow a paradigm that moves from single-media representation to multimedia fusion, in which feature learning and logical reasoning are relatively disconnected. Such algorithms cannot combine multi-source, multi-level semantic information into unified features, which prevents reasoning and learning from reinforcing and correcting each other. This paradigm lacks explicit knowledge accumulation and multi-level structural understanding, and it also limits model credibility and robustness. Against this background, the paper turns to a new form of intelligent representation: visual knowledge. Cross-media intelligence driven by visual knowledge features multi-level modeling and knowledge reasoning, and lends itself to visual operation and reconstruction. The paper introduces the three basic elements of visual knowledge, namely visual concepts, visual relationships, and visual reasoning, and discusses and analyzes each in detail. Visual knowledge helps realize a unified framework driven by both data and knowledge, learn structured representations that are attributable and traceable, and advance cross-media knowledge association and intelligent reasoning. With its strong capacity for abstract knowledge representation and for complementing multiple forms of knowledge, visual knowledge provides a powerful new pivot for the evolution of cross-media intelligence.

Extended abstract (English): We review the recent development of cross-media intelligence, analyze its new trends and challenges, and discuss its future prospects. Cross-media intelligence focuses on integrating multi-source and multi-modal data. It attempts to use the relationships between data from different media for high-level semantic understanding and logical reasoning. Existing cross-media algorithms mainly follow a paradigm that moves from "single-media representation" to "multimedia integration", in which the two processes of feature learning and logical reasoning are relatively disconnected. Such algorithms are unlikely to synthesize multi-source and multi-level semantic information into unified features, which hinders reasoning and learning from benefiting each other. This paradigm lacks explicit knowledge accumulation and multi-level structural understanding. At the same time, it restricts the interpretability and robustness of the model. We therefore turn to a new form of representation, visual knowledge. Cross-media intelligence driven by visual knowledge features multi-level modeling and knowledge reasoning. Its built-in mechanisms support visual operations and reconstruction, and thereby learn knowledge alignment and association. To establish a unified approach to knowledge representation learning, we present the theory of visual knowledge in three parts.

1) We introduce the three key elements of visual knowledge: visual concepts, visual relationships, and visual reasoning. Visual knowledge is capable of abstract knowledge representation and of complementing multiple forms of knowledge. Visual relationships describe how visual concepts relate to one another and provide an effective basis for more complex cross-media visual reasoning. We illustrate spatio-temporal and causal visual relationships, but visual relationships are not limited to these categories. We recommend extending pairwise visual relationships to cascaded multi-object relationships and to integrated spatio-temporal and causal representations. Visual knowledge is derived from visual concepts and visual relationships, enabling more interpretable and more generalizable high-level cross-media visual reasoning. Visual knowledge develops a structured knowledge representation, supplies a multi-level basis for visual reasoning, and offers an effective way to account for neural network decisions. Broadly, visual reasoning here covers a variety of visual operations, such as prediction, reconstruction, association, and decomposition.

2) We discuss the applications of visual knowledge and analyze their future challenges in detail. We select three applications: structured representation of visual knowledge, operation and reasoning over visual knowledge, and cross-media reconstruction and generation. Visual knowledge is expected to resolve ambiguity in relational descriptions and to suppress data bias effectively. These three applications are only examples of how visual knowledge can serve cross-media intelligence. Although hand-crafted features are less capable of abstracting multimedia data than deep learning features, these descriptors tend to be more interpretable. Effectively integrating hand-crafted features and deep learning features for cross-media representation modeling is a typical application of visual knowledge representation in cross-media intelligence. The structured representation of visual knowledge helps improve model interpretability.

3) We analyze the advantages of visual knowledge. It helps achieve a unified framework driven by both data and knowledge, learn explainable structured representations, and promote cross-media knowledge association and intelligent reasoning. As cross-media intelligence based on visual knowledge develops, more cross-media intelligence applications are expected to emerge. Decision-making assistance becomes more credible through the structured, multi-granularity representation of visual knowledge and the integrated optimization of multi-source and cross-domain data. The reasoning process can be reviewed and clarified, and the generalization ability of the model can be improved systematically. These factors provide a powerful new pivot for the evolution of cross-media intelligence. Visual knowledge can greatly improve generative models and broaden the application of simulation technology. In the future, visual knowledge can serve as a prior to improve scene rendering and to realize interactive visual editing tools and controllable semantic understanding of scene objects. A graphics system driven by data and visual knowledge would focus on combining the strengths of data and rules, extracting semantic features from visual data, optimizing model complexity, improving simulation, and producing realistic and sustainable content from new perspectives and in new scenarios.

Authors: Yang Yi (杨易); Zhuang Yueting (庄越挺); Pan Yunhe (潘云鹤) (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Zhejiang Laboratory, Hangzhou 310027, China)

Affiliations: College of Computer Science and Technology, Zhejiang University; Zhejiang Laboratory

Source: Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 9, pp. 2574-2588 (15 pages)

Funding: National Key Research and Development Program of China (2020AAA0108800); Fundamental Research Funds for the Central Universities (226-2022-00051).

Keywords: cross-media intelligence; visual knowledge; visual concepts; visual relationships; visual reasoning

Classification: TP391.7 [Automation and Computer Technology: Computer Application Technology]

