How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

Abstract: Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater, and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released at https://github.com/htqin/GoogleBard-VisUnderstand.

Authors: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc Van Gool

Affiliations: Computer Vision Lab (CVL); College of Engineering; Mohamed bin Zayed University of Artificial Intelligence

Source: Machine Intelligence Research (English edition), 2023, Issue 5, pp. 605-613 (9 pages). Indexed in EI and CSCD.

Keywords: Google Bard; multi-modal understanding; visual comprehension; large language models; conversational AI; chatbot

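Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]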

