Abstract
To detect and describe objects in indoor scenes accurately, this paper designs an indoor scene description model based on binocular stereo vision. The system first captures a left and right image pair with disparity using a binocular camera module, then applies the SGBM (semi-global block matching) algorithm to obtain a disparity map of matched points that carries depth information. A graph-based dense visual captioning model then generates a description of the scene, and an open-source speech module reads the description aloud. Tests on the ScanRefer dataset show that the model performs well at describing 3D objects, reaching 40.69% on the CIDEr@0.5IoU metric.
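The abstract does not give the SGBM parameters or the camera calibration used by the system. As a minimal sketch, assuming a rectified grayscale stereo pair and OpenCV's `StereoSGBM` implementation, the disparity step and the standard pinhole conversion from disparity to depth ($Z = fB/d$) could look as follows. The function names `compute_disparity` and `disparity_to_depth` and every parameter value (`num_disparities`, `block_size`, `uniquenessRatio`, the speckle settings) are hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def compute_disparity(left_gray: np.ndarray, right_gray: np.ndarray,
                      num_disparities: int = 64, block_size: int = 5) -> np.ndarray:
    """Disparity map (pixels) from a rectified grayscale pair using OpenCV's StereoSGBM.

    Parameter values are illustrative assumptions, not the paper's configuration.
    """
    import cv2  # imported lazily so disparity_to_depth() works without OpenCV

    if num_disparities % 16 != 0:
        raise ValueError("num_disparities must be divisible by 16")
    channels = 1  # grayscale input
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disparities,
        blockSize=block_size,
        # Smoothness penalties following the rule of thumb in the OpenCV docs.
        P1=8 * channels * block_size ** 2,
        P2=32 * channels * block_size ** 2,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # compute() returns int16 fixed-point disparities scaled by 16.
    raw = matcher.compute(left_gray, right_gray)
    return raw.astype(np.float32) / 16.0


def disparity_to_depth(disparity: np.ndarray, focal_px: float,
                       baseline_m: float) -> np.ndarray:
    """Depth in meters via Z = f * B / d; pixels without a valid match become NaN.

    focal_px is the focal length in pixels, baseline_m the camera baseline in meters.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0  # SGBM marks unmatched pixels with non-positive values
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

For example, with an assumed focal length of 700 px and a 6 cm baseline, a disparity of 10 px maps to a depth of about 4.2 m, and non-positive disparities become NaN. Depth is inversely proportional to disparity, so distant objects, which have small disparities, carry the largest depth uncertainty.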
Authors
HUANG Qihang
CHENG Haoyang
WANG Ran
School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
Source
Journal of Fujian Computer, 2024, No. 7, pp. 23-28 (6 pages)
Funding
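Supported by the Zhejiang Provincial College Students' Science and Technology Innovation Activity Program (Xinmiao Talent Program) (No. GK230701205028).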
Keywords
Indoor Scenes
Binocular Stereo Vision
Description Model