Journal Articles
4 articles found
1. Leveraging Vision-Language Pre-Trained Model and Contrastive Learning for Enhanced Multimodal Sentiment Analysis
Authors: Jieyu An, Wan Mohd Nazmee Wan Zainon, Binfen Ding. Intelligent Automation & Soft Computing (SCIE), 2023, No. 8, pp. 1673-1689.
Abstract: Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modalities, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data, and it necessitates the use of complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby enabling the capture of rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance in sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
Keywords: multimodal sentiment analysis; vision-language pre-trained model; contrastive learning; sentiment classification
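To make the method concrete, here is a minimal sketch of a CLIP-style symmetric image-text contrastive objective plus Transformer fusion of the kind the abstract describes; the embedding size, temperature, and fusion depth are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a symmetric InfoNCE loss over paired image/text embeddings,
# followed by a small Transformer that fuses the two modality tokens.
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(img_emb: torch.Tensor,
                                txt_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Matched image-text pairs are pulled together; mismatched pairs
    within the batch are pushed apart (temperature is an assumption)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # diagonal = positive pairs
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Fuse the two modality tokens with a Transformer encoder before a
# downstream sentiment classifier (dimensions are assumptions).
fusion = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)

if __name__ == "__main__":
    img, txt = torch.randn(16, 512), torch.randn(16, 512)
    print("contrastive loss:", multimodal_contrastive_loss(img, txt).item())
    fused = fusion(torch.stack([img, txt], dim=1))    # (B, 2, 512) token pair
    print("fused shape:", tuple(fused.shape))
```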
2. A versatile framework for analyzing galaxy image data by incorporating Human-in-the-loop in a large vision model
Authors: Ming-Xiang Fu, Yu Song, Jia-Meng Lv, Liang Cao, Peng Jia, Nan Li, Xiang-Ru Li, Ji-Feng Liu, A-Li Luo, Bo Qiu, Shi-Yin Shen, Liang-Ping Tu, Li-Li Wang, Shou-Lin Wei, Hai-Feng Yang, Zhen-Ping Yi, Zhi-Qiang Zou. Chinese Physics C (SCIE, CAS, CSCD), 2024, No. 9, pp. 176-187.
Abstract: The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. In response, astronomers are turning to deep learning techniques, but these methods are limited by their specific training sets, leading to considerable duplicated workloads. To overcome this issue, we built a framework for the general analysis of galaxy images based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratios of galaxy images and the imbalanced distribution of galaxy categories, we designed our LVM to incorporate a Human-in-the-loop (HITL) module, which leverages human knowledge to enhance the reliability and interpretability of galaxy image processing interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability for all the above-mentioned tasks on galaxy images in the DESI Legacy Imaging Surveys. In particular, for the object detection task, trained with 1000 data points, our DST in the LVM achieved an accuracy of 96.7%, while ResNet50 plus Mask R-CNN reached 93.1%. For morphological classification, to obtain an area under the curve (AUC) of ~0.9, LVM plus DST and HITL required only 1/50 of the training data that ResNet18 required. In addition, multimodal data can be integrated, which creates possibilities for joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
Keywords: artificial intelligence; large vision model; human-in-the-loop; astronomy; galaxies
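As a rough illustration of the Human-in-the-loop idea, the sketch below routes a classifier's least-confident samples to a (simulated) human annotator and retrains on the growing labelled pool; the logistic-regression stand-in, synthetic features, and query sizes are assumptions, not the paper's LVM or DST heads.

```python
# Sketch only: uncertainty-driven human-in-the-loop labelling loop.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 64))       # stand-in for LVM image embeddings
y_true = (X_pool[:, 0] > 0).astype(int)    # hidden truth standing in for a human oracle

# Tiny seed set, mimicking the few-shot regime the abstract reports.
labelled = list(rng.choice(len(X_pool), 20, replace=False))
model = LogisticRegression(max_iter=1000)

for round_ in range(5):
    model.fit(X_pool[labelled], y_true[labelled])
    confidence = model.predict_proba(X_pool).max(axis=1)
    # Route the most uncertain, not-yet-labelled samples to the "human".
    uncertain = [i for i in np.argsort(confidence) if i not in labelled][:20]
    labelled.extend(uncertain)             # human labels folded back into training
    acc = model.score(X_pool, y_true)
    print(f"round {round_}: {len(labelled)} labels, accuracy {acc:.3f}")
```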
3. Binocular Vision Object Positioning Method for Robots Based on Coarse-fine Stereo Matching (cited 5 times)
Authors: Wei-Ping Ma, Wen-Xin Li, Peng-Xia Cao. International Journal of Automation and Computing (EI, CSCD), 2020, No. 4, pp. 562-571.
Abstract: To address the low positioning accuracy and execution efficiency of robot binocular vision, a binocular vision positioning method based on coarse-fine stereo matching is proposed to achieve object positioning. In the coarse matching, a random fern is used to identify objects in the left and right images, and the pixel coordinates of the object center points in the two images are calculated to complete center matching. In the fine matching, the right center point is taken as an estimate to set a search range in the right image, within which region matching is performed to find the best match for the left center point. Then, the similar-triangle principle of the binocular vision model is used to calculate the 3D coordinates of the center point, achieving fast and accurate object positioning. Finally, the proposed method is applied to object scene images and a robotic-arm grasping platform. The experimental results show that the average absolute positioning error and average relative positioning error of the proposed method are 8.22 mm and 1.96%, respectively, when the object's depth is within 600 mm, and the time consumption is less than 1.029 s. The method meets the needs of a robot grasping system and offers good accuracy and robustness.
Keywords: object positioning; stereo matching; random fern; normalized cross correlation; binocular vision model
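The fine-matching and triangulation steps lend themselves to a short sketch: normalized cross-correlation (NCC) scores candidate patches, and the similar-triangle relation of a rectified stereo rig recovers depth from disparity. The camera parameters below are illustrative assumptions, not the paper's calibration.

```python
# Sketch only: NCC patch scoring plus similar-triangle stereo triangulation.
import numpy as np

def ncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized image patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def triangulate(u_left, v, u_right, f=700.0, baseline=60.0, cx=320.0, cy=240.0):
    """Similar-triangle depth recovery for a rectified stereo pair.
    f: focal length in pixels; baseline in mm; (cx, cy): principal point.
    All four parameter values are assumed for illustration."""
    disparity = u_left - u_right          # horizontal pixel offset between views
    Z = f * baseline / disparity          # depth from similar triangles
    X = (u_left - cx) * Z / f
    Y = (v - cy) * Z / f
    return X, Y, Z                        # mm, in the left-camera frame

if __name__ == "__main__":
    # e.g. matched object centers at (350, 260) left and (280, 260) right
    print("object at %.1f, %.1f, %.1f mm" % triangulate(350, 260, 280))
```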
4. Topography of Visual Features in the Human Ventral Visual Pathway (cited 1 time)
Authors: Shijia Fan, Xiaosha Wang, Xiaoying Wang, Tao Wei, Yanchao Bi. Neuroscience Bulletin (SCIE, CAS, CSCD), 2021, No. 10, pp. 1454-1468.
Abstract: Visual object recognition in humans and nonhuman primates is achieved by the ventral visual pathway (ventral occipital-temporal cortex, VOTC), which shows a well-documented object-domain structure. An ongoing question is what type of information processed in the higher-order VOTC underlies such observations, with recent evidence suggesting effects of certain visual features. Combining computational vision models, an fMRI experiment using a parametric-modulation approach, and natural image statistics of common objects, we mapped the neural distribution of a comprehensive set of visual features in the VOTC, identifying voxel sensitivities to specific feature sets across geometry/shape, Fourier power, and color. The visual feature combination pattern in the VOTC is significantly explained by the features' relationships to different types of response-action computation (fight-or-flight, navigation, and manipulation), as derived from behavioral ratings and natural image statistics. These results offer a comprehensive visual feature map in the VOTC and a plausible theoretical explanation as a mapping onto different types of downstream response-action systems.
Keywords: ventral occipital-temporal cortex; computational vision model; domain organization; response mapping
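As a loose illustration of the voxel-wise logic behind a parametric-modulation analysis, the sketch below regresses simulated voxel responses on per-image visual feature values (shape, Fourier power, color) and reads off each voxel's strongest feature weight; all data and dimensions are synthetic assumptions, not the study's pipeline.

```python
# Sketch only: a toy voxel-wise encoding model over three visual features.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
n_images, n_features, n_voxels = 200, 3, 50
features = rng.normal(size=(n_images, n_features))  # columns: shape, Fourier power, color
true_w = rng.normal(size=(n_features, n_voxels))
bold = features @ true_w + 0.5 * rng.normal(size=(n_images, n_voxels))  # simulated responses

# One ridge regression per voxel; the fitted weights approximate each
# voxel's sensitivity profile across the assumed feature set.
weights = np.stack([RidgeCV().fit(features, bold[:, v]).coef_ for v in range(n_voxels)])
preferred = weights.argmax(axis=1)                   # feature each voxel weights most
print("voxels preferring shape / Fourier power / color:",
      np.bincount(preferred, minlength=3))
```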