Abstract: Visual Question Answering (VQA) has attracted widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to associate relevant visual regions with the input question in order to answer it effectively. Detection-based features, extracted by an object detection network, capture the visual attention distribution over predetermined detection boxes and provide object-level insights, which helps answer questions about foreground objects. However, they cannot answer questions about background content that has no detection boxes, because they lack fine-grained details; this is where grid-based features have the advantage. In this paper, we propose a Dual-Level Feature Embedding (DLFE) network, which integrates grid-based and detection-based image features in a unified architecture to realize the complementary advantages of both. Specifically, in DLFE, a novel Dual-Level Self-Attention (DLSA) module is first proposed to mine the intrinsic properties of the two features, where Positional Relation Attention (PRA) is designed to model position information. Then, we propose a Feature Fusion Attention (FFA) to address the semantic noise caused by fusing the two features and construct an alignment graph to enhance and align the grid and detection features. Finally, we use co-attention to learn the interactive features of the image and the question and to answer questions more accurately. Compared with the baseline, our method increases accuracy from 66.01% to 70.63% on the test-std set of VQA 1.0 and from 66.24% to 70.91% on the test-std set of VQA 2.0.
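As a rough illustration of the attention idea this abstract relies on, the sketch below applies generic scaled dot-product attention from a single question vector over the concatenation of grid and detection features. It is not the DLFE, DLSA, PRA, or FFA design, whose details the abstract does not give; the function names, the two-dimensional toy vectors, and the simple concatenation are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Scaled dot-product attention of one query vector over feature vectors.

    Returns the attention weights and the weighted sum of the features.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    attended = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, attended


# Hypothetical toy inputs: two grid cells and one detected object.
grid_features = [[1.0, 0.0], [0.0, 1.0]]
detection_features = [[2.0, 0.0]]
question = [1.0, 0.0]

# Assumption: the two feature sets are simply concatenated before attention.
weights, attended = attend(question, grid_features + detection_features)
```

Here the question vector attends most strongly to the detection feature that points in the same direction, while the grid features still contribute to the weighted sum, which is the complementary behaviour the abstract is aiming for in a far more elaborate form.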
Abstract: In social networks, user attention affects the user’s decision-making, which in turn alters the performance of recommendation systems. Existing systems make recommendations mainly according to users’ preferences, with a particular focus on items. However, the significance of users’ attention and the differences in influence among users and items are often ignored. Thus, this paper proposes an attention-based multi-layer friend recommendation model to mitigate information overload in social networks. We first constructed the basic user and item matrix via convolutional neural networks (CNN). Then, we obtained user preferences from the relationships between users and items, which were later fed into our model to learn the preferences between friends. The error performance of the proposed method was compared with traditional solutions based on collaborative filtering. A comprehensive performance evaluation was also conducted using large-scale real-world datasets collected from three popular location-based social networks. The experimental results revealed that our proposal outperforms the traditional methods in terms of recommendation performance.
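For context on the collaborative-filtering baselines mentioned in this abstract, the sketch below ranks friend candidates by cosine similarity of their user-item preference vectors, in the style of user-based collaborative filtering. It is not the attention-based multi-layer model the abstract proposes; the user names, preference values, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_friend_candidates(target, preferences):
    """Order every other user by preference similarity to the target user."""
    scores = {
        user: cosine(preferences[target], vector)
        for user, vector in preferences.items()
        if user != target
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical user-item preference vectors (one entry per item).
preferences = {
    "alice": [5.0, 0.0, 3.0],
    "bob": [4.0, 0.0, 2.0],
    "carol": [0.0, 5.0, 0.0],
}

ranking = rank_friend_candidates("alice", preferences)
```

A similarity ranking like this treats every item and every user as equally influential, which is the limitation the abstract says an attention mechanism is meant to address.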