Funding: Supported by the National Key R&D Program of China (No. 2018YFB1403202) and the National Natural Science Foundation of China (62172366).
Abstract: Background: Determining how visually appealing an image is remains a complicated and subjective task. This motivates the use of a machine-learning model that evaluates image aesthetics automatically by matching the aesthetic judgments of the general public. Although deep learning methods have successfully learned good visual features from images, correctly assessing the aesthetic quality of an image remains a challenge for deep learning. Methods: To address this, we propose a novel multiview convolutional neural network for image aesthetics assessment through color composition and space formation (IAACS). Specifically, from different views of an image, including its key color components and their contributions, the image space formation, and the image itself, our network extracts the corresponding features through our proposed feature extraction module (FET) and the ImageNet weight-based classification model. Results: By fusing the extracted features, our network produces an accurate prediction score distribution for image aesthetics. The experimental results show that the network achieves superior performance.
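As a rough illustration of the fusion step described in this abstract, the sketch below concatenates per-view feature vectors, maps them to logits with a linear layer, and turns the logits into a score distribution with a softmax. The concatenation fusion, the ten score bins, the placeholder parameters, and every name (fuse_views, score_distribution, mean_score) are assumptions made for illustration; the abstract does not specify the fusion operator or the output layout, and this is not the authors' implementation.

```python
import math


def fuse_views(*view_features):
    """Concatenate per-view feature vectors (assumed fusion operator)."""
    fused = []
    for features in view_features:
        fused.extend(features)
    return fused


def score_distribution(fused, weights, bias):
    """Linear layer followed by a numerically stable softmax over score bins."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_score(distribution, first_bin=1):
    """Expected score when bin i stands for the score first_bin + i."""
    return sum((first_bin + i) * p for i, p in enumerate(distribution))


# Toy inputs: hypothetical color, space-formation, and whole-image features.
color, space, image = [0.2, 0.1], [0.5], [0.3, 0.4]
fused = fuse_views(color, space, image)

# Untrained placeholder parameters for 10 assumed score bins.
num_bins = 10
weights = [[0.0] * len(fused) for _ in range(num_bins)]
bias = [0.0] * num_bins

dist = score_distribution(fused, weights, bias)
print(round(mean_score(dist), 2))
```

With the all-zero placeholder parameters the predicted distribution is uniform, so the mean score is simply the midpoint of the assumed bins; a trained network would replace these parameters with learned ones.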
Abstract: With the explosive growth of false information on social media platforms, the automatic detection of multimodal false information has received increasing attention. Recent research has contributed significantly to multimodal information exchange and fusion, with many methods attempting to integrate unimodal features to generate multimodal news representations. However, these methods have yet to fully explore the hierarchical and complex semantic correlations between the contents of different modalities, which severely limits their performance in detecting multimodal false information. This work proposes a two-stage framework for multimodal false information detection, called ASMFD, which segments on the basis of image aesthetic similarity and explores the consistency and inconsistency features of images and texts. Specifically, we first use the Contrastive Language-Image Pre-training (CLIP) model to learn the relationship between text and images through label awareness, and we train an image aesthetic attribute scorer using an aesthetic attribute dataset. Then, we calculate the aesthetic similarity between the image and related images and use this similarity as a threshold to divide the multimodal correlation matrix into consistency and inconsistency matrices. Finally, the fusion module is designed to identify essential features for detecting multimodal false information. In extensive experiments on four datasets, ASMFD outperforms state-of-the-art baseline methods.
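The thresholding step in this abstract can be pictured with the minimal sketch below. It assumes that aesthetic similarity is the cosine similarity between two aesthetic attribute score vectors, and that entries of the correlation matrix at or above the threshold count as consistent while the rest count as inconsistent. Both choices, along with the function names cosine_similarity and split_correlation and the toy numbers, are illustrative assumptions; the abstract does not state how ASMFD defines the similarity or the split.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two score vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_correlation(corr, threshold):
    """Split a correlation matrix into two masked copies.

    Entries >= threshold are kept in the consistency matrix and zeroed in the
    inconsistency matrix; entries < threshold are treated the other way round.
    """
    consistency = [[v if v >= threshold else 0.0 for v in row] for row in corr]
    inconsistency = [[v if v < threshold else 0.0 for v in row] for row in corr]
    return consistency, inconsistency


# Hypothetical aesthetic attribute scores for a post image and a related image.
tau = cosine_similarity([0.8, 0.6], [0.6, 0.8])

# Hypothetical text-by-image correlation matrix.
corr = [[0.99, 0.20],
        [0.40, 0.97]]
consistency, inconsistency = split_correlation(corr, tau)
print(consistency)
print(inconsistency)
```

Because each entry lands in exactly one of the two matrices, their element-wise sum reproduces the original correlation matrix, which keeps the split lossless for whatever fusion module follows.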
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61832016 and 61672520, and the CASIA-Tencent Youtu joint research project.
Abstract: Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. We therefore present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We also provide a non-stationary regularization method to combat over-fitting and improve the ability of tuned models to generalize. Quantitative results from extensive experiments, including a generalization ability test, verify that neural networks trained on the GPD achieve performance comparable to that of human experts on the task of aesthetic assessment. We report several findings that support further research and applications related to visual aesthetic analysis of food images. To encourage further research, we have made the GPD publicly available at https://github.com/Openning07/GPA.
Funding: Supported partially by the National Key Research and Development Program of China (2018YFB1004903), the National Natural Science Foundation of China (61802453, U1911401, U1811461), the Fundamental Research Funds for the Central Universities (19lgpy216), and the Research Projects of Zhejiang Lab (2019KD0AB03).
Abstract: Recent image aesthetic assessment methods have achieved remarkable progress due to the emergence of deep convolutional neural networks (CNNs). However, these methods focus primarily on predicting the generally perceived preference for an image, which limits their practicality, since each user may have completely different preferences for the same image. To address this problem, this paper presents a novel approach for predicting personalized image aesthetics that fit an individual user's taste. We achieve this in a coarse-to-fine manner, by joint regression and learning from pairwise rankings. Specifically, we first collect a small subset of personal images from a user and invite the user to rank their preference for some randomly sampled image pairs. We then search for the K-nearest neighbors of the personal images within a large-scale dataset labeled with average human aesthetic scores, and use these images and their associated scores to train a generic aesthetic assessment model by CNN-based regression. Next, we fine-tune the generic model to accommodate the personal preference by training over the rankings with a pairwise hinge loss. Experiments demonstrate that our method can effectively learn personalized image aesthetic preferences, clearly outperforming state-of-the-art methods. Moreover, we show that the learned personalized image aesthetics benefit a wide variety of applications.
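Two steps named in this abstract, the K-nearest-neighbor search and the pairwise hinge loss, can be sketched as follows. The Euclidean distance over feature vectors, the margin of 1.0, the toy scores, and the function names (k_nearest, pairwise_hinge_loss, mean_ranking_loss) are assumptions for illustration only; the abstract does not give the feature space, the margin, or the exact loss formulation used by the authors.

```python
import math


def k_nearest(query, candidates, k):
    """Indices of the k candidates closest to query (Euclidean distance assumed)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(query, candidates[i]))
    return order[:k]


def pairwise_hinge_loss(score_preferred, score_other, margin=1.0):
    """Zero once the preferred image outscores the other by at least margin."""
    return max(0.0, margin - (score_preferred - score_other))


def mean_ranking_loss(scores, ranked_pairs, margin=1.0):
    """Average hinge loss over (preferred, other) pairs ranked by the user."""
    losses = [pairwise_hinge_loss(scores[a], scores[b], margin)
              for a, b in ranked_pairs]
    return sum(losses) / len(losses)


# Hypothetical features: one personal image and three dataset images.
personal = [0.0, 0.0]
dataset = [[1.0, 1.0], [0.0, 0.1], [5.0, 5.0]]
print(k_nearest(personal, dataset, k=2))

# Hypothetical model scores and user rankings (first item is preferred).
scores = {"a": 2.0, "b": 1.0, "c": 1.5}
pairs = [("a", "b"), ("c", "a")]
print(mean_ranking_loss(scores, pairs))
```

The first pair is already ordered correctly with enough margin and contributes no loss; the second pair contradicts the model's scores, so fine-tuning on it would push the score of "c" above that of "a".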