Funding: the National Natural Science Foundation of China (Nos. 62272478, 61872384); Natural Science Foundation of Shanxi Province (No. 2023-JC-YB-584); National Natural Science Foundation of China (No. 62172436); Engineering University of PAP's Funding for Scientific Research Innovation Team; Engineering University of PAP's Funding for Key Researcher (No. KYGG202011).
Abstract: This paper proposes an artificial intelligence-based robust information hiding algorithm to address the susceptibility of confidential information to noise attacks during transmission. The algorithm aims to mitigate the impact of various noise attacks on the integrity of the secret information. The proposed method encodes secret images into stylized encrypted images and applies adversarial transfer to both the style and content features of the original and embedded data. This process enhances the concealment and imperceptibility of confidential information, thereby improving its security during transmission. Furthermore, we design a specialized attack layer to simulate real-world attacks and common noise scenarios encountered in practical environments. Through adversarial training, the algorithm becomes more resilient to attacks and more robust overall. Experimental results demonstrate that the proposed algorithm enhances the concealment and unknowability of secret information while maintaining embedding capacity, and that it preserves the quality and fidelity of the stego image. The method not only improves the security and robustness of information hiding technology but also has practical value in protecting sensitive data and keeping confidential information invisible.
Funding: supported by the Metaverse Lab Program funded by the Ministry of Science and ICT (MSIT) and the Korea Radio Promotion Association (RAPA).
Abstract: The objective of style transfer is to maintain the content of one image while transferring the style of another. However, conventional methods struggle to preserve facial features, especially in Korean portraits, where elements like the “Gat” (a traditional Korean hat) are prevalent. This paper proposes a deep learning network that performs style transfer including the “Gat” while preserving the identity of the face. Unlike traditional style transfer techniques, the proposed method aims to preserve the texture, attire, and “Gat” of the style image by employing image sharpening and facial landmarks together with a GAN. Color, texture, and intensity were extracted differently according to the characteristics of each block and layer of the pre-trained VGG-16, and only the elements needed during training were preserved using a facial landmark mask. The head area was represented using the eyebrow area in order to transfer the “Gat”. Furthermore, the identity of the face was retained, and style correlation was considered based on the Gram matrix. To evaluate performance, we introduced a metric based on PSNR and SSIM that emphasizes median values through new weightings for style transfer in Korean portraits. We also conducted a survey evaluating the content, style, and naturalness of the transferred results; the assessment indicates that our method maintains content integrity better than previous research. Our approach, enriched by landmark preservation and diverse loss functions, including those related to the “Gat”, outperformed previous work in facial identity preservation.
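The Gram-matrix style correlation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature layout (a list of channels, each a flat list of activations), the normalization by the number of spatial positions, and the function names `gram_matrix` and `style_distance` are all assumptions made for illustration.

```python
def gram_matrix(features):
    """Channel-by-channel correlation of a feature map.

    features: list of C channels, each a flat list of H*W activations
    (an assumed layout). Entry [i][j] is the dot product of channels i
    and j, divided by the number of positions (assumed normalization).
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_distance(feat_a, feat_b):
    """Mean squared difference between two Gram matrices."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


# Two channels with two spatial positions each.
example = [[1.0, 2.0], [3.0, 4.0]]
print(gram_matrix(example))
```

Because the Gram matrix discards spatial positions and keeps only channel co-activation, two images with the same textures in different places yield a small style distance.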
Funding: support from the National Natural Science Foundation of China (62062048).
Abstract: In recent years, deep generative models have been successfully applied to artistic painting style transfer (APST). The remaining difficulties may lie in the loss of spatial detail during reconstruction and the inefficient model convergence caused by the irreversible encoder-decoder methodology of existing models. To address this, this paper proposes a Flow-based architecture in which the encoder and decoder share a reversible network configuration. The proposed APST-Flow efficiently reduces model uncertainty via a compact analysis-synthesis methodology, thereby improving generalization performance and convergence stability. For the generator, a Flow-based network using Wavelet additive coupling (WAC) layers is implemented to extract multi-scale content features. In addition, a style checker enhances global style consistency by minimizing the error between the reconstructed and input images. To enhance salient generated details, an adaptive stroke edge loss is applied in both global and local model training. The experimental results show that the proposed method improves PSNR by 5% and SSIM by 6.2%, and decreases Style Error by 29.4%, relative to existing models on the ChipPhi set. These competitive results verify that APST-Flow achieves high-quality generation with less content deviation and enhanced generalization, and can therefore be applied to more APST scenes.
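The reversibility that the abstract attributes to the Flow-based design can be illustrated with a plain additive coupling layer. This is a generic sketch, not the paper's Wavelet additive coupling (WAC) layer: the wavelet transform is omitted, and `shift` is a hypothetical stand-in for a learned network.

```python
def shift(x1):
    # Hypothetical stand-in for a learned function t(x1); it does not
    # need to be invertible for the coupling layer to be invertible.
    return [0.5 * v * v + 1.0 for v in x1]


def coupling_forward(x):
    """y1 = x1, y2 = x2 + t(x1), with x split into two equal halves."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [b + t for b, t in zip(x2, shift(x1))]
    return x1 + y2


def coupling_inverse(y):
    """x1 = y1, x2 = y2 - t(y1): the inverse reuses the same t."""
    if len(y) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [b - t for b, t in zip(y2, shift(y1))]
    return y1 + x2


x = [1.0, 2.0, 3.0, 4.0]
y = coupling_forward(x)
print(y, coupling_inverse(y))
```

Since the first half passes through unchanged, the inverse can recompute the same shift and subtract it, so encoding and decoding can share one set of parameters.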
Funding: funded by the Hanoi University of Science and Technology (HUST) under grant number T2018-PC-210.
Abstract: In recent years, speech synthesis systems have allowed for the production of very high-quality voices. Research in this domain is therefore turning to the problem of integrating emotions into speech. However, constructing a separate speech synthesizer for each emotion has some limitations. First, this method often requires an emotional-speech data set with many sentences, and such data sets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning. In addition, the model for each emotion fails to take advantage of the data sets of other emotions. In this paper, we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flowtron model. We also provide a new method to build a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this data set to train the Tacotron 2 model and then used it as a pre-trained model to train the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessment results show a MOS of 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves to be more effective, with a high degree of automation and fast emotional sentence generation, while using only a small emotional-speech data set.
Funding: the National Natural Science Foundation of China (51965008); Science and Technology Projects of Guizhou [2018]2168; Excellent Young Researcher Project of Guizhou [2017]5630.
Abstract: With the advent of deep learning, self-driving schemes based on deep learning are becoming more and more popular. Robust perception-action models should learn from data covering different scenarios and real behaviors, whereas current end-to-end model learning is generally limited to training on massive data, innovating on deep network architectures, and learning in-situ models in a simulation environment. We therefore introduce a new image style transfer method into data augmentation, improving the diversity of limited data by changing the texture, contrast ratio, and color of the image and thereby extending it to scenarios the model has not observed before. Inspired by rapid style transfer and neural algorithms of artistic style, we propose an arbitrary style generation network architecture comprising a style transfer network, a style learning network, a style loss network, and a multivariate Gaussian distribution function. The style embedding vector is randomly sampled from the multivariate Gaussian distribution and linearly interpolated with the embedding vector that the style learning network predicts from the input image; the result provides a set of normalization constants for the style transfer network and ultimately yields diverse image styles. To verify the effectiveness of the method, image classification and simulation experiments were performed separately. Finally, we built a small-sized smart car experiment platform and applied data augmentation based on image style transfer to an autonomous driving experiment for the first time. The experimental results show that: (1) the proposed scheme can improve the prediction accuracy of the end-to-end model and reduce the model's error accumulation; (2) the method based on image style transfer provides a new scheme for data augmentation and also offers a solution to the high cost incurred because many deep models rely heavily on large amounts of labeled data.
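The sampling-and-interpolation step described in this abstract can be sketched as follows. The sketch rests on assumptions not stated in the abstract: a diagonal covariance (each dimension sampled independently), the direction of the interpolation weight `alpha`, and the function names `sample_style_embedding` and `interpolate_embeddings`.

```python
import random


def sample_style_embedding(mean, std, rng):
    """Draw one style embedding from a Gaussian.

    Assumes a diagonal covariance, so each dimension is sampled
    independently; a full covariance matrix would need a different sampler.
    """
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def interpolate_embeddings(sampled, predicted, alpha):
    """Linear blend: alpha=1 keeps the predicted embedding,
    alpha=0 keeps the randomly sampled one (assumed convention)."""
    return [alpha * p + (1.0 - alpha) * s for s, p in zip(sampled, predicted)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled = sample_style_embedding([0.0, 0.0], [1.0, 1.0], rng)
predicted = [1.0, 4.0]  # hypothetical output of the style learning network
print(interpolate_embeddings(sampled, predicted, 0.5))
```

Sweeping `alpha` between 0 and 1 trades fidelity to the input image's own style against the diversity introduced by the random sample.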
Abstract: In current artistic and animation creation, converting a sketch into a stylized image involves many repeated manual operations. This paper presents a solution based on a deep learning framework to realize image generation and style transfer. The method first uses a conditional generative adversarial network, optimizing the loss function of the learned mapping, to generate an actual image from the input sketch. Then, by defining and optimizing the perceptual loss function of the style transfer model, style features are extracted from the image, thereby realizing the conversion between actual images and stylized art images. Experiments show that this method can greatly reduce the work of coloring and of converting between different artistic effects, and achieves the goal of transforming simple stick figures into actual object images.
Abstract: The performance and accuracy of computer vision systems are affected by noise in different forms. Although numerous solutions and algorithms have been presented for dealing with each type of noise, a comprehensive technique that covers all the diverse noises and mitigates their damaging effects on the performance and precision of various systems is still missing. In this paper, we focus on the stability and robustness of one branch of computer vision, visual object tracking. We demonstrate that, without imposing a heavy computational load on a model or changing its algorithms, the drop in performance and accuracy that occurs when a system is exposed to an unseen noise-laden test dataset can be prevented by simply applying style transfer to the training dataset and training the model on a combination of the stylized data and the original data. To verify the proposed approach, we apply it to a generic object tracker that uses regression networks. The method's validity is confirmed by testing it on an exclusive benchmark comprising 50 image sequences, each containing 15 types of noise at five different intensity levels. The OPE curves obtained show a 40% increase in the robustness of the proposed object tracker against noise compared to the other trackers considered.
Funding: This work was funded by the China Postdoctoral Science Foundation (No. 2019M661319); Heilongjiang Postdoctoral Scientific Research Developmental Foundation (No. LBH-Q17042); Fundamental Research Funds for the Central Universities (3072020CFQ0602, 3072020CF0604, 3072020CFP0601); 2019 Industrial Internet Innovation and Development Engineering (KY1060020002, KY10600200008).
Abstract: The technology for image-to-image style transfer, a prevalent image processing task, has developed rapidly. The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network. However, existing methods typically have a large computational cost. To achieve efficient style transfer, we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations. We then use an attention mechanism to transform images with various styles. We optimize the original generative adversarial network (GAN) by using more efficient calculation methods for image-to-illustration translation. The experimental results show that the output of our proposed method agrees with human visual perception while maintaining image quality. Moreover, the proposed method overcomes the high computational cost and high computational resource consumption of style transfer. In both subjective and objective evaluation indicators, the proposed method shows superior performance over existing methods.
Funding: supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Laboratory and by the US Department of Energy (DOE), including the Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division and the Office of Nuclear Energy, Spent Fuel and Waste Disposition Campaign, both under Contract No. DEAC02-05CH11231 with Berkeley Laboratory.
Abstract: The complex geometric features of subsurface fractures at different scales make mesh generation challenging and/or expensive. In this paper, we use neural style transfer (NST), a machine learning technique, to generate meshes from rock fracture images. In this new approach, digital rock fractures at multiple scales represent the 'content', and uniformly shaped and sized triangles represent the 'style'. The 19-layer convolutional neural network (CNN) learns the content from the rock image, including lower-level features (such as edges and corners) and higher-level features (such as rock, fractures, or other mineral fillings), and learns the style from the triangular grids. By optimizing the cost function so that the result approximates both the content and the style, numerical meshes can be generated and optimized. We use NST to generate meshes for rough fractures with asperities formed in rock, a network of fractures embedded in rock, and a sand aggregate with multiple grains. Based on these examples, we show that the new NST technique can make mesh generation and optimization much more efficient by achieving a good balance between the density of the mesh and the representation of the geometric features. Finally, we discuss future applications of this approach and perspectives on applying machine learning to bridge the gaps between numerical modeling and experiments.
Abstract: Transforming real-world images into cartoon illustrations is a well-known and challenging task in computer vision. Image-to-image translation from the real-world domain to the cartoon domain poses issues such as a lack of paired training samples, poor image translation, weak feature extraction from the source domain images, and low-quality translation from traditional generator algorithms. To solve these issues, a pairing-independent model, a high-quality dataset, a Bayesian-based feature extractor, and an improved generator are needed. In this study, we propose a high-quality dataset to reduce the effect of paired training samples on the model's performance. We use a Bayesian Very Deep Convolutional Network (VGG)-based feature extractor to improve on the standard feature extractor, because Bayesian inference regularizes weights well. The generator from the Cartoon Generative Adversarial Network (GAN) is modified by introducing a depthwise convolution layer and a channel attention mechanism to improve on the original generator. We use the Fréchet inception distance (FID) score and a user preference score to evaluate the model. The FID scores obtained for the generated cartoon and real-world images are 107 and 76 for the TCC style, and 137 and 57 for the Hayao style, respectively. The user preference score is calculated to evaluate the quality of the generated images, and the proposed model achieved a higher preference score than the other models. The model produces high-quality cartoon images, demonstrating its effectiveness in transferring style between authentic images and cartoon images.
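Abstract: To address classification errors caused by subjectivity, regional differences, and other factors in manual clothing style classification, this study investigates an artificial intelligence-based method for classifying clothing style images. First, duplicate or invalid images were filtered out of the FashionStyle14 dataset to build a clothing style image dataset. Then, using transfer learning, models including EfficientNet V2, RegNet Y 16GF, and ViT Large 16 were fine-tuned to produce new models, achieving clothing style image classification based on individual deep learning models. Finally, to further improve the accuracy, reliability, and robustness of image classification, ensemble learning methods based on voting, weighted averaging, and stacking were used to combine the predictions of the individual models. The transfer learning experiments show that the deep learning model based on ViT Large 16 performed best on the test set, with an average accuracy of 77.024%; the ensemble learning experiments show that the voting-based ensemble reached an average accuracy of 78.833% on the same test set. The results offer a new approach to the clothing style classification problem.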
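Two of the ensemble strategies named in this abstract, voting and weighted averaging, can be sketched in a few lines. This is an illustration under assumptions, not the study's code: class labels are plain integers, the model weights are hypothetical, ties in the vote go to the label seen first, and stacking is left out because it requires training a meta-model.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one predicted class label per model.

    Returns the most frequent label; ties go to the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(prob_lists, weights):
    """prob_lists: one list of class probabilities per model.

    Returns (winning class index, weighted mean probabilities).
    """
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
           for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Three hypothetical models voting on one image.
print(majority_vote([2, 5, 2]))
# Two hypothetical models, the second trusted three times as much.
print(weighted_average([[0.6, 0.4], [0.2, 0.8]], [1.0, 3.0]))
```

Voting uses only each model's top label, whereas weighted averaging keeps the full probability vectors and lets a stronger model count for more.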