Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities. Unlike existing approaches that often focus on static gestures and require extensive labeled data, the proposed wearable wristband with self-supervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios. It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes, resulting in high-sensitivity capacitance output. Through wireless transmission from a Wi-Fi module, the proposed algorithm learns latent features from the unlabeled signals of random wrist movements. Remarkably, only few-shot labeled data are sufficient for fine-tuning the model, enabling rapid adaptation to various tasks. The system achieves a high accuracy of 94.9% in different scenarios, including the prediction of eight-direction commands and air-writing of all numbers and letters. The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training. Its utility has been further extended to enhance human-machine interaction over digital platforms, such as game controls, calculators, and three-language login systems, offering users a natural and intuitive way of communication.
To improve the recognition accuracy of similar weather scenarios (SWSs) in the terminal area, a recognition model for SWS based on contrastive learning (SWS-CL) is proposed. Firstly, a data augmentation method is designed to improve the number and quality of weather scenario samples according to the characteristics of convective weather images. Secondly, in the pre-trained recognition model of SWS-CL, a loss function is formulated to minimize the distance between the anchor and positive samples, and maximize the distance between the anchor and the negative samples in the latent space. Finally, the pre-trained SWS-CL model is fine-tuned with labeled samples to improve the recognition accuracy of SWS. The comparative experiments on the weather images of the Guangzhou terminal area show that the proposed data augmentation method can effectively improve the quality of the weather image dataset, and the proposed SWS-CL model can achieve satisfactory recognition accuracy. It is also verified that the fine-tuned SWS-CL model has obvious advantages on datasets with sparse labels.
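The anchor/positive/negative objective described in the abstract above can be illustrated with a margin-based triplet loss. This is only a sketch under assumptions: the abstract does not give the exact form of the SWS-CL loss, so the Euclidean distance, the `margin` parameter, and the function names are all hypothetical choices made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy latent-space embeddings (hypothetical values).
anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive=near, negative=far)
loss_hard = triplet_loss(anchor, positive=far, negative=near)
print(loss_easy, loss_hard)
```

When the positive is already closer than the negative by more than the margin, the loss vanishes and the triplet contributes no gradient; swapping the roles produces a large penalty.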
Previous deep learning-based super-resolution (SR) methods rely on the assumption that the degradation process is predefined (e.g., bicubic downsampling). Thus, their performance deteriorates if the real degradation is not consistent with the assumption. To deal with real-world scenarios, existing blind SR methods estimate both the degradation and the super-resolved image with an extra loss or iterative scheme. However, degradation estimation requires more computation and limits SR performance because estimation errors accumulate. In this paper, we propose a contrastive regularization built upon contrastive learning to exploit the information of both blurry images and clear images as negative and positive samples, respectively. Contrastive regularization ensures that the restored image is pulled closer to the clear image and pushed far away from the blurry image in the representation space. Furthermore, instead of estimating the degradation, we extract global statistical prior information to capture the character of the distortion. Considering the coupling between the degradation and the low-resolution image, we embed the global prior into the distortion-specific SR network to make our method adaptive to changes in distortion. We term our distortion-specific network with contrastive regularization CRDNet. Extensive experiments on synthetic and real-world scenes demonstrate that our lightweight CRDNet surpasses state-of-the-art blind super-resolution approaches.
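The "pull toward the clear image, push away from the blurry image" idea in the abstract above can be written as a ratio of two feature-space distances. The following is a rough sketch, assuming a mean L1 distance and a simple ratio form; the abstract does not state the actual CRDNet formulation, and the feature vectors here stand in for the output of some feature extractor that is not specified in the source.

```python
def l1_distance(a, b):
    """Mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def contrastive_regularization(restored, clear, blurry, eps=1e-7):
    """Assumed ratio form: small when `restored` is near `clear`
    (the positive) and far from `blurry` (the negative)."""
    return l1_distance(restored, clear) / (l1_distance(restored, blurry) + eps)

# Hypothetical feature vectors in the representation space.
clear = [1.0, 1.0]
blurry = [0.0, 0.0]
good_restoration = [0.9, 0.9]   # close to the clear image
poor_restoration = [0.1, 0.1]   # still close to the blurry image

reg_good = contrastive_regularization(good_restoration, clear, blurry)
reg_poor = contrastive_regularization(poor_restoration, clear, blurry)
print(reg_good, reg_poor)
```

In training, a term like this would typically be added to a reconstruction loss with a weighting coefficient; that coefficient is likewise not given in the abstract.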
This paper presents an end-to-end deep learning method to solve geometry problems via feature learning and contrastive learning of multimodal data. A key challenge in solving geometry problems using deep learning is to automatically adapt to the task of understanding single-modal and multimodal problems. Existing methods focus on either single-modal or multimodal problems, and methods for one cannot be applied to the other. A general geometry problem solver should be able to process problems of various modalities at the same time. In this paper, a shared feature-learning model of multimodal data is adopted to learn the unified feature representation of text and image, which can solve the heterogeneity issue between multimodal geometry problems. A contrastive learning model of multimodal data enhances the semantic relevance between multimodal features and maps them into a unified semantic space, which can effectively adapt to both single-modal and multimodal downstream tasks. Based on the feature extraction and fusion of multimodal data, the proposed geometry problem solver uses relation extraction, theorem reasoning, and problem solving to present solutions in a readable way. Experimental results show the effectiveness of the method.
Some reconstruction-based anomaly detection models for multivariate time series have brought impressive performance advancements but suffer from weak generalization ability and a lack of anomaly identification. These limitations can result in the misjudgment of models, leading to a degradation in overall detection performance. This paper proposes a novel transformer-like anomaly detection model adopting a contrastive learning module and a memory block (CLME) to overcome the above limitations. The contrastive learning module tailored for time series data can learn the contextual relationships to generate temporal fine-grained representations. The memory block can record normal patterns of these representations through attention-based addressing and reintegration mechanisms. These two modules together effectively alleviate the problem of generalization. Furthermore, this paper introduces a fusion anomaly detection strategy that comprehensively takes into account the residual and feature spaces. Such a strategy can enlarge the discrepancies between normal and abnormal data, which is more conducive to anomaly identification. The proposed CLME model not only efficiently enhances the generalization performance but also improves the ability of anomaly detection. To validate the efficacy of the proposed approach, extensive experiments are conducted on well-established benchmark datasets, including SWaT, PSM, WADI, and MSL. The results demonstrate outstanding performance, with F1 scores of 90.58%, 94.83%, 91.58%, and 91.75%, respectively. These findings affirm the superiority of the CLME model over existing state-of-the-art anomaly detection methodologies in terms of its ability to detect anomalies within complex datasets accurately.
Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data, and necessitates the use of complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby enabling the capture of rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance in sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
Bundle recommendation aims to provide users with convenient one-stop solutions by recommending bundles of related items that cater to their diverse needs. However, previous research has neglected the interaction between bundle and item views and relied on simplistic methods for predicting user-bundle relationships. To address this limitation, we propose Hybrid Contrastive Learning for Bundle Recommendation (HCLBR). Our approach integrates unsupervised and supervised contrastive learning to enrich user and bundle representations, promoting diversity. By leveraging interconnected views of user-item and user-bundle nodes, HCLBR enhances representation learning for robust recommendations. Evaluation on four public datasets demonstrates the superior performance of HCLBR over state-of-the-art baselines. Our findings highlight the significance of leveraging contrastive learning and interconnected views in bundle recommendation, providing valuable insights for marketing strategies and recommendation system design.
Unsupervised learning methods such as graph contrastive learning have been used for dynamic graph representation learning to eliminate the dependence on labels. However, existing studies neglect positional information when learning discrete snapshots, resulting in insufficient network topology learning. At the same time, due to the lack of appropriate data augmentation methods, it is difficult to capture the evolving patterns of the network effectively. To address the above problems, a position-aware and subgraph-enhanced dynamic graph contrastive learning method is proposed for discrete-time dynamic graphs. Firstly, the global snapshot is built based on the historical snapshots to express the stable pattern of the dynamic graph, and the random walk is used to obtain the position representation by learning the positional information of the nodes. Secondly, a new data augmentation method is carried out from the perspectives of short-term changes and long-term stable structures of dynamic graphs. Specifically, subgraph sampling based on snapshots and global snapshots is used to obtain two structural augmentation views, and node structures and evolving patterns are learned by combining a graph neural network, a gated recurrent unit, and an attention mechanism. Finally, the quality of node representation is improved by combining the contrastive learning between different structural augmentation views and between the two representations of structure and position. Experimental results on four real datasets show that the proposed method performs better than existing unsupervised methods and is more competitive than the supervised learning method under a semi-supervised setting.
Open-source licenses can promote the development of machine learning by allowing others to access, modify, and redistribute the training dataset. However, not all open-source licenses may be appropriate for data sharing, as some may not provide adequate protections for sensitive or personal information such as social network data. Additionally, some data may be subject to legal or regulatory restrictions that limit its sharing, regardless of the licensing model used. Hence, obtaining large amounts of labeled data can be difficult, time-consuming, or expensive in many real-world scenarios. Few-shot graph classification, as one application of meta-learning in supervised graph learning, aims to classify unseen graph types by only using a small amount of labeled data. However, the current graph neural network methods lack full usage of graph structures on molecular graphs and social network datasets. Since structural features are known to correlate with molecular properties in chemistry, structure information tends to be ignored when sufficient property information is provided. Nevertheless, the common binary classification task of chemical compounds is unsuitable in the few-shot setting, which requires novel labels. Hence, this paper focuses on the graph classification tasks of a social network, whose complex topology has an uncertain relationship with its nodes' attributes. With two multi-class graph datasets with large node-attribute dimensions constructed to facilitate the research, we propose a novel learning framework that integrates both meta-learning and contrastive learning to enhance the utilization of graph topological information. Extensive experiments demonstrate the competitive performance of our framework relative to other state-of-the-art methods.
Recently, self-supervised learning has shown great potential in Graph Neural Networks (GNNs) through contrastive learning, which aims to learn discriminative features for each node without label information. The key to graph contrastive learning is data augmentation. The anchor node regards its augmented samples as positive samples, and the rest of the samples are regarded as negative samples, some of which may actually be positive samples. We call these mislabeled samples "false negative" samples, and they seriously affect the final learning effect. Since such semantically similar samples are ubiquitous in the graph, the problem of false negative samples is very significant. To address this issue, the paper proposes a novel model, False negative sample Detection for Graph Contrastive Learning (FD4GCL), which uses attribute- and structure-aware detection to identify false negative samples. Experimental results on seven datasets show that FD4GCL outperforms the state-of-the-art baselines and even exceeds several supervised methods.
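To make the false-negative problem in the abstract above concrete, the sketch below computes an InfoNCE-style loss for one anchor node and shows how excluding a detected false negative from the denominator changes it. The loss form, the `temperature` value, and the boolean mask are assumptions for illustration; how FD4GCL actually detects false negatives from attributes and structure is not described in the abstract, so the mask is supplied by hand here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, candidates, is_false_negative=None, temperature=0.5):
    """InfoNCE-style loss for one anchor (assumed form).

    `candidates` are the samples treated as negatives by default;
    entries flagged in `is_false_negative` are dropped from the denominator.
    """
    if is_false_negative is None:
        is_false_negative = [False] * len(candidates)
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(
        math.exp(cosine(anchor, c) / temperature)
        for c, skip in zip(candidates, is_false_negative)
        if not skip
    )
    return -math.log(pos / (pos + neg))

# Toy node embeddings (hypothetical values).
anchor = [1.0, 0.0]
augmented = [0.9, 0.1]                       # positive view of the anchor
others = [[0.95, 0.05], [-1.0, 0.0]]         # first one is semantically similar

loss_plain = info_nce(anchor, augmented, others)
loss_masked = info_nce(anchor, augmented, others, is_false_negative=[True, False])
print(loss_plain, loss_masked)
```

Without the mask, the semantically similar node is pushed away from the anchor even though it should stay close; masking it removes that conflicting term.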
Contrastive learning, a self-supervised learning method, is widely used in image representation learning. The core idea is to reduce the distance between positive sample pairs and increase the distance between negative sample pairs in the representation space. Siamese networks are the most common structure among various current contrastive learning models. However, contrastive learning using positive and negative sample pairs on large datasets is computationally expensive. In addition, there are cases where positive samples are mislabeled as negative samples. Contrastive learning without negative sample pairs can still learn good representations. In this paper, we propose a simple framework for contrastive learning of image classification (SimCLIC). SimCLIC simplifies the Siamese network and is able to learn the representation of an image without negative sample pairs and momentum encoders. It works mainly by perturbing the image representation generated by the encoder to generate different contrastive views. We apply three representation perturbation methods, namely, history representation, representation dropout, and representation noise. We conducted experiments on several benchmark datasets to compare with current popular models, using image classification accuracy as a measure, and the results show that our SimCLIC is competitive. Finally, we conducted ablation experiments to verify the effect of different hyperparameters and structures on the model effectiveness.
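Two of the representation perturbations named in the abstract above (representation dropout and representation noise) can be sketched as follows, together with a negative cosine similarity between a representation and its perturbed view. This is an illustration only: the dropout rate, noise scale, and loss form are assumptions, the history-representation perturbation is not covered, and the sketch omits whatever mechanism the actual model uses to keep representations from collapsing.

```python
import math
import random

def representation_dropout(rep, rate, rng):
    """Zero each component with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in rep]

def representation_noise(rep, std, rng):
    """Add zero-mean Gaussian noise to every component."""
    return [x + rng.gauss(0.0, std) for x in rep]

def negative_cosine(a, b):
    """Negative cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return -sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

rng = random.Random(0)  # fixed seed so the example is repeatable
rep = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, -0.4]  # hypothetical encoder output

dropped = representation_dropout(rep, rate=0.25, rng=rng)
noisy = representation_noise(rep, std=0.1, rng=rng)

# No negative pairs are involved: each loss compares a representation
# only with a perturbed view of itself.
loss_dropout = negative_cosine(rep, dropped)
loss_noise = negative_cosine(rep, noisy)
print(loss_dropout, loss_noise)
```

Because the views are produced from the representation rather than from a second augmented image, only one encoder forward pass per image is needed in this sketch.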
Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the required massive personal data for training contain private information with a significant risk of data leakage in cloud environments, leading to significant communication overheads. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global models. In addition, we propose a lightweight person ReID network, named multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
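The model-contrastive loss mentioned in the abstract above follows the MOON idea: for the same input, the representation from the current local model should agree with the representation from the global model and disagree with the one from the previous local model. The sketch below shows that loss for a single sample; the vectors, the `temperature` value, and the function names are illustrative assumptions, and FRM's warmup schedule and partial averaging are not reproduced.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def model_contrastive_loss(z_local, z_global, z_previous, temperature=0.5):
    """Model-contrastive loss for one sample.

    z_local:    representation from the local model being trained
    z_global:   representation from the global model (positive)
    z_previous: representation from the previous local model (negative)
    """
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_previous) / temperature)
    return -math.log(pos / (pos + neg))

# Hypothetical representations of the same image under three models.
z_global = [1.0, 0.0]
z_previous = [0.0, 1.0]

loss_aligned = model_contrastive_loss([0.9, 0.1], z_global, z_previous)
loss_drifted = model_contrastive_loss([0.1, 0.9], z_global, z_previous)
print(loss_aligned, loss_drifted)
```

In a federated round this term is usually added to the ordinary supervised loss on each edge server, so local updates that drift away from the global model are penalized.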
Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair, and are thus difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many recent works adopt a single-tower architecture with heavy detectors, which is inefficient during the inference stage because the costly computation needs to be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space for the text and image modalities to be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL can be readily generalized to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms state-of-the-art methods while being much more efficient.
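A cross-modal contrastive loss over global image and text features, as described in the abstract above, is commonly written as a symmetric InfoNCE objective over a batch of matched pairs. The sketch below assumes that form, cosine similarity, and a `temperature` of 0.07; none of these details are stated in the abstract. The feature vectors stand in for the outputs of the two towers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits, using a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]

def cross_modal_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric image-to-text and text-to-image loss (assumed form).

    Pair k is (image_feats[k], text_feats[k]); every other combination
    in the batch serves as a negative.
    """
    n = len(image_feats)
    sim = [[cosine(i, t) / temperature for t in text_feats] for i in image_feats]
    image_to_text = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)

# Hypothetical tower outputs for a batch of two pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.1], [0.1, 1.0]]

loss_matched = cross_modal_contrastive_loss(images, texts)
loss_shuffled = cross_modal_contrastive_loss(images, list(reversed(texts)))
print(loss_matched, loss_shuffled)
```

Because each tower encodes its input independently, image and text features can be computed once and cached, and retrieval reduces to the similarity lookups shown in `sim`.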
Good initial proposals are critical for 3D object detection applications. However, due to the significant geometry variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases. Mining feature information among these "bad" proposals may mislead the detection. Contrastive learning provides a feasible way of representing proposals, which can align complete and incomplete/noisy proposals in feature space. The aligned feature space can help build robust 3D representations even if bad proposals are given. Therefore, we devise a new contrastive learning framework for indoor 3D object detection, called EFECL, that learns robust 3D representations by contrastive learning of proposals on two different levels. Specifically, we optimize both instance-level and category-level contrasts to align features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module to extract more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves improvements of 12.3% and 7.3% on the two datasets over the benchmark alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
Smart manufacturing suffers from the heterogeneity of local data distribution across parties, mutual information silos, and a lack of privacy protection in the process of industry chain collaboration. To address these problems, we propose a federated domain adaptation algorithm based on knowledge distillation and contrastive learning. Knowledge distillation is used to extract transferable integration knowledge from the different source domains, and the quality of the extracted integration knowledge is used to assign reasonable weights to each source domain. A more rational weighted average aggregation is used in the aggregation phase of the center server to optimize the global model, while the local model of the source domain is trained with the help of contrastive learning to constrain the local model optimum towards the global model optimum, mitigating the inherent heterogeneity between local data. Our experiments are conducted on the largest domain adaptation dataset, and the results show that, compared with other traditional federated domain adaptation algorithms, the proposed algorithm trains a more accurate model, requires fewer communication rounds, makes more effective use of imbalanced data in the industrial area, and protects data privacy.
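The weighted average aggregation step in the abstract above can be sketched as follows: each source domain's local parameters are combined with a weight proportional to a quality score. How the paper measures the quality of the distilled knowledge is not given in the abstract, so the scores here are hypothetical inputs and the function name is illustrative.

```python
def weighted_aggregate(local_params, quality_scores):
    """Combine local parameter vectors into a global one.

    local_params:   one flat parameter list per source domain
    quality_scores: one positive score per source domain (hypothetical
                    stand-in for the knowledge-quality measure)
    """
    total = sum(quality_scores)
    weights = [score / total for score in quality_scores]
    dim = len(local_params[0])
    return [
        sum(w * params[j] for w, params in zip(weights, local_params))
        for j in range(dim)
    ]

# Two source domains with toy two-parameter models.
local_params = [[1.0, 2.0], [3.0, 4.0]]
quality_scores = [1.0, 3.0]   # the second domain is trusted three times as much

global_params = weighted_aggregate(local_params, quality_scores)
print(global_params)
```

With equal scores this reduces to plain averaging; unequal scores let the server lean on source domains whose extracted knowledge is judged more reliable.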
Height map estimation from a single aerial image plays a crucial role in localization, mapping, and 3D object detection. Deep convolutional neural networks have been used to predict height information from single-view remote sensing images, but these methods rely on large volumes of training data and often overlook geometric features present in orthographic images. To address these issues, this study proposes a gradient-based self-supervised learning network with momentum contrastive loss to extract geometric information from non-labeled images in the pretraining stage. Additionally, novel local implicit constraint layers are used at multiple decoding stages in the proposed supervised network to refine high-resolution features in height estimation. The structural-aware loss is also applied to improve the robustness of the network to positional shift and minor structural changes along the boundary area. Experimental evaluation on the ISPRS benchmark datasets shows that the proposed method outperforms other baseline networks, with minimum MAE and RMSE of 0.116 and 0.289 for the Vaihingen dataset and 0.077 and 0.481 for the Potsdam dataset, respectively. The proposed method also shows around threefold data efficiency improvements on the Potsdam dataset and domain generalization on the Enschede datasets. These results demonstrate the effectiveness of the proposed method in height map estimation from single-view remote sensing images.
With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. Because Chinese differs greatly from English, Chinese text tasks are more complex in semantic information representation. However, most existing Chinese text classification approaches regard feature representation and feature selection as the key points, but fail to take into account a learning strategy that adapts to the task. Besides, these approaches compress the Chinese word into a representation vector, without considering the distribution of the term among the categories of interest. To improve the effect of Chinese text classification, a unified method, called Supervised Contrastive Learning with Term Weighting (SCL-TW), is proposed in this paper. Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, we calculate the score of term weighting to optimize the process of data augmentation of Chinese text. Subsequently, the transformed features are fed into a temporal convolution network to conduct feature representation. Experimental verifications are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by a large margin.
Current studies in few-shot semantic segmentation mostly utilize meta-learning frameworks to obtain models that can be generalized to new categories. However, these models, trained on base classes with sufficient annotated samples, are biased towards these base classes, which results in semantic confusion and ambiguity between base classes and new classes. One strategy is to use an additional base learner to recognize the objects of base classes and then refine the prediction results output by the meta learner. In this approach, the interaction between the two learners and the way of combining their results are important. This paper proposes a new model, namely the Distilling Base and Meta (DBAM) network, which uses a self-attention mechanism and contrastive learning to enhance few-shot segmentation performance. First, the self-attention-based ensemble module (SEM) is proposed to produce a more accurate adjustment factor for improving the fusion of the predictions of the two learners. Second, the prototype feature optimization module (PFOM) is proposed to provide an interaction between the two learners, which enhances the ability to distinguish the base classes from the target class by introducing a contrastive learning loss. Extensive experiments have demonstrated that our method improves performance on PASCAL-5i under both 1-shot and 5-shot settings.
Motor imagery (MI) based brain-computer interfaces (BCIs) have a wide range of applications in the stroke rehabilitation field. However, due to the low signal-to-noise ratio and high cross-subject variation of the electroencephalogram (EEG) signals generated by motor imagery, the classification performance of the existing methods still needs to be improved to meet the needs of real practice. To overcome this problem, we propose a multi-scale spatial-temporal convolutional neural network called MSCNet. We introduce contrastive learning into a multi-temporal convolution scale backbone to further improve the robustness and discrimination of embedding vectors. Experimental results of binary classification show that MSCNet outperforms the state-of-the-art methods, achieving accuracy improvements of 6.04%, 3.98%, and 8.15% on the BCIC IV 2a, SMR-BCI, and OpenBMI datasets in a subject-dependent manner, respectively. The results show that the contrastive learning method can significantly improve the classification accuracy of motor imagery EEG signals, which provides an important reference for the design of motor imagery classification algorithms.
Funding: supported by the Research Grant Fund from Kwangwoon University in 2023, the National Natural Science Foundation of China under Grant (62311540155), the Taishan Scholars Project Special Funds (tsqn202312035), and the open research foundation of State Key Laboratory of Integrated Chips and Systems.
Abstract: Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities. Unlike existing approaches that often focus on static gestures and require extensive labeled data, the proposed wearable wristband with self-supervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios. It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes, resulting in high-sensitivity capacitance output. Through wireless transmission from a Wi-Fi module, the proposed algorithm learns latent features from the unlabeled signals of random wrist movements. Remarkably, only few-shot labeled data are sufficient for fine-tuning the model, enabling rapid adaptation to various tasks. The system achieves a high accuracy of 94.9% in different scenarios, including the prediction of eight-direction commands and air-writing of all numbers and letters. The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training. Its utility has been further extended to enhance human-machine interaction over digital platforms, such as game controls, calculators, and three-language login systems, offering users a natural and intuitive way of communication.
Funding: Supported by the Fundamental Research Funds for the Central Universities (Nos. NS2019054, NS2020045).
Abstract: To improve the recognition accuracy of similar weather scenarios (SWSs) in the terminal area, a recognition model for SWS based on contrastive learning (SWS-CL) is proposed. Firstly, a data augmentation method is designed to improve the number and quality of weather scenario samples according to the characteristics of convective weather images. Secondly, in the pre-trained recognition model of SWS-CL, a loss function is formulated to minimize the distance between the anchor and the positive samples, and to maximize the distance between the anchor and the negative samples in the latent space. Finally, the pre-trained SWS-CL model is fine-tuned with labeled samples to improve the recognition accuracy of SWS. Comparative experiments on weather images of the Guangzhou terminal area show that the proposed data augmentation method can effectively improve the quality of the weather image dataset, and that the proposed SWS-CL model can achieve satisfactory recognition accuracy. It is also verified that the fine-tuned SWS-CL model has obvious advantages on datasets with sparse labels.
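The abstract does not state the exact form of the SWS-CL loss. As a minimal sketch, assuming a triplet margin loss over Euclidean distances in the latent space (the function names and the margin value are illustrative and not taken from the paper), the pull/push behaviour it describes could look like this:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (an assumed stand-in for the SWS-CL loss).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# A well-separated triplet incurs no loss; a swapped one is penalized.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Minimizing this quantity over many triplets reduces the anchor-positive distance and increases the anchor-negative distance, which is the behaviour the abstract attributes to its loss function.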
Funding: Supported by the National Natural Science Foundation of China (61971165) and the Key Research and Development Program of Hubei Province (2020BAB113).
Abstract: Previous deep learning-based super-resolution (SR) methods rely on the assumption that the degradation process is predefined (e.g., bicubic downsampling). Their performance therefore deteriorates if the real degradation is not consistent with this assumption. To deal with real-world scenarios, existing blind SR methods estimate both the degradation and the super-resolved image with an extra loss or an iterative scheme. However, degradation estimation requires more computation and limits SR performance because of accumulated estimation errors. In this paper, we propose a contrastive regularization built upon contrastive learning that exploits the information of blurry images and clear images as negative and positive samples, respectively. Contrastive regularization ensures that the restored image is pulled closer to the clear image and pushed far away from the blurry image in the representation space. Furthermore, instead of estimating the degradation, we extract global statistical prior information to capture the characteristics of the distortion. Considering the coupling between the degradation and the low-resolution image, we embed the global prior into the distortion-specific SR network to make our method adaptive to changes in distortion. We term our distortion-specific network with contrastive regularization CRDNet. Extensive experiments on synthetic and real-world scenes demonstrate that our lightweight CRDNet surpasses state-of-the-art blind super-resolution approaches.
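The pull/push behaviour of the contrastive regularization can be illustrated with a small sketch. It assumes a ratio of L1 distances between feature vectors, which is one common way to write such a regularizer; the abstract does not specify the distance, the feature extractor, or the exact formula, so every name below is hypothetical:

```python
def l1_distance(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def contrastive_regularization(restored, clear, blurry, eps=1e-8):
    """Assumed ratio form: small when `restored` is near `clear`
    (positive) and far from `blurry` (negative)."""
    return l1_distance(restored, clear) / (l1_distance(restored, blurry) + eps)


# Toy features: the restored image is close to the clear one.
value = contrastive_regularization([1.0, 1.0], [1.0, 2.0], [3.0, 3.0])
print(value)  # roughly 0.25 (1 / 4)
```

In a training loop, a term of this kind would be added to the reconstruction loss with a weighting coefficient; that coefficient is likewise not given in the abstract.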
Funding: Supported by the National Natural Science Foundation of China (No. 62107014, Jian P.; No. 62177025, He B.), the Key R&D and Promotion Projects of Henan Province (No. 212102210147, Jian P.), and the Innovative Education Program for Graduate Students at North China University of Water Resources and Electric Power, China (No. YK-2021-99, Guo F.).
Abstract: This paper presents an end-to-end deep learning method to solve geometry problems via feature learning and contrastive learning of multimodal data. A key challenge in solving geometry problems using deep learning is to automatically adapt to the task of understanding single-modal and multimodal problems. Existing methods focus on either single-modal or multimodal problems, and methods designed for one do not fit the other. A general geometry problem solver should be able to process problems of various modalities at the same time. In this paper, a shared feature-learning model of multimodal data is adopted to learn a unified feature representation of text and image, which resolves the heterogeneity between multimodal geometry problems. A contrastive learning model of multimodal data enhances the semantic relevance between multimodal features and maps them into a unified semantic space, which can effectively adapt to both single-modal and multimodal downstream tasks. Based on the feature extraction and fusion of multimodal data, the proposed geometry problem solver uses relation extraction, theorem reasoning, and problem solving to present solutions in a readable way. Experimental results show the effectiveness of the method.
Funding: Supported by the Major National Science and Technology Special Projects (2016ZX02301003-004-007) and the Natural Science Foundation of Hebei Province (F2020202067).
Abstract: Some reconstruction-based anomaly detection models for multivariate time series have brought impressive performance advancements but suffer from weak generalization ability and a lack of anomaly identification ability. These limitations can cause models to misjudge samples, degrading overall detection performance. This paper proposes a novel transformer-like anomaly detection model that adopts a contrastive learning module and a memory block (CLME) to overcome these limitations. The contrastive learning module, tailored for time series data, learns contextual relationships to generate temporally fine-grained representations. The memory block records normal patterns of these representations through attention-based addressing and reintegration mechanisms. Together, these two modules effectively alleviate the generalization problem. Furthermore, this paper introduces a fusion anomaly detection strategy that takes into account both the residual and feature spaces. Such a strategy enlarges the discrepancies between normal and abnormal data, which is more conducive to anomaly identification. The proposed CLME model not only enhances generalization performance but also improves anomaly detection ability. To validate the efficacy of the proposed approach, extensive experiments are conducted on well-established benchmark datasets, including SWaT, PSM, WADI, and MSL. The results demonstrate outstanding performance, with F1 scores of 90.58%, 94.83%, 91.58%, and 91.75%, respectively. These findings affirm the superiority of the CLME model over existing state-of-the-art anomaly detection methods in accurately detecting anomalies within complex datasets.
Funding: Supported by the Science and Technology Research Project of Jiangxi Education Department (Grant No. GJJ2203306).
Abstract: Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modalities, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data and necessitates the use of complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses transfer learning by using a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby capturing rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance in sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
Abstract: Bundle recommendation aims to provide users with convenient one-stop solutions by recommending bundles of related items that cater to their diverse needs. However, previous research has neglected the interaction between bundle and item views and relied on simplistic methods for predicting user-bundle relationships. To address this limitation, we propose Hybrid Contrastive Learning for Bundle Recommendation (HCLBR). Our approach integrates unsupervised and supervised contrastive learning to enrich user and bundle representations, promoting diversity. By leveraging interconnected views of user-item and user-bundle nodes, HCLBR enhances representation learning for robust recommendations. Evaluation on four public datasets demonstrates the superior performance of HCLBR over state-of-the-art baselines. Our findings highlight the significance of leveraging contrastive learning and interconnected views in bundle recommendation, providing valuable insights for marketing strategies and recommendation system design.
Abstract: Unsupervised learning methods such as graph contrastive learning have been used for dynamic graph representation learning to eliminate the dependence on labels. However, existing studies neglect positional information when learning discrete snapshots, resulting in insufficient learning of network topology. At the same time, due to the lack of appropriate data augmentation methods, it is difficult to capture the evolving patterns of the network effectively. To address these problems, a position-aware and subgraph-enhanced dynamic graph contrastive learning method is proposed for discrete-time dynamic graphs. Firstly, a global snapshot is built from the historical snapshots to express the stable pattern of the dynamic graph, and a random walk is used to obtain the position representation by learning the positional information of the nodes. Secondly, a new data augmentation method is designed from the perspectives of short-term changes and long-term stable structures of dynamic graphs. Specifically, subgraph sampling based on snapshots and global snapshots is used to obtain two structural augmentation views, and node structures and evolving patterns are learned by combining a graph neural network, a gated recurrent unit, and an attention mechanism. Finally, the quality of the node representation is improved by combining contrastive learning between the different structural augmentation views with contrastive learning between the structure and position representations. Experimental results on four real datasets show that the proposed method performs better than existing unsupervised methods and is competitive with supervised learning methods under a semi-supervised setting.
Funding: Supported by the SW Copyright Ecosystem R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports, and Tourism in 2023 (No. RS-2023-00224818).
Abstract: Open-source licenses can promote the development of machine learning by allowing others to access, modify, and redistribute training datasets. However, not all open-source licenses are appropriate for data sharing, as some may not provide adequate protections for sensitive or personal information such as social network data. Additionally, some data may be subject to legal or regulatory restrictions that limit its sharing, regardless of the licensing model used. Hence, obtaining large amounts of labeled data can be difficult, time-consuming, or expensive in many real-world scenarios. Few-shot graph classification, as one application of meta-learning in supervised graph learning, aims to classify unseen graph types using only a small amount of labeled data. However, current graph neural network methods do not make full use of graph structures on molecular graphs and social network datasets. Since structural features are known to correlate with molecular properties in chemistry, structure information tends to be ignored when sufficient property information is provided. Moreover, the common binary classification task on chemical compounds is unsuitable for the few-shot setting, which requires novel labels. Hence, this paper focuses on graph classification tasks on social networks, whose complex topology has an uncertain relationship with the attributes of their nodes. With two multi-class graph datasets with large node-attribute dimensions constructed to facilitate the research, we propose a novel learning framework that integrates both meta-learning and contrastive learning to enhance the utilization of graph topological information. Extensive experiments demonstrate the competitive performance of our framework relative to other state-of-the-art methods.
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFB3300503), the Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (No. U22A20167), and the National Natural Science Foundation of China (No. 61872260).
Abstract: Recently, self-supervised learning has shown great potential in Graph Neural Networks (GNNs) through contrastive learning, which aims to learn discriminative features for each node without label information. The key to graph contrastive learning is data augmentation. The anchor node regards its augmented samples as positive samples, and the rest of the samples are regarded as negative samples, some of which may in fact be positive samples. We call these mislabeled samples “false negative” samples, and they seriously degrade the final learning effect. Since such semantically similar samples are ubiquitous in graphs, the problem of false negative samples is significant. To address this issue, the paper proposes a novel model, False negative sample Detection for Graph Contrastive Learning (FD4GCL), which uses attribute-aware and structure-aware detection to identify false negative samples. Experimental results on seven datasets show that FD4GCL outperforms the state-of-the-art baselines and even exceeds several supervised methods.
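To make the false-negative problem concrete, the sketch below computes an InfoNCE-style loss for one anchor and optionally drops negatives that are suspiciously similar to it. The similarity threshold is a deliberately simple stand-in chosen for illustration; FD4GCL's attribute-aware and structure-aware detection is not described in enough detail here to reproduce, and all names and values below are assumptions:

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.5, fn_threshold=None):
    """InfoNCE loss for one anchor.

    If `fn_threshold` is set, negatives whose cosine similarity to the
    anchor reaches the threshold are treated as suspected false
    negatives and excluded from the denominator.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    kept = [
        n for n in negatives
        if fn_threshold is None or cosine(anchor, n) < fn_threshold
    ]
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in kept)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
positive = [1.0, 0.1]
negatives = [[0.0, 1.0], [1.0, 0.05]]  # the second is nearly identical to the anchor

plain = info_nce(anchor, positive, negatives)
filtered = info_nce(anchor, positive, negatives, fn_threshold=0.9)
print(plain > filtered)  # True: the false negative inflated the loss
```

Without filtering, the near-duplicate negative pushes the anchor away from a sample it should stay close to, which is the degradation the abstract describes.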
Abstract: Contrastive learning, a self-supervised learning method, is widely used in image representation learning. The core idea is to reduce the distance between positive sample pairs and increase the distance between negative sample pairs in the representation space. Siamese networks are the most common structure among current contrastive learning models. However, contrastive learning using positive and negative sample pairs on large datasets is computationally expensive. In addition, there are cases where positive samples are mislabeled as negative samples. Contrastive learning without negative sample pairs can still learn good representations. In this paper, we propose a simple framework for contrastive learning of image classification (SimCLIC). SimCLIC simplifies the Siamese network and is able to learn the representation of an image without negative sample pairs or momentum encoders. It does so mainly by perturbing the image representation generated by the encoder to produce different contrastive views. We apply three representation perturbation methods, namely, history representation, representation dropout, and representation noise. We conducted experiments on several benchmark datasets to compare SimCLIC with current popular models, using image classification accuracy as the measure, and the results show that SimCLIC is competitive. Finally, we ran ablation experiments to verify the effect of different hyperparameters and structures on model effectiveness.
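Two of the three perturbations named above, representation dropout and representation noise, can be sketched on a plain list of floats. This is a minimal illustration assuming standard inverted dropout and additive Gaussian noise; the drop rate, noise scale, and function names are hypothetical, and the history-representation variant is omitted because the abstract does not describe it:

```python
import random


def representation_dropout(z, p, rng):
    """Zero each component with probability p (requires 0 <= p < 1) and
    rescale the kept ones by 1 / (1 - p), as in inverted dropout."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in z]


def representation_noise(z, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in z]


rng = random.Random(0)           # fixed seed for repeatability
z = [0.5, -1.0, 2.0, 0.0]        # toy encoder output
view_a = representation_dropout(z, p=0.25, rng=rng)
view_b = representation_noise(z, sigma=0.1, rng=rng)
print(view_a, view_b)
```

The two perturbed vectors play the role of contrastive views of the same image, so no second augmented image and no negative pairs are needed to form a training signal.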
Funding: Supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20211284 and the Financial and Science Technology Plan Project of Xinjiang Production and Construction Corps under Grant No. 2020DB005.
Abstract: Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the massive personal data required for training contain private information, carry a significant risk of data leakage in cloud environments, and lead to significant communication overheads. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, a MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global models. In addition, we propose a lightweight person ReID network, named multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. Experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
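The model-contrastive loss referred to above can be written, in the form given by the original MOON method, as a two-way softmax over cosine similarities: the representation from the current local model is pulled towards the global model's representation and pushed away from the previous local model's. The sketch below uses toy vectors; the temperature value and variable names are illustrative, and how FRM combines this term with its ReID losses is not stated in the abstract:

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """MOON-style loss for one sample.

    z_local:  representation from the local model being trained
    z_global: representation from the current global model (positive)
    z_prev:   representation from the previous local model (negative)
    """
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))


aligned = model_contrastive_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
drifted = model_contrastive_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(aligned < drifted)  # True: drifting away from the global model costs more
```

Back-propagating this loss during local training discourages each edge server from drifting away from the global model, which is the correction role the abstract assigns to the MOON warmup.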
Abstract: Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair and are thus difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many recent works adopt a single-tower architecture with heavy detectors, which is inefficient during the inference stage because the costly computation needs to be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space in which the text and image modalities can be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL can be readily generalized to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms state-of-the-art methods while being much more efficient.
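The efficiency argument for the two-tower design can be seen in a small retrieval sketch: once each tower has produced embeddings in the shared space, ranking is just a similarity computation, so no joint model has to be re-run for every text-image pair. The embeddings below are toy values and the function names are hypothetical; the actual CMCL encoders and the MGS module are not reproduced here:

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted by cosine similarity to the text query.

    In a real system the image embeddings would be normalized and cached
    once offline; they are normalized here only to keep the sketch short.
    """
    query = normalize(text_embedding)
    scores = [
        sum(a * b for a, b in zip(query, normalize(image)))
        for image in image_embeddings
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


images = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]  # toy image-tower outputs
print(rank_images([1.0, 0.0], images))  # [1, 2, 0]
```

With N images and M text queries, this layout needs N + M encoder passes plus cheap dot products, whereas a single-tower model that scores pairs jointly needs on the order of N × M passes.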
Funding: This work is supported in part by the National Key R&D Program of China (2018AAA0102200), the National Natural Science Foundation of China (62002375, 62002376, 62132021), the Natural Science Foundation of Hunan Province of China (2021RC3071, 2022RC1104, 2021JJ40696), and NUDT Research Grants (ZK22-52).
Abstract: Good initial proposals are critical for 3D object detection applications. However, due to the significant geometric variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases. Mining feature information among these “bad” proposals may mislead the detection. Contrastive learning provides a feasible way to represent proposals, as it can align complete and incomplete/noisy proposals in feature space. The aligned feature space can help build robust 3D representations even when bad proposals are given. Therefore, we devise a new contrastive learning framework for indoor 3D object detection, called EFECL, that learns robust 3D representations by contrastive learning of proposals on two different levels. Specifically, we optimize both instance-level and category-level contrasts to align features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module to extract more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves 12.3% and 7.3% improvements on the two datasets over the benchmark alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
Funding: Supported by the Scientific and Technological Innovation 2030 Major Project of "New Generation Artificial Intelligence" (2020AAA0109300).
Abstract: Smart manufacturing suffers from the heterogeneity of local data distributions across parties, mutual information silos, and a lack of privacy protection in the process of industry chain collaboration. To address these problems, we propose a federated domain adaptation algorithm based on knowledge distillation and contrastive learning. Knowledge distillation is used to extract transferable integration knowledge from the different source domains, and the quality of the extracted integration knowledge is used to assign reasonable weights to each source domain. A more rational weighted average aggregation is used in the aggregation phase of the central server to optimize the global model, while the local model of each source domain is trained with the help of contrastive learning to constrain the local model optimum towards the global model optimum, mitigating the inherent heterogeneity between local data. Our experiments are conducted on the largest domain adaptation dataset, and the results show that, compared with other traditional federated domain adaptation algorithms, the proposed algorithm trains a more accurate model, requires fewer communication rounds, makes more effective use of imbalanced data in the industrial area, and protects data privacy.
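The weighted average aggregation step can be sketched as follows, assuming each source domain uploads a flat parameter vector and the server already holds one non-negative quality score per domain. How the scores are derived from the distilled knowledge is not specified in the abstract, so they are plain inputs here and all names are hypothetical:

```python
def aggregate(source_models, quality_scores):
    """Quality-weighted average of per-domain parameter vectors.

    source_models:  list of equal-length parameter lists, one per source domain
    quality_scores: non-negative scores (not all zero), normalized here
                    so that the weights sum to 1
    """
    total = sum(quality_scores)
    weights = [score / total for score in quality_scores]
    size = len(source_models[0])
    return [
        sum(w * model[i] for w, model in zip(weights, source_models))
        for i in range(size)
    ]


# Two source domains; the second is judged three times as reliable.
print(aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0]))  # [2.5, 3.5]
```

Setting all scores equal recovers the plain unweighted average used in standard federated averaging with equally sized clients, so the weighting only changes the result when the domains are judged to differ in quality.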
Funding: Supported by the National Natural Science Foundation of China [grant numbers 42001329, 42001283], the Guangdong Basic and Applied Basic Research Foundation [grant number 2023A1515011718], the China Postdoctoral Science Foundation [grant number 2021M701268], and the Foundation of Anhui Province Key Laboratory of Physical Geographic Environment, P.R. China [grant number 2022PGE012].
Abstract: Height map estimation from a single aerial image plays a crucial role in localization, mapping, and 3D object detection. Deep convolutional neural networks have been used to predict height information from single-view remote sensing images, but these methods rely on large volumes of training data and often overlook geometric features present in orthographic images. To address these issues, this study proposes a gradient-based self-supervised learning network with momentum contrastive loss to extract geometric information from unlabeled images in the pretraining stage. Additionally, novel local implicit constraint layers are used at multiple decoding stages in the proposed supervised network to refine high-resolution features in height estimation. A structure-aware loss is also applied to improve the robustness of the network to positional shift and minor structural changes along the boundary area. Experimental evaluation on the ISPRS benchmark datasets shows that the proposed method outperforms other baseline networks, with minimum MAE and RMSE of 0.116 and 0.289 for the Vaihingen dataset and 0.077 and 0.481 for the Potsdam dataset, respectively. The proposed method also shows around threefold data efficiency improvements on the Potsdam dataset and domain generalization on the Enschede datasets. These results demonstrate the effectiveness of the proposed method in height map estimation from single-view remote sensing images.
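For reference, the two reported error metrics follow their standard definitions: MAE is the mean of the absolute errors and RMSE is the square root of the mean squared error. A minimal sketch on flattened height maps (the sample values are made up and unrelated to the reported results):

```python
import math


def mae(predicted, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def rmse(predicted, target):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    )


predicted = [1.0, 2.0, 3.0]   # toy predicted heights
target = [1.0, 2.0, 5.0]      # toy reference heights
print(mae(predicted, target), rmse(predicted, target))
```

RMSE is never smaller than MAE on the same data and grows faster with large individual errors, which is why the two are usually reported together.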
Funding: Supported by the National Natural Science Foundation of China (No. U1936122) and the Primary Research & Development Plan of Hubei Province (Nos. 2020BAB101 and 2020BAA003).
Abstract: With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. Because Chinese differs greatly from English, Chinese text tasks are more complex in terms of semantic information representation. However, most existing Chinese text classification approaches regard feature representation and feature selection as the key points, but fail to take into account a learning strategy that adapts to the task. Besides, these approaches compress each Chinese word into a representation vector without considering the distribution of the term among the categories of interest. To improve the effect of Chinese text classification, a unified method, called Supervised Contrastive Learning with Term Weighting (SCL-TW), is proposed in this paper. Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, we calculate term weighting scores to optimize the data augmentation process for Chinese text. Subsequently, the transformed features are fed into a temporal convolution network for feature representation. Experimental verifications are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by a substantial margin.
Abstract: Current studies in few-shot semantic segmentation mostly use meta-learning frameworks to obtain models that can be generalized to new categories. However, these models, trained on base classes with sufficient annotated samples, are biased towards the base classes, which results in semantic confusion and ambiguity between base classes and new classes. One strategy is to use an additional base learner to recognize the objects of base classes and then refine the prediction results output by the meta learner. In this approach, the interaction between the two learners and the way their results are combined are important. This paper proposes a new model, the Distilling Base and Meta (DBAM) network, which uses a self-attention mechanism and contrastive learning to enhance few-shot segmentation performance. First, the self-attention-based ensemble module (SEM) is proposed to produce a more accurate adjustment factor for improving the fusion of the predictions of the two learners. Second, the prototype feature optimization module (PFOM) is proposed to provide an interaction between the two learners, which enhances the ability to distinguish the base classes from the target class by introducing a contrastive learning loss. Extensive experiments demonstrate that our method improves performance on PASCAL-5i under both 1-shot and 5-shot settings.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFC1312903), the Beijing Natural Science Foundation (Grant No. Z200016), and the Fundamental Research Funds for the Central Universities (Grant Nos. KG16137101, KG16187001, and KG16123001).
Abstract: Motor imagery (MI) based brain-computer interfaces (BCIs) have a wide range of applications in stroke rehabilitation. However, due to the low signal-to-noise ratio and high cross-subject variation of the electroencephalogram (EEG) signals generated by motor imagery, the classification performance of existing methods still needs to be improved to meet the needs of real practice. To overcome this problem, we propose a multi-scale spatial-temporal convolutional neural network called MSCNet. We introduce contrastive learning into a multi-scale temporal convolution backbone to further improve the robustness and discrimination of the embedding vectors. Experimental results for binary classification show that MSCNet outperforms the state-of-the-art methods, achieving accuracy improvements of 6.04%, 3.98%, and 8.15% on the BCIC IV 2a, SMR-BCI, and OpenBMI datasets, respectively, in a subject-dependent manner. The results show that the contrastive learning method can significantly improve the classification accuracy of motor imagery EEG signals, which provides an important reference for the design of motor imagery classification algorithms.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62162068, Grant 61761049, Grant 61540062, and Grant 62061049; in part by the Yunnan Province Ten Thousand Talents Program and Yunling Scholars Special Project under Grant YNWR-YLXZ-2018-022; and in part by the Yunnan Provincial Science and Technology Department-Yunnan University “Double First Class” Construction Joint Fund Project under Grant No. 2019FY003012.