Reconstructing imperfect faces is a challenging research task. In this study, we explore a data-driven approach that combines a pre-trained MICA (MetrIC fAce) model with 3D printing to address this challenge. We propose a training strategy that uses the pre-trained MICA model and self-supervised learning techniques to improve accuracy and reduce the time needed for 3D facial structure reconstruction. Our results demonstrate high accuracy, evaluated by the geometric loss function and various statistical measures. To showcase the effectiveness of the approach, we used 3D printing to create a model that covers facial wounds. The findings indicate that our method produces a model that fits well and achieves comprehensive 3D facial reconstruction. This technique has the potential to aid doctors in treating patients with facial injuries.
Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. This study introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate significant improvement in the network’s resistance to lighting interference without substantially increasing parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Although neural approaches have yielded state-of-the-art results in the sentence matching task, their performance inevitably drops dramatically when applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation on sentence matching, in which the goal is to perform well on a target domain given only unlabeled target domain data and labeled source domain data. Specifically, we propose to achieve this by performing self-supervised tasks. Unlike previous unsupervised domain adaptation methods, self-supervision can not only flexibly suit the characteristics of sentence matching with a special design, but is also much easier to optimize. During training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, the classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks, and the results demonstrate their superiority. In addition, we study the performance of different usages of self-supervised tasks, which may inspire how to use self-supervision effectively in cross-domain scenarios.
The federated self-supervised framework is a distributed machine learning method that combines federated learning and self-supervised learning, which effectively addresses the difficulty traditional federated learning has in processing large-scale unlabeled data. The existing federated self-supervised framework suffers from low communication efficiency and high communication delay between clients and central servers. Therefore, we added edge servers to the federated self-supervised framework to reduce the pressure on the central server caused by frequent communication between both ends. A communication compression scheme using gradient quantization and sparsification was proposed to optimize the communication of the entire framework, and the algorithm of the sparse communication compression module was improved. Experiments show that the learning rate changes of the improved sparse communication compression module are smoother and more stable. Our communication compression scheme effectively reduced the overall communication overhead.
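The abstract above names gradient quantization and sparsification without giving details. As a rough illustration only, the following sketch combines top-k sparsification with uniform symmetric quantization; the function names, the choice of top-k, and the shared-scale quantizer are all assumptions made here and are not taken from the paper.

```python
from typing import List, Tuple


def top_k_sparsify(grad: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k entries of largest magnitude as (index, value) pairs."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return sorted((i, grad[i]) for i in order[:k])


def quantize(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed integer codes with one shared scale."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def compress(grad: List[float], k: int, bits: int) -> Tuple[List[int], List[int], float]:
    """Sparsify first, then quantize only the surviving values (hypothetical pipeline)."""
    sparse = top_k_sparsify(grad, k)
    indices = [i for i, _ in sparse]
    codes, scale = quantize([v for _, v in sparse], bits)
    return indices, codes, scale


def decompress(indices: List[int], codes: List[int], scale: float, size: int) -> List[float]:
    """Rebuild a dense gradient; dropped entries are restored as zero."""
    dense = [0.0] * size
    for i, c in zip(indices, codes):
        dense[i] = c * scale
    return dense
```

In this sketch a client would transmit only `indices`, `codes`, and `scale`, so the payload shrinks with both smaller `k` and fewer `bits`, at the cost of a reconstruction error of at most about half a quantization step on the kept entries.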
Accurate aging diagnosis is crucial for the health and safety management of lithium-ion batteries in electric vehicles. Despite significant advancements achieved by data-driven methods, diagnosis accuracy remains constrained by the high costs of check-up tests and the scarcity of labeled data. This paper presents a framework utilizing self-supervised machine learning to harness the potential of unlabeled data for diagnosing battery aging in electric vehicles during field operations. We validate our method using battery degradation datasets collected over more than two years from twenty real-world electric vehicles. Our analysis comprehensively addresses cell inconsistencies, physical interpretations, and charging uncertainties in real-world applications. This is achieved through self-supervised feature extraction using random short charging sequences in the main peak of incremental capacity curves. By leveraging inexpensive unlabeled data in a self-supervised approach, our method demonstrates improvements in average root mean square errors of 74.54% and 60.50% in the best and worst cases, respectively, compared to the supervised benchmark. This work underscores the potential of employing low-cost unlabeled data with self-supervised machine learning for effective battery health and safety management in real-world scenarios.
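For readers unfamiliar with incremental capacity curves, the sketch below shows one common way to compute dQ/dV from a charging segment by finite differences and to locate its main peak. It is a minimal illustration under assumed inputs (monotonically increasing voltage samples and cumulative capacity); the function names are hypothetical and the paper's actual feature extraction is not described in the abstract.

```python
from typing import List, Tuple


def incremental_capacity(voltage: List[float], capacity: List[float]) -> List[Tuple[float, float]]:
    """Return (midpoint voltage, dQ/dV) pairs computed by finite differences."""
    curve = []
    for i in range(1, len(voltage)):
        dv = voltage[i] - voltage[i - 1]
        if dv <= 0:
            continue  # skip flat or non-monotonic samples to avoid division by zero
        dq = capacity[i] - capacity[i - 1]
        curve.append(((voltage[i] + voltage[i - 1]) / 2, dq / dv))
    return curve


def main_peak(curve: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (voltage, dQ/dV) point where dQ/dV is largest."""
    return max(curve, key=lambda point: point[1])
```

Real field data is noisy, so a practical pipeline would normally smooth the voltage and capacity signals before differencing; that step is omitted here for brevity.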
Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper camera settings. To enhance low-light images, most existing methods rely on normal-light images for guidance, but suitable normal-light images are difficult to collect. In contrast, a self-supervised method breaks free from the reliance on normal-light data, resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustment and design pixel-based adjustment methods, resulting in remnants of other degradations, uneven brightness, and artifacts. In response, this paper proposes a self-supervised enhancement method, termed SLIE. It can handle multiple degradations, including illumination attenuation, noise pollution, and color shift, all in a self-supervised manner. Illumination attenuation is estimated based on physical principles and local neighborhood information. Noise removal and color shift correction are realized solely with noisy images and images with color shifts. Finally, the comprehensive and fully self-supervised approach can achieve better adaptability and generalization. It is applicable to various low-light conditions and can reproduce the original color of scenes in natural light. Extensive experiments conducted on four public datasets demonstrate the superiority of SLIE to thirteen state-of-the-art methods. Our code is available at https://github.com/hanna-xu/SLIE.
The encoding aperture snapshot spectral imaging system, based on compressive sensing theory, can be regarded as an encoder that efficiently obtains compressed two-dimensional spectral data, which is then decoded into three-dimensional spectral data through deep neural networks. However, training the deep neural networks requires a large amount of clean data that is difficult to obtain. To address the problem of insufficient training data for deep neural networks, a self-supervised hyperspectral denoising neural network based on neighborhood sampling is proposed. This network is integrated into a deep plug-and-play framework to achieve self-supervised spectral reconstruction. The study also examines the impact of different noise degradation models on the final reconstruction quality. Experimental results demonstrate that the self-supervised learning method enhances the average peak signal-to-noise ratio by 1.18 dB and improves the structural similarity by 0.009 compared with the supervised learning method. Additionally, it achieves better visual reconstruction results.
Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities. Unlike existing approaches that often focus on static gestures and require extensive labeled data, the proposed wearable wristband with self-supervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios. It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes, resulting in high-sensitivity capacitance output. Through wireless transmission from a Wi-Fi module, the proposed algorithm learns latent features from the unlabeled signals of random wrist movements. Remarkably, only few-shot labeled data are sufficient for fine-tuning the model, enabling rapid adaptation to various tasks. The system achieves a high accuracy of 94.9% in different scenarios, including the prediction of eight-direction commands and air-writing of all numbers and letters. The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training. Its utility has been further extended to enhance human–machine interaction over digital platforms, such as game controls, calculators, and three-language login systems, offering users a natural and intuitive way of communication.
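The abstract above does not state which contrastive objective is used. As a hedged illustration of how contrastive learning extracts structure from unlabeled signals, the sketch below implements the widely used InfoNCE loss on plain Python vectors; the choice of InfoNCE, cosine similarity, and the temperature value are assumptions made here, not details from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log-softmax of the positive pair among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the anchor embedding is closer to its positive (for example, an augmented view of the same signal window) than to the negatives, which is what pushes an encoder to learn useful features without labels.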
State of health (SoH) estimation plays a key role in smart battery health prognostics and management. However, poor generalization, lack of labeled data, and unused measurements during aging are still major challenges to accurate SoH estimation. To this end, this paper proposes a self-supervised learning framework to boost the performance of battery SoH estimation. Unlike traditional data-driven methods, which rely on a considerable training dataset obtained from numerous battery cells, the proposed method achieves accurate and robust estimations using limited labeled data. A filter-based data preprocessing technique, which enables the extraction of partial capacity-voltage curves under dynamic charging profiles, is applied first. Unsupervised learning is then used to learn the aging characteristics from the unlabeled data through an auto-encoder-decoder. The learned network parameters are transferred to the downstream SoH estimation task and fine-tuned with very few sparsely labeled data, which boosts the performance of the estimation framework. The proposed method has been validated under different battery chemistries, formats, operating conditions, and ambient conditions. The estimation accuracy can be guaranteed by using only three labeled data points from the initial 20% of life cycles, with overall errors less than 1.14% and the error distribution of all testing scenarios remaining below 4%, and robustness increases with aging. Comparisons with other purely supervised machine learning methods demonstrate the superiority of the proposed method. This simple and data-efficient estimation framework is promising for real-world applications under a variety of scenarios.
Noise suppression is an essential step in many seismic processing workflows. A portion of this noise, particularly in land datasets, presents itself as random noise. In recent years, neural networks have been successfully used to denoise seismic data in a supervised fashion. However, supervised learning always comes with the often unachievable requirement of having noisy-clean data pairs for training. Using blind-spot networks, we redefine the denoising task as a self-supervised procedure where the network uses the surrounding noisy samples to estimate the noise-free value of a central sample. Based on the assumption that noise is statistically independent between samples, the network struggles to predict the noise component of the sample due to its randomness, whilst the signal component is accurately predicted due to its spatio-temporal coherency. Illustrated on synthetic examples, the blind-spot network is shown to be an efficient denoiser of seismic data contaminated by random noise with minimal damage to the signal, thereby providing improvements in both the image domain and down-the-line tasks, such as post-stack inversion. To conclude our study, the suggested approach is applied to field data and the results are compared with two commonly used random denoising techniques: FX-deconvolution and sparsity-promoting inversion by Curvelet transform. By demonstrating that blind-spot networks are an efficient suppressor of random noise, we believe this is just the beginning of utilising self-supervised learning in seismic applications.
In recent years, self-supervised learning, which does not require a large number of manual labels, has generated supervisory signals from the data itself to learn representations of samples. Self-supervised learning solves the problem of learning semantic features from unlabeled data and enables pre-training of models on large datasets. Its significant advantages have been extensively studied by scholars in recent years. There are usually three types of self-supervised learning: "Generative, Contrastive, and Generative-Contrastive." The model of the contrastive learning method is relatively simple, and its current performance on downstream tasks is comparable to that of supervised learning methods. Therefore, we propose a conceptual analysis framework: data augmentation pipeline, architectures, pretext tasks, comparison methods, and semi-supervised fine-tuning. Based on this conceptual framework, we qualitatively analyze the existing contrastive self-supervised learning methods for computer vision, then further analyze their performance at different stages, and finally summarize the research status of self-supervised contrastive learning methods in other fields.
The effectiveness of facial expression recognition (FER) algorithms hinges on the model’s quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets. This issue arises because the initial labeled data often fails to represent the full spectrum of facial expression characteristics. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale variations. The method is divided into two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set is relabeled to further refine the model’s feature extraction ability. The pre-trained model is then utilized in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, achieving improvements in recognition accuracy of 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and a 1.61% improvement compared to the conventional segmented active learning method.
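The Margin and Least Confidence baselines named above are standard uncertainty-sampling criteria in active learning. The sketch below shows their usual textbook definitions applied to predicted class probabilities; the function names and the `select` helper are hypothetical and illustrate the baselines only, not the paper's proposed method.

```python
from typing import List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: one minus the top class probability (higher means more uncertain)."""
    return 1.0 - max(probs)


def margin(probs: List[float]) -> float:
    """Gap between the two most likely classes (smaller means more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select(pool: List[List[float]], budget: int, strategy: str) -> List[int]:
    """Return indices of the `budget` most uncertain samples in the unlabeled pool."""
    if strategy == "least_confidence":
        ranked = sorted(range(len(pool)), key=lambda i: -least_confidence(pool[i]))
    elif strategy == "margin":
        ranked = sorted(range(len(pool)), key=lambda i: margin(pool[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:budget]
```

The two criteria can disagree: a sample with probabilities close to 0.5 and 0.49 has a very small margin but is not the least confident one if another sample spreads its probability more evenly across three classes.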
Eye diagnosis is a method for inspecting systemic diseases and syndromes by observing the eyes. With the development of intelligent diagnosis in traditional Chinese medicine (TCM), artificial intelligence (AI) can improve the accuracy and efficiency of eye diagnosis. However, research on intelligent eye diagnosis still faces many challenges, including the lack of standardized and precisely labeled data, multi-modal information analysis, and artificial intelligence models for syndrome differentiation. The widespread application of AI models in medicine provides new insights and opportunities for research on intelligent eye diagnosis. This study elaborates on three key technologies of AI models in the intelligent application of TCM eye diagnosis and explores the implications for research on intelligent eye diagnosis. First, a database for eye diagnosis is established based on self-supervised learning to address the lack of standardized and precisely labeled data. Next, deep neural network models for cross-modal understanding and generation address the lack of multi-modal information analysis. Last, data-driven models for eye diagnosis are built to tackle the absence of syndrome differentiation models. In summary, research on intelligent eye diagnosis has great potential amid the surge of AI model applications.
Microfossil classification is an important discipline in subsurface exploration, for both oil & gas and Carbon Capture and Storage (CCS). The abundance and distribution of species found in sedimentary rocks provide valuable information about the age and depositional environment. However, the analysis is difficult and time-consuming, as it is based on manual work by human experts. Attempts to automate this process face two key challenges: (1) the input data are very large (our dataset is projected to grow to 3 billion microfossils), and (2) there are not enough labeled data to use the standard procedure of training a deep learning classifier. We propose an efficient pipeline for processing and grouping fossils by genus, or even species, from microscope slides using self-supervised learning. First, we show how to efficiently extract crops from whole slide images by adapting previously trained object detection algorithms. Second, we provide a comparison of a range of self-supervised learning methods to classify and identify microfossils from very few labels. We obtain excellent results with both convolutional neural networks and vision transformers fine-tuned by self-supervision. Our approach is fast and computationally light, providing a handy tool for geologists working with microfossils.
Broad and safe access to ultrafast laser technology has been hindered by the absence of optical fiber-delivered pulses with tunable central wavelength, pulse repetition rate, and pulse width in the picosecond–femtosecond regime. To address this long-standing obstacle, we developed a reliable accessory for femtosecond ytterbium fiber chirped pulse amplifiers, termed a fiber-optic nonlinear wavelength converter (FNWC), as an adaptive optical source for the emergent field of femtosecond biophotonics. This accessory empowers the fixed-wavelength laser to produce fiber-delivered ~20 nJ pulses with central wavelength across 950 to 1150 nm, repetition rate across 1 to 10 MHz, and pulse width across 40 to 400 fs, with a long-term stability of >2000 h. As a prototypical label-free application in biology and medicine, we demonstrate the utility of FNWC in real-time intravital imaging synergistically integrated with modern machine learning and large-scale fluorescence lifetime imaging microscopy.
Spectroscopy, especially plasma spectroscopy, provides a powerful platform for biological and material analysis with its elemental and molecular fingerprinting capability. Artificial intelligence (AI) has tremendous potential to build a universal quantitative framework covering all branches of plasma spectroscopy based on its unmatched representation and generalization ability. Herein, we introduce an AI-based unified method called self-supervised image-spectrum twin information fusion detection (SISTIFD) to collect twin co-occurrence signals of the plasma and to intelligently predict the physical parameters for improving the performance of all plasma spectroscopic techniques. It can fuse the spectra and plasma images in synchronization, derive the plasma parameters (total number density, plasma temperature, electron density, and other implicit factors), and provide accurate results. The experimental data demonstrate its excellent utility and capacity, with a reduction of 98% in evaluation indices (root mean square error, relative standard deviation, etc.) and an analysis frequency of 143 Hz (much faster than the mainstream detection frame rate of 1 Hz). In addition, as a completely end-to-end and self-supervised framework, SISTIFD enables automatic detection without manual preprocessing or intervention. With these advantages, it has remarkably enhanced various plasma spectroscopic techniques with state-of-the-art performance and unlocked their potential in industry, especially in areas that require both capability and efficiency. This scheme brings new inspiration to the whole field of plasma spectroscopy and enables in situ analysis in real-world scenarios of high throughput, cross-interference, varied analyte complexity, and diverse applications.
Massive Multiple-Input-Multiple-Output (MIMO) is a promising technology to meet the demand for the connection of massive devices and high data capacity for mobile networks in the next-generation communication system. However, due to the massive connectivity of mobile devices, the pilot contamination problem will severely degrade the communication quality and spectrum efficiency of the massive MIMO system. We propose a deep Monte Carlo Tree Search (MCTS)-based intelligent Pilot-power Allocation Scheme (iPAS) to address this issue. The core of iPAS is a multi-task deep reinforcement learning algorithm that can automatically learn the radio environment and make decisions on the pilot sequence and power allocation to maximize the spectrum efficiency with self-play training. To accelerate the search convergence, we introduce a Deep Neural Network (DNN) to predict the pilot sequence and power allocation actions. The DNN is trained in a self-supervised learning manner, where the training data is generated from the search process of the MCTS algorithm. Numerical results show that our proposed iPAS achieves a better Cumulative Distribution Function (CDF) of the ergodic spectral efficiency compared with the previous suboptimal algorithms.
Speech emotion recognition, as an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task, due to its inclusion of the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to effectively represent features and capture correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model can be divided into three steps: feature extraction, interaction, and fusion. Self-supervised embedding models are introduced for feature extraction, which give a more powerful representation of the original data than those using spectrograms or audio features such as Mel-frequency cepstral coefficients (MFCCs) and low-level descriptors (LLDs). In particular, MLCCT contains two types of feature interaction processes, where a bidirectional Long Short-term Memory (Bi-LSTM) with circulant interaction mechanism is proposed for low-level features, while a two-stream residual cross-modal Transformer block is applied when high-level features are involved. Finally, we choose self-attention blocks for fusion and a fully connected layer to make predictions. To evaluate the performance of our proposed model, comprehensive experiments are conducted on three widely used benchmark datasets: IEMOCAP, MELD, and CMU-MOSEI. The competitive results verify the effectiveness of our approach.
In scientific and industrial research, three-dimensional (3D) imaging, or depth measurement, is a critical tool that provides detailed insight into surface properties. Confocal microscopy, known for its precision in surface measurements, plays a key role in this field. However, 3D imaging based on confocal microscopy is often challenged by significant data requirements and slow measurement speeds. In this paper, we present a novel self-supervised learning algorithm called SSL Depth that overcomes these challenges. Specifically, our method exploits the feature learning capabilities of neural networks while avoiding the need for labeled data sets typically associated with supervised learning approaches. Through practical demonstrations on a commercially available confocal microscope, we find that our method not only maintains higher quality, but also significantly reduces the frequency of the z-axis sampling required for 3D imaging. This reduction results in a remarkable 16× increase in measurement speed, with the potential for further acceleration in the future. Our methodological advance enables highly efficient and accurate 3D surface reconstructions, thereby expanding the potential applications of confocal microscopy in various scientific and industrial fields.
Starting from late 2019, the new coronavirus disease (COVID-19) has become a global crisis. With the development of online social media, people prefer to express their opinions and discuss the latest news online. We have witnessed the positive influence of online social media, which helped citizens and governments track the development of this pandemic in time. It is necessary to apply artificial intelligence (AI) techniques to online social media and automatically discover and track public opinions posted online. In this paper, we take Sina Weibo, the most widely used online social media platform in China, for analysis and experiments. We collect multi-modal microblogs about COVID-19 from 2020/1/1 to 2020/3/31 with a web crawler, including texts and images posted by users. In order to effectively discover what is being discussed about COVID-19 without human labeling, we propose a unified multi-modal framework, including an unsupervised short-text topic model to discover and track bursty topics, and a self-supervised model to learn image features so that we can retrieve related images about COVID-19. Experimental results show the effectiveness and superiority of the proposed models, as well as their considerable application prospects for analyzing and tracking public opinions about COVID-19.
Abstract: Reconstructing imperfect faces is a challenging research task. In this study, we explore a data-driven approach that combines a pre-trained MICA (MetrIC fAce) model with 3D printing to address this challenge. We propose a training strategy that uses the pre-trained MICA model and self-supervised learning techniques to improve accuracy and reduce the time needed for 3D facial structure reconstruction. Our results demonstrate high accuracy, evaluated by the geometric loss function and various statistical measures. To showcase the effectiveness of the approach, we used 3D printing to create a model that covers facial wounds. The findings indicate that our method produces a model that fits well and achieves comprehensive 3D facial reconstruction. This technique has the potential to aid doctors in treating patients with facial injuries.
Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
Abstract: Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. This study introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate significant improvement in the network's resistance to lighting interference without substantially increasing parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61922085 and 61976211, the National Key Research and Development Program of China under Grant No. 2020AAA0106400, the Key Research Program of the Chinese Academy of Sciences under Grant No. ZDBS-SSW-JSC006, the Independent Research Project of the National Laboratory of Pattern Recognition under Grant No. Z-2018013, and the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 2020138.
Abstract: Although neural approaches have yielded state-of-the-art results in the sentence matching task, their performance inevitably drops dramatically when applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation on sentence matching, in which the goal is to have good performance on a target domain with only unlabeled target domain data as well as labeled source domain data. Specifically, we propose to perform self-supervised tasks to achieve it. Different from previous unsupervised domain adaptation methods, self-supervision can not only flexibly suit the characteristics of sentence matching with a special design, but also be much easier to optimize. When training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, the classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks and the results demonstrate their superiority. In addition, we further study the performance of different usages of self-supervised tasks, which would inspire how to effectively utilize self-supervision for cross-domain scenarios.
Abstract: The federated self-supervised framework is a distributed machine learning method that combines federated learning and self-supervised learning, and it can effectively address the difficulty traditional federated learning has in processing large-scale unlabeled data. The existing federated self-supervised framework suffers from low communication efficiency and high communication delay between clients and central servers. Therefore, we added edge servers to the federated self-supervised framework to reduce the pressure on the central server caused by frequent communication between both ends. A communication compression scheme using gradient quantization and sparsification was proposed to optimize the communication of the entire framework, and the algorithm of the sparse communication compression module was improved. Experiments show that the learning rate changes of the improved sparse communication compression module are smoother and more stable. Our communication compression scheme effectively reduced the overall communication overhead.
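As a rough illustration of the two compression ideas named in the abstract above (sparsification and quantization), the following Python sketch keeps only the largest-magnitude gradient entries and stores them as 8-bit integers with one shared scale. The function names, the top-k selection rule, and the uniform quantizer are assumptions made for illustration; the abstract does not specify the authors' actual scheme.

```python
import numpy as np

def sparsify_top_k(grad, ratio):
    """Keep the largest-magnitude entries; return flat indices and values.

    Top-k selection is an assumed sparsification rule, not the paper's.
    """
    flat = grad.ravel()
    k = max(1, int(round(ratio * flat.size)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def quantize_uniform(values, bits=8):
    """Map values to signed integers using a single shared scale."""
    if bits > 8:
        raise ValueError("this sketch stores values as int8, so bits must be <= 8")
    levels = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / levels if max_abs > 0 else 1.0
    q = np.round(values / scale).astype(np.int8)
    return q, scale

def decompress(idx, q, scale, shape):
    """Rebuild a dense gradient; entries that were dropped become zero."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = q.astype(np.float32) * scale
    return flat.reshape(shape)

# Hypothetical usage: a client compresses its gradient before upload.
grad = np.array([[0.1, -2.0], [0.05, 1.0]], dtype=np.float32)
idx, vals = sparsify_top_k(grad, ratio=0.5)
q, scale = quantize_uniform(vals)
restored = decompress(idx, q, scale, grad.shape)
```

Only the indices, the int8 values, and one scale would need to be transmitted, which is where the reduction in communication volume would come from in a scheme of this kind.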
Funding: Supported by the research project "SafeDaBatt" (03EMF0409A) funded by the German Federal Ministry for Digital and Transport (BMDV), the National Key Research and Development Program of China (2022YFE0102700), the Key Research and Development Program of Shaanxi Province (2023-GHYB-05, 2023-YBSF-104), and the financial support from the China Scholarship Council (CSC) (202206567008).
Abstract: Accurate aging diagnosis is crucial for the health and safety management of lithium-ion batteries in electric vehicles. Despite significant advancements achieved by data-driven methods, diagnosis accuracy remains constrained by the high costs of check-up tests and the scarcity of labeled data. This paper presents a framework utilizing self-supervised machine learning to harness the potential of unlabeled data for diagnosing battery aging in electric vehicles during field operations. We validate our method using battery degradation datasets collected over more than two years from twenty real-world electric vehicles. Our analysis comprehensively addresses cell inconsistencies, physical interpretations, and charging uncertainties in real-world applications. This is achieved through self-supervised feature extraction using random short charging sequences in the main peak of incremental capacity curves. By leveraging inexpensive unlabeled data in a self-supervised approach, our method demonstrates improvements in average root mean square errors of 74.54% and 60.50% in the best and worst cases, respectively, compared to the supervised benchmark. This work underscores the potential of employing low-cost unlabeled data with self-supervised machine learning for effective battery health and safety management in real-world scenarios.
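For readers unfamiliar with incremental capacity (IC) curves, the sketch below computes dQ/dV from sampled charging data, locates the main peak, and draws a random short window that contains it. This is only an illustrative reading of "random short charging sequences in the main peak"; the synthetic data, the function names, and the window-sampling rule are assumptions, not the authors' procedure.

```python
import numpy as np

def incremental_capacity(voltage, capacity):
    """Approximate dQ/dV by finite differences; voltage must be strictly increasing."""
    return np.gradient(capacity, voltage)

def random_window_around_peak(ic, length, rng):
    """Return (start, stop) of a random window of `length` samples containing the IC peak.

    Assumes length <= len(ic). The sampling rule is a hypothetical choice.
    """
    peak = int(np.argmax(ic))
    low = max(0, peak - length + 1)
    high = min(peak, len(ic) - length)
    start = int(rng.integers(low, high + 1))
    return start, start + length

# Synthetic charging curve (assumed shape): capacity rises fastest near 3.7 V.
voltage = np.linspace(3.0, 4.2, 121)
capacity = 0.5 * (1.0 + np.tanh((voltage - 3.7) / 0.05))
ic = incremental_capacity(voltage, capacity)
peak_voltage = float(voltage[int(np.argmax(ic))])

rng = np.random.default_rng(0)
start, stop = random_window_around_peak(ic, length=20, rng=rng)
segment = ic[start:stop]  # a short unlabeled sequence usable for self-supervised training
```

Windows like `segment` require no capacity check-up labels, which is what makes them inexpensive to collect during normal charging.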
Funding: Supported by the National Natural Science Foundation of China (62276192).
Abstract: Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper camera settings. To enhance low-light images, most existing methods rely on normal-light images for guidance, but suitable normal-light images are difficult to collect. In contrast, a self-supervised method breaks free from the reliance on normal-light data, resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustment and design pixel-based adjustment methods, resulting in remnants of other degradations, uneven brightness, and artifacts. In response, this paper proposes a self-supervised enhancement method, termed SLIE. It can handle multiple degradations including illumination attenuation, noise pollution, and color shift, all in a self-supervised manner. Illumination attenuation is estimated based on physical principles and local neighborhood information. Noise removal and color shift correction are realized solely with noisy images and images with color shifts. Finally, the comprehensive and fully self-supervised approach can achieve better adaptability and generalization. It is applicable to various low-light conditions and can reproduce the original color of scenes in natural light. Extensive experiments conducted on four public datasets demonstrate the superiority of SLIE to thirteen state-of-the-art methods. Our code is available at https://github.com/hanna-xu/SLIE.
Funding: Supported by the Zhejiang Provincial "Jianbing" and "Lingyan" R&D Programs (2023C03012, 2024C01126).
Abstract: The encoding aperture snapshot spectral imaging system, based on the compressive sensing theory, can be regarded as an encoder, which can efficiently obtain compressed two-dimensional spectral data and then decode it into three-dimensional spectral data through deep neural networks. However, training the deep neural networks requires a large amount of clean data that is difficult to obtain. To address the problem of insufficient training data for deep neural networks, a self-supervised hyperspectral denoising neural network based on neighborhood sampling is proposed. This network is integrated into a deep plug-and-play framework to achieve self-supervised spectral reconstruction. The study also examines the impact of different noise degradation models on the final reconstruction quality. Experimental results demonstrate that the self-supervised learning method enhances the average peak signal-to-noise ratio by 1.18 dB and improves the structural similarity by 0.009 compared with the supervised learning method. Additionally, it achieves better visual reconstruction results.
Funding: Supported by the Research Grant Fund from Kwangwoon University in 2023, the National Natural Science Foundation of China under Grant (62311540155), the Taishan Scholars Project Special Funds (tsqn202312035), and the open research foundation of the State Key Laboratory of Integrated Chips and Systems.
Abstract: Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities. Unlike existing approaches that often focus on static gestures and require extensive labeled data, the proposed wearable wristband with self-supervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios. It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes, resulting in high-sensitivity capacitance output. Through wireless transmission from a Wi-Fi module, the proposed algorithm learns latent features from the unlabeled signals of random wrist movements. Remarkably, only few-shot labeled data are sufficient for fine-tuning the model, enabling rapid adaptation to various tasks. The system achieves a high accuracy of 94.9% in different scenarios, including the prediction of eight-direction commands and air-writing of all numbers and letters. The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training. Its utility has been further extended to enhance human–machine interaction over digital platforms, such as game controls, calculators, and three-language login systems, offering users a natural and intuitive way of communication.
Funding: Funded by the "SMART BATTERY" project, granted by Villum Foundation in 2021 (project number 222860).
Abstract: State of health (SoH) estimation plays a key role in smart battery health prognostics and management. However, poor generalization, lack of labeled data, and unused measurements during aging are still major challenges to accurate SoH estimation. To this end, this paper proposes a self-supervised learning framework to boost the performance of battery SoH estimation. Different from traditional data-driven methods, which rely on a considerable training dataset obtained from numerous battery cells, the proposed method achieves accurate and robust estimations using limited labeled data. A filter-based data preprocessing technique, which enables the extraction of partial capacity-voltage curves under dynamic charging profiles, is applied first. Unsupervised learning is then used to learn the aging characteristics from the unlabeled data through an auto-encoder-decoder. The learned network parameters are transferred to the downstream SoH estimation task and are fine-tuned with very few sparsely labeled data, which boosts the performance of the estimation framework. The proposed method has been validated under different battery chemistries, formats, operating conditions, and ambient environments. The estimation accuracy can be guaranteed by using only three labeled data from the initial 20% life cycles, with overall errors less than 1.14% and error distribution of all testing scenarios maintaining less than 4%, and robustness increases with aging. Comparisons with other purely supervised machine learning methods demonstrate the superiority of the proposed method. This simple and data-efficient estimation framework is promising in real-world applications under a variety of scenarios.
Abstract: Noise suppression is an essential step in many seismic processing workflows. A portion of this noise, particularly in land datasets, presents itself as random noise. In recent years, neural networks have been successfully used to denoise seismic data in a supervised fashion. However, supervised learning always comes with the often unachievable requirement of having noisy-clean data pairs for training. Using blind-spot networks, we redefine the denoising task as a self-supervised procedure where the network uses the surrounding noisy samples to estimate the noise-free value of a central sample. Based on the assumption that noise is statistically independent between samples, the network struggles to predict the noise component of the sample due to its randomness, whilst the signal component is accurately predicted due to its spatio-temporal coherency. Illustrated on synthetic examples, the blind-spot network is shown to be an efficient denoiser of seismic data contaminated by random noise with minimal damage to the signal, therefore providing improvements in both the image domain and down-the-line tasks, such as post-stack inversion. To conclude our study, the suggested approach is applied to field data and the results are compared with two commonly used random denoising techniques: FX-deconvolution and sparsity-promoting inversion by Curvelet transform. By demonstrating that blind-spot networks are an efficient suppressor of random noise, we believe this is just the beginning of utilising self-supervised learning in seismic applications.
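The blind-spot idea described in the abstract above can be illustrated with a small data-preparation sketch: selected samples are hidden from the network by overwriting them with a neighbouring value, and the training loss is evaluated only at those hidden positions against the original noisy values. The neighbour-replacement rule, the function names, and the parameters below are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def make_blind_spot_pair(noisy, n_masked, rng, radius=2):
    """Return (corrupted input, boolean mask) for blind-spot training.

    Each masked sample is replaced by a randomly chosen in-bounds neighbour,
    so the network never sees the value it is asked to predict.
    Assumes a 2D array with more than one sample.
    """
    h, w = noisy.shape
    corrupted = noisy.copy()
    mask = np.zeros((h, w), dtype=bool)
    chosen = rng.choice(h * w, size=n_masked, replace=False)
    rows, cols = np.unravel_index(chosen, (h, w))
    for r, c in zip(rows, cols):
        while True:
            dr, dc = (int(d) for d in rng.integers(-radius, radius + 1, size=2))
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < h and 0 <= cc < w:
                break
        corrupted[r, c] = noisy[rr, cc]  # neighbour value from the original data
        mask[r, c] = True
    return corrupted, mask

def masked_mse(prediction, noisy, mask):
    """Loss evaluated only at the blind-spot positions."""
    return float(np.mean((prediction[mask] - noisy[mask]) ** 2))

# Hypothetical usage on a synthetic noisy panel.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(8, 8))
corrupted, mask = make_blind_spot_pair(noisy, n_masked=5, rng=rng)
```

A denoising network would take `corrupted` as input and be trained with `masked_mse`; because independent noise cannot be inferred from the neighbours, minimising this loss favours the coherent signal.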
Abstract: Self-supervised learning, which does not require a large number of manual labels, generates supervisory signals from the data itself to learn representations of samples. Self-supervised learning solves the problem of learning semantic features from unlabeled data and enables the pre-training of models on large datasets. Its significant advantages have been extensively studied by scholars in recent years. There are usually three types of self-supervised learning: "Generative, Contrastive, and Generative-Contrastive." The model of the contrastive learning method is relatively simple, and its current performance on downstream tasks is comparable to that of the supervised learning method. Therefore, we propose a conceptual analysis framework: data augmentation pipeline, architectures, pretext tasks, contrastive methods, and semi-supervised fine-tuning. Based on this conceptual framework, we qualitatively analyze the existing contrastive self-supervised learning methods for computer vision, then further analyze their performance at different stages, and finally summarize the research status of self-supervised contrastive learning methods in other fields.
Funding: Supported by the National Science Foundation of China (61971078) and the Chongqing Municipal Education Commission Science and Technology Major Project (KJZDM202301901).
Abstract: The effectiveness of facial expression recognition (FER) algorithms hinges on the model's quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets. This issue arises because the initial labeled data often fails to represent the full spectrum of facial expression characteristics. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale variations. The method is divided into two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set is relabeled to further refine the model's feature extraction ability. The pre-trained model is then utilized in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, achieving improvements in recognition accuracy of 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and a 1.61% improvement compared to the conventional segmented active learning method.
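The abstract above compares against the Margin and Least Confidence active learning baselines. A minimal sketch of how these two standard uncertainty scores are commonly computed from predicted class probabilities is given below; it illustrates the baselines only, not the proposed method, and the function names and example probabilities are made up for illustration.

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty = 1 - highest class probability (higher means more uncertain)."""
    return 1.0 - probs.max(axis=1)

def margin(probs):
    """Gap between the top two class probabilities (smaller means more uncertain)."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def select_for_labeling(probs, budget, strategy="margin"):
    """Return indices of the `budget` most uncertain unlabeled samples."""
    if strategy == "margin":
        order = np.argsort(margin(probs))             # smallest margin first
    elif strategy == "least_confidence":
        order = np.argsort(-least_confidence(probs))  # largest uncertainty first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:budget]

# Hypothetical softmax outputs for three unlabeled face images, three classes.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
])
picked_margin = select_for_labeling(probs, budget=2, strategy="margin")
picked_lc = select_for_labeling(probs, budget=1, strategy="least_confidence")
```

Both scores depend on a reasonably calibrated initial model, which is why a poorly representative initial labeled set (the cold-start problem mentioned in the abstract) can hurt them.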
Funding: National Natural Science Foundation of China (82274265 and 82274588) and Hunan University of Traditional Chinese Medicine Research Unveiled Marshal Programs (2022XJJB003).
Abstract: Eye diagnosis is a method for inspecting systemic diseases and syndromes by observing the eyes. With the development of intelligent diagnosis in traditional Chinese medicine (TCM), artificial intelligence (AI) can improve the accuracy and efficiency of eye diagnosis. However, the research on intelligent eye diagnosis still faces many challenges, including the lack of standardized and precisely labeled data, multi-modal information analysis, and artificial intelligence models for syndrome differentiation. The widespread application of AI models in medicine provides new insights and opportunities for the research of eye diagnosis intelligence. This study elaborates on the three key technologies of AI models in the intelligent application of TCM eye diagnosis, and explores the implications for the research of eye diagnosis intelligence. First, a database concerning eye diagnosis was established based on self-supervised learning so as to solve the issues related to the lack of standardized and precisely labeled data. Next, cross-modal understanding and generation with deep neural network models is used to address the problem of lacking multi-modal information analysis. Last, data-driven models for eye diagnosis are built to tackle the absence of syndrome differentiation models. In summary, research on intelligent eye diagnosis has great potential for application amid the surge of AI model applications.
Funding: Supported by the Research Council of Norway, through its Centre for Research-based Innovation funding scheme (grant no. 309439), and Consortium Partners.
Abstract: Microfossil classification is an important discipline in subsurface exploration, for both oil & gas and Carbon Capture and Storage (CCS). The abundance and distribution of species found in sedimentary rocks provide valuable information about the age and depositional environment. However, the analysis is difficult and time-consuming, as it is based on manual work by human experts. Attempts to automate this process face two key challenges: (1) the input data are very large (our dataset is projected to grow to 3 billion microfossils), and (2) there are not enough labeled data to use the standard procedure of training a deep learning classifier. We propose an efficient pipeline for processing and grouping fossils by genus, or even species, from microscope slides using self-supervised learning. First, we show how to efficiently extract crops from whole slide images by adapting previously trained object detection algorithms. Second, we provide a comparison of a range of self-supervised learning methods to classify and identify microfossils from very few labels. We obtain excellent results with both convolutional neural networks and vision transformers fine-tuned by self-supervision. Our approach is fast and computationally light, providing a handy tool for geologists working with microfossils.
Funding: Support from the National Institutes of Health, U.S. Department of Health and Human Services (Grant No. R01 CA241618). J.E.S. and R.R.I. were supported by NIBIB/NIH (Award No. T32EB019944).
Abstract: Broad and safe access to ultrafast laser technology has been hindered by the absence of optical fiber-delivered pulses with tunable central wavelength, pulse repetition rate, and pulse width in the picosecond–femtosecond regime. To address this long-standing obstacle, we developed a reliable accessory for femtosecond ytterbium fiber chirped pulse amplifiers, termed a fiber-optic nonlinear wavelength converter (FNWC), as an adaptive optical source for the emergent field of femtosecond biophotonics. This accessory empowers the fixed-wavelength laser to produce fiber-delivered ~20 nJ pulses with central wavelength across 950 to 1150 nm, repetition rate across 1 to 10 MHz, and pulse width across 40 to 400 fs, with a long-term stability of >2000 h. As a prototypical label-free application in biology and medicine, we demonstrate the utility of FNWC in real-time intravital imaging synergistically integrated with modern machine learning and large-scale fluorescence lifetime imaging microscopy.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFE0118700), the National Natural Science Foundation of China (Grant No. 62375101), and the Fundamental Research Funds for the Central Universities (Grant No. YCJJ20230216).
Abstract: Spectroscopy, especially plasma spectroscopy, provides a powerful platform for biological and material analysis with its elemental and molecular fingerprinting capability. Artificial intelligence (AI) has tremendous potential to build a universal quantitative framework covering all branches of plasma spectroscopy based on its unmatched representation and generalization ability. Herein, we introduce an AI-based unified method called self-supervised image-spectrum twin information fusion detection (SISTIFD) to collect twin co-occurrence signals of the plasma and to intelligently predict the physical parameters for improving the performance of all plasma spectroscopic techniques. It can fuse the spectra and plasma images in synchronization, derive the plasma parameters (total number density, plasma temperature, electron density, and other implicit factors), and provide accurate results. The experimental data demonstrate its excellent utility and capacity, with a reduction of 98% in evaluation indices (root mean square error, relative standard deviation, etc.) and an analysis frequency of 143 Hz (much faster than the mainstream detection frame rate of 1 Hz). In addition, as a completely end-to-end and self-supervised framework, SISTIFD enables automatic detection without manual preprocessing or intervention. With these advantages, it has remarkably enhanced various plasma spectroscopic techniques with state-of-the-art performance and unlocked their potential in industry, especially in areas that require both capability and efficiency. This scheme brings new inspiration to the whole field of plasma spectroscopy and enables in situ analysis in real-world scenarios of high throughput, cross-interference, various analyte complexity, and diverse applications.
Abstract: Massive Multiple-Input-Multiple-Output (MIMO) is a promising technology to meet the demand for the connection of massive devices and high data capacity for mobile networks in the next generation communication system. However, due to the massive connectivity of mobile devices, the pilot contamination problem will severely degrade the communication quality and spectrum efficiency of the massive MIMO system. We propose a deep Monte Carlo Tree Search (MCTS)-based intelligent Pilot-power Allocation Scheme (iPAS) to address this issue. The core of iPAS is a multi-task deep reinforcement learning algorithm that can automatically learn the radio environment and make decisions on the pilot sequence and power allocation to maximize the spectrum efficiency with self-play training. To accelerate the searching convergence, we introduce a Deep Neural Network (DNN) to predict the pilot sequence and power allocation actions. The DNN is trained in a self-supervised learning manner, where the training data is generated from the searching process of the MCTS algorithm. Numerical results show that our proposed iPAS achieves a better Cumulative Distribution Function (CDF) of the ergodic spectral efficiency compared with the previous suboptimal algorithms.
Funding: The National Natural Science Foundation of China (No. 61872231), the National Key Research and Development Program of China (No. 2021YFC2801000), and the Major Research Plan of the National Social Science Foundation of China (No. 2000&ZD130).
Abstract: Speech emotion recognition, as an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task, due to its inclusion of the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to effectively represent features and capture correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model can be divided into three steps: feature extraction, interaction, and fusion. Self-supervised embedding models are introduced for feature extraction, which give a more powerful representation of the original data than those using spectrograms or audio features such as Mel-frequency cepstral coefficients (MFCCs) and low-level descriptors (LLDs). In particular, MLCCT contains two types of feature interaction processes, where a bidirectional Long Short-term Memory (Bi-LSTM) with circulant interaction mechanism is proposed for low-level features, while a two-stream residual cross-modal Transformer block is applied when high-level features are involved. Finally, we choose self-attention blocks for fusion and a fully connected layer to make predictions. To evaluate the performance of our proposed model, comprehensive experiments are conducted on three widely used benchmark datasets including IEMOCAP, MELD, and CMU-MOSEI. The competitive results verify the effectiveness of our approach.
Funding: Supported by the Innovation Program for Quantum Science and Technology (No. 2021ZD0303200), the CAS Project for Young Scientists in Basic Research (No. YSBR-049), the National Natural Science Foundation of China (No. 62225506), and the Anhui Provincial Key Research and Development Plan (No. 2022b13020006).
Abstract: In scientific and industrial research, three-dimensional (3D) imaging, or depth measurement, is a critical tool that provides detailed insight into surface properties. Confocal microscopy, known for its precision in surface measurements, plays a key role in this field. However, 3D imaging based on confocal microscopy is often challenged by significant data requirements and slow measurement speeds. In this paper, we present a novel self-supervised learning algorithm called SSL Depth that overcomes these challenges. Specifically, our method exploits the feature learning capabilities of neural networks while avoiding the need for labeled data sets typically associated with supervised learning approaches. Through practical demonstrations on a commercially available confocal microscope, we find that our method not only maintains higher quality, but also significantly reduces the frequency of the z-axis sampling required for 3D imaging. This reduction results in a remarkable 16× increase in measurement speed, with the potential for further acceleration in the future. Our methodological advance enables highly efficient and accurate 3D surface reconstructions, thereby expanding the potential applications of confocal microscopy in various scientific and industrial fields.
Funding: This paper is supported by the Fundamental Research Funds for the Central Universities [No. JUSRP12021].
Abstract: Starting from late 2019, the new coronavirus disease (COVID-19) has become a global crisis. With the development of online social media, people prefer to express their opinions and discuss the latest news online. We have witnessed the positive influence of online social media, which helped citizens and governments track the development of this pandemic in time. It is necessary to apply artificial intelligence (AI) techniques to online social media and automatically discover and track public opinions posted online. In this paper, we take Sina Weibo, the most widely used online social media in China, for analysis and experiments. We collect multi-modal microblogs about COVID-19 from 2020/1/1 to 2020/3/31 with a web crawler, including texts and images posted by users. In order to effectively discover what is being discussed about COVID-19 without human labeling, we propose a unified multi-modal framework, including an unsupervised short-text topic model to discover and track bursty topics, and a self-supervised model to learn image features so that we can retrieve related images about COVID-19. Experimental results have shown the effectiveness and superiority of the proposed models, and also have shown the considerable application prospects for analyzing and tracking public opinions about COVID-19.