Funding: Shanxi Scholarship Council of China (2022-141); Fundamental Research Program of Shanxi Province (202203021211096).
Abstract: Recent research in cross-domain intelligent fault diagnosis of machinery still suffers from relatively idealized speed and sample conditions. In engineering practice, the rotational speed of a machine is often transient and time-varying, which makes sample annotation increasingly expensive. Meanwhile, the number of samples collected from different health states is often imbalanced. To deal with these challenges, a complementary-label (CL) adversarial domain adaptation fault diagnosis network (CLADAN) is proposed for time-varying rotational speed and weakly-supervised conditions. Under the weakly-supervised learning condition, machine prior information is used for sample annotation via cost-friendly complementary-label learning. A diagnostic model learning strategy with discretized category probabilities is designed to avoid multi-peak distributions in the prediction results. In the adversarial training process, we develop a virtual adversarial regularization (VAR) strategy, which further enhances the robustness of the model by adding adversarial perturbations in the target domain. Comparative experiments on two case studies validate the superior performance of the proposed method.
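The abstract names two concrete ingredients, complementary-label learning and virtual adversarial regularization. The paper's exact formulations are not given above, so the sketch below is only a minimal PyTorch illustration of generic versions of both ideas: a standard complementary-label surrogate loss (push down the probability of the class a sample is known not to belong to) and a VAT-style perturbation for unlabeled target-domain inputs. Function names and hyperparameters (xi, eps, n_power) are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def complementary_label_loss(logits, comp_labels):
    """Surrogate loss for complementary labels: comp_labels[i] is a class
    that sample i is known NOT to belong to, so minimise its probability."""
    probs = F.softmax(logits, dim=1)
    p_comp = probs.gather(1, comp_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(torch.clamp(1.0 - p_comp, min=1e-8)).mean()

def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=2.0, n_power=1):
    """VAT-style perturbation: power-iteration estimate of the direction that
    most changes the model's current prediction on (unlabeled) target inputs."""
    with torch.no_grad():
        pred = F.softmax(model(x), dim=1)
    d = torch.randn_like(x)
    for _ in range(n_power):
        d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
        adv_pred = F.log_softmax(model(x + d), dim=1)
        dist = F.kl_div(adv_pred, pred, reduction="batchmean")
        d = torch.autograd.grad(dist, d)[0].detach()
    return eps * F.normalize(d.flatten(1), dim=1).view_as(x)
```

In a training loop, the returned perturbation would be added to target-domain batches and the model penalized for changing its prediction, which is what gives the regularization its robustness effect.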
Funding: The Humanities and Social Science Fund of the Ministry of Education of China (21YJAZH077).
Abstract: In crowd density estimation datasets, annotating crowd locations is an extremely laborious task, and these location annotations are not used in the evaluation metrics. In this paper, we aim to reduce the annotation cost of crowd datasets and propose a crowd density estimation method based on weakly-supervised learning: in the absence of crowd position supervision, it directly regresses the crowd count using only the number of pedestrians in the image as supervision. For this purpose, we design a new training method that exploits the correlation between global and local image features through incremental learning. Specifically, we design a parent-child network (PC-Net) whose two branches focus on the global and local image respectively, and propose a linear feature calibration structure to train the PC-Net jointly: the child network learns feature transfer factors and feature bias weights and uses them to linearly calibrate the features extracted by the parent network, which improves convergence by exploiting local features hidden in crowd images. In addition, we use the pyramid vision transformer as the backbone of PC-Net to extract crowd features at different levels, and design a global-local feature loss (L2), which we combine with a crowd counting loss (LC) to enhance the network's sensitivity to crowd features during training and thus improve the accuracy of crowd density estimation. Experimental results show that PC-Net significantly reduces the gap between fully-supervised and weakly-supervised crowd density estimation, and outperforms the comparison methods on five datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF, and JHU-CROWD++.
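The linear feature calibration described above (the child network predicting transfer factors and bias weights that rescale the parent network's features) can be pictured with a short sketch. The exact layer layout of PC-Net is not specified in the abstract; the module below assumes per-channel factors derived from pooled child features, and all names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class LinearFeatureCalibration(nn.Module):
    """Child branch predicts per-channel transfer factors (scale) and bias
    weights, which linearly calibrate the parent branch's feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.to_scale = nn.Linear(channels, channels)
        self.to_bias = nn.Linear(channels, channels)

    def forward(self, parent_feat, child_feat):
        # parent_feat, child_feat: (N, C, H, W)
        ctx = self.pool(child_feat).flatten(1)               # (N, C) child context
        scale = self.to_scale(ctx).unsqueeze(-1).unsqueeze(-1)
        bias = self.to_bias(ctx).unsqueeze(-1).unsqueeze(-1)
        return scale * parent_feat + bias                    # calibrated features
```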
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. U21A20520, 62172112); the Key-Area Research and Development Program of Guangdong Province (2022A0505050014, 2020B1111190001); the National Key Research and Development Program of China (2022YFE0112200); and the Key-Area Research and Development Program of Guangzhou City (202206030009).
Abstract: Instance co-segmentation aims to segment the co-occurring instances shared by two images. This task relies heavily on instance-related cues provided by co-peaks, which are generally estimated by exhaustively examining all paired candidates in point-to-point patterns. However, such patterns can yield many false-positive co-peaks, resulting in over-segmentation whenever there are mutual occlusions. To tackle this issue, this paper proposes an instance co-segmentation method via tensor-based salient co-peak search (TSCPS-ICS). The proposed method explores high-order correlations via triple-to-triple matching among feature maps to find reliable co-peaks with the help of co-saliency detection. It captures more accurate intra-peaks and inter-peaks among feature maps, reducing the false-positive rate of the co-peak search. With accurate co-peaks, the responses of the targeted instance can be inferred efficiently. Experiments on four benchmark datasets validate the superior performance of the proposed method.
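For intuition about the point-to-point pattern criticized above: it amounts to scoring every pair of spatial locations across the two feature maps and keeping the highest-affinity pairs as candidate co-peaks. The paper's tensor-based triple-to-triple matching is not reproduced here; the snippet is only a sketch of that pairwise baseline using cosine affinities, with illustrative names and a hypothetical topk parameter.

```python
import torch
import torch.nn.functional as F

def pointwise_copeak_candidates(feat_a, feat_b, topk=5):
    """Pairwise (point-to-point) affinities between two feature maps;
    the highest-scoring location pairs serve as candidate co-peaks."""
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, -1), dim=0)      # (C, H*W), unit channel vectors
    b = F.normalize(feat_b.reshape(c, -1), dim=0)
    affinity = a.t() @ b                               # (H*W, H*W) cosine scores
    scores, flat_idx = affinity.flatten().topk(topk)
    idx_a, idx_b = flat_idx // (h * w), flat_idx % (h * w)
    coords_a = torch.stack([idx_a // w, idx_a % w], dim=1)   # (topk, 2) row/col in image A
    coords_b = torch.stack([idx_b // w, idx_b % w], dim=1)   # (topk, 2) row/col in image B
    return scores, coords_a, coords_b
```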
Abstract: Osteoporosis is a systemic disease characterized by low bone mass, impaired bone microstructure, increased bone fragility, and a higher risk of fractures. It commonly affects postmenopausal women and the elderly. Orthopantomography, also known as panoramic radiography, is a widely used imaging technique in dental examinations owing to its low cost and easy accessibility. Previous studies have shown that the mandibular cortical index (MCI) derived from orthopantomography can serve as an important indicator of osteoporosis risk. Motivated by this, this study proposes a parallel Transformer network based on multiple instance learning. By introducing parallel modules that alleviate optimization issues and integrating multiple instance learning with the Transformer architecture, our model effectively extracts information from image patches. It achieves an accuracy of 86% and an AUC of 0.963 on an osteoporosis dataset, demonstrating promising and competitive performance.
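As a rough picture of multiple instance learning combined with a Transformer (the paper's parallel modules are not detailed in the abstract and are not reproduced here), the sketch below treats a radiograph as a bag of patch embeddings, mixes them with a standard Transformer encoder, and reads the image-level prediction from a class token. Dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class MILTransformer(nn.Module):
    """Treat an image as a bag of patch embeddings; a Transformer encoder
    mixes the patch (instance) features and a class token gives the bag score."""
    def __init__(self, patch_dim=768, dim=256, heads=4, depth=2, num_classes=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patches):                # patches: (N, num_patches, patch_dim)
        x = self.embed(patches)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        return self.head(x[:, 0])              # bag-level (image-level) logits
```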
Funding: The National Natural Science Foundation of China (Grant No. 62106101) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210180).
Abstract: Transformers have dominated the field of natural language processing and have recently made an impact in computer vision. In medical image analysis, transformers have also been successfully applied to full-stack clinical applications, including image synthesis/reconstruction, registration, segmentation, detection, and diagnosis. This paper aims to promote awareness of the applications of transformers in medical image analysis. Specifically, we first provide an overview of the core concepts of the attention mechanism built into transformers and of other basic components. Second, we review various transformer architectures tailored for medical image applications and discuss their limitations. Within this review, we investigate key challenges, including the use of transformers in different learning paradigms, improving model efficiency, and coupling with other techniques. We hope this review provides a comprehensive picture of transformers to readers with an interest in medical image analysis.
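For readers new to the attention mechanism this review centers on, the core operation is scaled dot-product attention. A minimal PyTorch sketch is given below; the tensor shapes and the optional mask argument are illustrative, not tied to any specific architecture in the review.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim); returns outputs and weights."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # similarity of queries and keys
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                      # attention distribution
    return weights @ v, weights                              # weighted sum of values
```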
Funding: Supported in part by the Australian Research Council (ARC) (Nos. FL-170100117, DP-180103424, IC-190100031, and LE-200100049).
Abstract: Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to attribute perturbations and out-of-distribution compositions during inference. We ascribe this bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts, e.g., {red, blue, ...} belong to the "color" subspace whereas cube belongs to "shape". In this paper, we propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces (i.e., visual superordinates). With only natural visual question answering data, our model first acquires the semantic hierarchy from a linguistic view and then explores mutually exclusive visual superordinates under the guidance of this linguistic hierarchy. In addition, quasi-center visual concept clustering and superordinate shortcut learning schemes are proposed to enhance the discrimination and independence of concepts within each visual superordinate. Experiments demonstrate the superiority of the proposed framework under diverse settings, with relative improvements in overall answering accuracy of 7.5% for reasoning with perturbations and 15.6% for compositional generalization tests.
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61876212, 61733007, and 61572207, and the National Key Research and Development Program of China under Grant No. 2018YFB1402604.
Abstract: Learning an effective object detector with little supervision is an essential but challenging problem in computer vision applications. In this paper, we consider the problem of learning a deep convolutional neural network (CNN) based object detector using weakly-supervised and semi-supervised information in the framework of fast region-based CNN (Fast R-CNN). The goal is an object detector as accurate as the fully-supervised Fast R-CNN but requiring less image annotation effort. To this end, we use weakly-supervised training images (i.e., only image-level annotations are given) together with a small proportion of fully-supervised training images (i.e., bounding-box-level annotations are given), that is, a weakly- and semi-supervised (WASS) object detection setting. The proposed solution, termed WASS R-CNN, has two main components: a weakly-supervised R-CNN is trained first, and the semi-supervised data are then used to fine-tune the weakly-supervised detector. We perform object detection experiments on the PASCAL VOC 2007 dataset. The proposed WASS R-CNN achieves more than 85% of a fully-supervised Fast R-CNN's performance (measured by mean average precision) with only 10% of the fully-supervised annotations together with weak supervision for all training images. The results show that the proposed learning framework can significantly reduce the labeling effort required to obtain reliable object detectors.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61922087, 61906201, 62006238, and 62136005) and the Natural Science Fund for Distinguished Young Scholars of Hunan Province (No. 2019JJ20020).
Abstract: Image classification is a vital and basic task in many data analysis domains. Since real-world images generally contain multiple diverse semantic labels, the task amounts to a typical multi-label classification problem. Traditional multi-label image classification relies on a large amount of training data with plenty of labels, which incurs high human and financial costs. By contrast, one can easily obtain a correlation matrix of the concerned categories in the current scene from historical image data in other application scenarios. How to perform image classification with only such label-correlation priors, without specific and costly annotated labels, is an important but rarely studied problem. In this paper, we propose a model that classifies images using this kind of weak correlation prior. We use the label correlation to recapitulate sample similarity, employ the prior information to decompose the projection matrix when regressing the label indication matrix, and introduce the L_(2,1) norm to select features for each image. Finally, experimental results on several image datasets demonstrate that the proposed model has distinct advantages over current state-of-the-art multi-label classification methods.
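The abstract does not spell out the full optimization (projection-matrix decomposition guided by the correlation prior), so the snippet below only illustrates the L_(2,1)-regularized regression ingredient, ||XW - Y||_F^2 + lambda*||W||_(2,1), where the L_(2,1) norm sums the Euclidean norms of W's rows and thereby drives whole feature rows to zero (feature selection). All symbols and data here are generic placeholders, not the paper's model.

```python
import torch

def l21_norm(W, eps=1e-12):
    """L_{2,1} norm: sum of the Euclidean norms of the rows of W
    (eps keeps the gradient finite at all-zero rows)."""
    return torch.sqrt((W ** 2).sum(dim=1) + eps).sum()

def objective(X, W, Y, lam=0.1):
    """||X W - Y||_F^2 + lam * ||W||_{2,1}; rows of W pushed to zero
    correspond to discarded features."""
    return ((X @ W - Y) ** 2).sum() + lam * l21_norm(W)

# Minimal usage sketch: fit W by gradient descent on synthetic data.
X, Y = torch.randn(100, 20), torch.randn(100, 5)
W = torch.zeros(20, 5, requires_grad=True)
opt = torch.optim.Adam([W], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    objective(X, W, Y).backward()
    opt.step()
```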