As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial computer vision task. However, most existing methods use lightweight convolutions to reduce computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and runs at 90.3 frames per second (FPS).
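The mIoU figure reported above is the standard segmentation metric: per-class intersection over union, averaged over the classes present. A minimal sketch of that computation (not the authors' code, just the metric definition):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred, target: integer label maps of identical shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:  # class appears nowhere: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with three classes
pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
score = mean_iou(pred, target, num_classes=3)  # 0.75
```

Here per-class IoUs are 0.5, 0.75 and 1.0, giving a mean of 0.75.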
Universal lesion detection (ULD) methods for computed tomography (CT) images play a vital role in modern clinical medicine and intelligent automation. It is well known that single 2D CT slices lack the spatial-temporal characteristics and contextual information of 3D CT blocks. However, 3D CT blocks demand significantly more hardware resources during the learning phase. Efficiently exploiting the temporal correlation and spatial-temporal features of 2D CT slices is therefore crucial for ULD tasks. In this paper, we propose a ULD network with enhanced temporal correlation, named TCE-Net. The designed TCE module enriches the discriminative feature representation of multiple sequential CT slices. Besides, we employ multi-scale feature maps to facilitate the localization and detection of lesions of various sizes. Extensive experiments on the DeepLesion benchmark demonstrate that this method achieves 66.84% and 78.18% for FS@0.5 and FS@1.0, respectively, outperforming the compared state-of-the-art methods.
The infrastructure and construction of roads are crucial for the economic and social development of a region, but traffic-related challenges like accidents and congestion persist. Artificial Intelligence (AI) and Machine Learning (ML) have been used in road infrastructure and construction, particularly with Internet of Things (IoT) devices. Object detection in computer vision also plays a key role in improving road infrastructure and addressing traffic-related problems. This study uses You Only Look Once version 7 (YOLOv7) with a Convolutional Block Attention Module (CBAM), a highly optimized object-detection combination, to detect and identify traffic signs, and analyzes effective pairings of adaptive optimizers such as Adaptive Moment estimation (Adam), Root Mean Squared Propagation (RMSprop) and Stochastic Gradient Descent (SGD) with YOLOv7. Using a portion of the German traffic sign data for training, the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy. The proposed model not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems. The results showed an accuracy of 99.7% when using a batch size of 8 and the Adam optimizer, demonstrating the effectiveness of the proposed model for the image classification task of traffic sign recognition.
In today’s world, many people suffer from mental health problems such as depression and anxiety. If these conditions are not identified and treated early, they can worsen quickly and have far-reaching negative effects. Unfortunately, many people suffering from these conditions, especially depression and hypertension, are unaware of them until the conditions become chronic. This paper therefore proposes a novel approach using the Bi-directional Long Short-Term Memory (Bi-LSTM) algorithm and the Global Vector (GloVe) algorithm for the prediction and treatment of these conditions. Smartwatches and fitness bands can be equipped with these algorithms and can share data with a variety of IoT devices and smart systems to better understand and analyze the user’s condition. We compared the accuracy and loss on the training and validation datasets of two models: Bi-LSTM without a global vector layer and Bi-LSTM with a global vector layer. The Bi-LSTM without a global vector layer had an accuracy of 83%, while the Bi-LSTM with a global vector layer had an accuracy of 86%, with a precision of 86.4% and an F1 score of 0.861. In addition to providing basic therapies for identified cases, our model also helps prevent the deterioration of associated conditions, making it a practical real-world solution.
Eye center localization is one of the most crucial and basic requirements for human-computer interaction applications such as eye gaze estimation and eye tracking. There is a large body of work on this topic from recent years, but accuracy still needs to improve because of appearance challenges such as the high variability of shapes, lighting conditions, viewing angles and possible occlusions. To address these problems and limitations, we propose a novel approach for eye center localization with a fully convolutional network (FCN), an end-to-end, pixels-to-pixels network that can locate the eye center accurately. The key idea is to transfer the FCN from the object semantic segmentation task to the eye center localization task, since eye center localization can be regarded as a special semantic segmentation problem. We adapt a contemporary FCN into a shallow structure with a large-kernel convolutional block and transfer its performance from semantic segmentation to eye center localization by fine-tuning. Extensive experiments show that the proposed method outperforms state-of-the-art methods in both accuracy and reliability of eye center localization. It achieves a large performance improvement on the most challenging database and thus provides a promising solution for several challenging applications.
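Casting localization as segmentation means the network emits a per-pixel score map from which a single center coordinate must be read off. One common way to do that reduction (a generic soft-argmax sketch, not necessarily the decoding step the paper uses) is a probability-weighted average of pixel coordinates:

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Estimate a center from a score map: softmax over all pixels,
    then the probability-weighted mean of (row, col) coordinates."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())  # stable softmax
    p /= p.sum()
    rows, cols = np.mgrid[0:h, 0:w]
    return float((p * rows).sum()), float((p * cols).sum())

# A score map sharply peaked at pixel (2, 3)
hm = np.zeros((5, 7))
hm[2, 3] = 10.0
r, c = soft_argmax_2d(hm)  # approximately (2.0, 3.0)
```

Unlike a hard argmax, this estimate is differentiable and yields sub-pixel coordinates, which is why it is a popular decoding choice for landmark heatmaps.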
Sheet metal forming technologies have been intensively studied for decades to meet the increasing demand for lightweight metal components. To overcome the springback that occurs in sheet metal forming processes, numerous studies have developed compensation methods. For most existing methods, however, the development cycle remains considerably time-consuming and demands high computational or capital cost. In this paper, a novel theory-guided regularization method for training deep neural networks (DNNs), embedded in a learning system, is introduced to learn the intrinsic relationship between the workpiece shape after springback and the required process parameter, e.g., loading stroke, in sheet metal bending processes. By directly bridging the workpiece shape to the process parameter, issues concerning springback in process design are circumvented. The regularization method exploits a well-recognized theory in material mechanics, Swift's law, by penalizing divergence from this law throughout network training. The regularization is implemented via a multi-task learning network architecture, with the learning of extra tasks regularized during training. The stress-strain curve describing the material properties and the prior knowledge used to guide learning are stored in the database and the knowledge base, respectively. One can obtain the predicted loading stroke for a new workpiece shape by importing the target geometry through the user interface. In this research, the neural models were found to outperform a traditional machine learning model, the support vector regression model, in experiments with different amounts of training data. Through a series of studies with varying training data structure and amount, workpiece material and applied bending processes, the theory-guided DNN has been shown to achieve better generalization and learning consistency than data-driven DNNs, especially when only scarce and scattered experimental data are available for training, as is often the case in practice. The theory-guided DNN should also be applicable to other sheet metal forming processes. It provides an alternative method for compensating springback with a significantly shorter development cycle and lower capital cost and computational requirements than traditional compensation methods in the sheet metal forming industry.
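Swift's hardening law relates flow stress to plastic strain as sigma = K * (eps0 + eps)^n. A theory-guided regularizer of the kind described can be sketched as a penalty on the divergence of a network's stress predictions from that law; the constants K, eps0 and n below are illustrative placeholders, not the materials used in the paper:

```python
import numpy as np

# Illustrative Swift's-law constants (material-dependent; hypothetical values)
K, EPS0, N = 500.0, 0.01, 0.2  # strength coefficient [MPa], offset strain, hardening exponent

def swift_stress(strain):
    """Swift's hardening law: sigma = K * (eps0 + eps)^n."""
    return K * (EPS0 + strain) ** N

def theory_penalty(strain, predicted_stress):
    """Mean squared divergence of predicted stress from Swift's law.
    During training this term would be added to the data loss, so
    predictions that violate the material law are penalized."""
    return float(np.mean((predicted_stress - swift_stress(strain)) ** 2))

strain = np.linspace(0.0, 0.2, 5)
exact = swift_stress(strain)
zero_pen = theory_penalty(strain, exact)        # 0.0: no divergence
off_pen = theory_penalty(strain, exact + 1.0)   # constant 1 MPa offset
```

The total training loss would then take the form data_loss + lambda * theory_penalty, with lambda weighting how strongly the physics prior constrains the fit.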
Image captioning refers to the automatic generation of descriptive text according to the visual content of images. It is a technique integrating multiple disciplines including computer vision (CV), natural language processing (NLP) and artificial intelligence. In recent years, substantial research effort has been devoted to image caption generation, with impressive progress. To summarize these advances, we present a comprehensive review of image captioning, covering both traditional methods and recent deep learning-based techniques. Specifically, we first briefly review the early traditional works based on retrieval and templates. We then focus on deep learning-based image captioning research, which we categorize into the encoder-decoder framework, attention mechanisms and training strategies on the basis of model structures and training manners. After that, we summarize the publicly available datasets, the evaluation metrics and those proposed for specific requirements, and compare the state-of-the-art methods on the MS COCO dataset. Finally, we discuss open challenges and future research directions.
We propose an approach for dependence tree structure learning via copula. A nonparametric algorithm for copula estimation is presented. A Chow-Liu-like method based on a copula dependence measure is then proposed to estimate the maximum spanning tree of bivariate copulas associated with bivariate dependence relations. The main advantage of the approach is that learning with the empirical copula focuses on dependence relations among random variables, without needing to know the properties of individual variables and without specifying a parametric family for the entire underlying distribution. Experiments on two real-application datasets show the effectiveness of the proposed method.
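The Chow-Liu-style step above amounts to scoring every variable pair with a dependence measure and keeping the maximum spanning tree over those scores. A minimal sketch using Kendall's tau as a stand-in rank-based dependence measure (the paper uses a copula-based measure; tau is only an illustrative substitute) with Kruskal's algorithm:

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Rank-based dependence measure (illustrative stand-in for a
    copula-based measure): (concordant - discordant) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((x[i] - x[j]) * (y[i] - y[j]))
    return s / (n * (n - 1) / 2)

def max_spanning_tree(data):
    """Kruskal's algorithm on the complete pairwise graph,
    maximizing absolute dependence; returns the tree edges."""
    d = data.shape[1]
    edges = sorted(
        ((abs(kendall_tau(data[:, i], data[:, j])), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True)
    parent = list(range(d))
    def find(a):  # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Columns 0 and 1 are perfectly dependent; column 2 only weakly so
data = np.array([[1, 1, 2], [2, 2, 1], [3, 3, 4],
                 [4, 4, 3], [5, 5, 6], [6, 6, 5]])
tree = max_spanning_tree(data)  # spanning tree containing edge (0, 1)
```

A spanning tree over d variables always has d-1 edges, and the perfectly dependent pair (0, 1) is guaranteed to be kept here since its score is maximal.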
Many organizations apply cloud computing to store and effectively process data for various applications. Data that users upload to the cloud has weak security due to the unreliable verification process of data integrity. In this research, an enhanced Merkle hash tree method for an effective authentication model is proposed in the multi-owner cloud to increase the security of cloud data. The Merkle hash tree assigns each leaf node a hash tag, while each non-leaf node contains a table of its children's hash information, to encrypt large data. The Merkle hash tree provides efficient mapping of data and, thanks to its structure, easily identifies changes made to the data. The developed model supports privacy-preserving public auditing to provide a secure cloud storage system. Data owners upload data to the cloud and edit it using a private key. The enhanced Merkle hash tree method stores the data in the cloud server and splits it into batches. Data files requested by a data owner are audited by a third-party auditor, and the multi-owner authentication method is applied during modification to authenticate the user. The results show that the proposed method reduces the encryption and decryption time for cloud data storage by 2–167 ms compared to the existing Advanced Encryption Standard and Blowfish.
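The integrity property the abstract relies on comes from the basic Merkle tree construction: leaves hash the data blocks, internal nodes hash their children, so any edited block changes the root. A minimal textbook sketch (not the paper's enhanced variant) using SHA-256:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(blocks):
    """Root hash of a Merkle tree over data blocks.
    Leaves hash the blocks; internal nodes hash child-hash pairs.
    An odd trailing node is promoted unchanged to the next level."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd node carried up
            nxt.append(level[-1])
        level = nxt
    return level[0]

blocks = [b"file-part-1", b"file-part-2", b"file-part-3"]
root = merkle_root(blocks)
tampered = merkle_root([b"file-part-1", b"file-part-X", b"file-part-3"])
# root != tampered: modifying any block changes the root hash
```

An auditor who stores only the 32-byte root can therefore detect any modification to the underlying batches, which is what makes the structure attractive for third-party auditing.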
In this paper, we propose a novel domain-adaptive reconstruction method that effectively leverages deep learning and synthetic data to achieve robust 3D face reconstruction from a single depth image. The method applies two domain-adaptive neural networks for predicting head pose and facial shape, respectively. Both networks are trained with a customized domain adaptation strategy, using a combination of auto-labeled synthetic and unlabeled real data.
Funding (TCE-Net lesion detection): Taishan Young Scholars Program of Shandong Province; Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).
Funding (Bi-LSTM mental health prediction): Vellore Institute of Technology, Chennai, India.
Funding (eye center localization): National Natural Science Foundation of China (61533019, U1811463); Open Fund of the State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences (Y6S9011F51); in part by the EPSRC Project (EP/N025849/1).
Funding (theory-guided DNN for sheet metal forming): Aviation Industry Corporation of China (AVIC) Manufacturing Technology Institute (MTI); in part by the China Scholarship Council (CSC) (201908060236).
Funding (image captioning survey): Beijing Natural Science Foundation of China (L201023); Natural Science Foundation of China (62076030).
Funding (Merkle hash tree cloud auditing): Universiti Kebangsaan Malaysia (UKM) Research Grant Scheme FRGS/1/2020/ICT03/UKM/02/6 and GGPM-2020-028.