Funding: Supported by the National Natural Science Foundation of China (No. 41906169) and the PLA Academy of Military Sciences.
Abstract: Noise reduction analysis of signals is essential for modern underwater acoustic detection systems. Traditional noise reduction techniques gradually lose efficacy because the target signal is masked by biological and natural noise in the marine environment. Feature extraction that combines time-frequency spectrograms with deep learning can effectively separate noise from target signals. A fully convolutional encoder-decoder network (FCEDN) is proposed to address noise reduction in underwater acoustic signals. During denoising, the time-domain waveform of the underwater acoustic signal is converted into a wavelet low-frequency analysis recording spectrogram so that as many signal characteristics as possible are preserved. The FCEDN is built to learn, at each time step, the spectrogram mapping between noise and target signals. Transposed convolutions are introduced to transform the spectrogram features back into listenable audio. Evaluated on the ShipsEar dataset, the proposed method increases SNR and SI-SNR by 10.02 dB and 9.5 dB, respectively.
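To make the architecture concrete, the sketch below shows a minimal fully convolutional encoder-decoder for spectrogram denoising with transposed convolutions in the decoder. It assumes PyTorch; the layer counts, channel widths, and kernel sizes are illustrative choices rather than the FCEDN configuration reported above, and the wavelet spectrogram front end is not reproduced.

```python
# Minimal sketch of a fully convolutional encoder-decoder for spectrogram
# denoising, assuming PyTorch. Layer counts and channel widths are
# illustrative choices, not the architecture reported in the paper.
import torch
import torch.nn as nn

class DenoisingFCEDN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions compress the noisy spectrogram.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions restore the original resolution,
        # producing a clean spectrogram that can be inverted back to audio.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, noisy_spec):
        return self.decoder(self.encoder(noisy_spec))

if __name__ == "__main__":
    model = DenoisingFCEDN()
    noisy = torch.randn(4, 1, 128, 128)   # batch of noisy spectrograms
    clean = model(noisy)                  # same shape as the input
    print(clean.shape)                    # torch.Size([4, 1, 128, 128])
```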
Funding: National Natural Science Foundation of China (Nos. 61673017, 61403398) and Natural Science Foundation of Shaanxi Province (Nos. 2017JM6077, 2018ZDXM-GY-039).
Abstract: According to the characteristics of road features, an encoder-decoder deep semantic segmentation network is designed for road extraction from remote sensing images. First, because road targets are rich in local detail but semantically simple, an encoder-decoder network with shallow layers and high resolution is designed to improve the representation of detail information. Second, because roads occupy only a small proportion of a remote sensing image, the cross-entropy loss function is improved to address the imbalance between positive and negative samples during training. Experiments on a large road extraction dataset show that the proposed method achieves a recall of 83.9%, precision of 82.5%, and F1-score of 82.9%, and can extract road targets from remote sensing images completely and accurately. The encoder-decoder network designed in this paper performs well on the road extraction task and requires little manual intervention, so it has good application prospects.
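As an illustration of the improved-loss idea, the sketch below applies a class-weighted binary cross-entropy that up-weights the rare road pixels. It assumes PyTorch; the pos_weight scheme shown is one common way to handle foreground/background imbalance and may differ from the paper's exact reweighting.

```python
# Sketch of a class-weighted binary cross-entropy for road segmentation,
# assuming PyTorch; the pos_weight reweighting is an illustrative choice.
import torch
import torch.nn.functional as F

def weighted_road_loss(logits, targets):
    """logits, targets: tensors of shape (N, 1, H, W); targets hold 0/1 road labels."""
    targets = targets.float()
    pos = targets.sum()
    neg = targets.numel() - pos
    # Up-weight road pixels in proportion to how rare they are in this batch.
    pos_weight = (neg / pos.clamp(min=1.0)).detach()
    return F.binary_cross_entropy_with_logits(logits, targets,
                                              pos_weight=pos_weight)

if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64)
    targets = (torch.rand(2, 1, 64, 64) > 0.9).float()   # ~10% road pixels
    print(weighted_road_loss(logits, targets).item())
```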
Abstract: The growth of multimedia content has resulted in a massive increase in network traffic for video streaming, which demands solutions that preserve the user's Quality of Experience (QoE). 360-degree videos have taken user behavior by storm, yet viewers only watch one part of a 360-degree video at a time, known as the viewport. Despite the immense hype, 360-degree streaming suffers from a troublesome side effect related to viewport prediction: the user's viewport must be pre-fetched in advance, and mispredictions make viewers uncomfortable. Ideally, bandwidth consumption can be minimized if the user's motion is known in advance. Given this problem definition, we propose an encoder-decoder based Long Short-Term Memory (LSTM) model to more accurately capture the non-linear relationship between past and future viewport positions. The model takes transformed data, rather than the raw input, to predict future user movement. This prediction model is then combined with a rate adaptation approach that assigns bitrates to the tiles of 360-degree video frames under a given network capacity. Our proposed work therefore aims to improve system performance when QoE parameters are jointly optimized. Experiments were carried out and compared with existing work to evaluate the proposed model, and the implementation of our proposed work provides higher user QoE than its competitors.
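The sketch below shows a minimal encoder-decoder LSTM of the kind described, which encodes past viewport positions and rolls out a short future horizon. It assumes PyTorch; the two-dimensional (yaw, pitch) input, horizon length, and hidden size are illustrative assumptions.

```python
# Minimal seq2seq LSTM sketch for viewport prediction, assuming PyTorch.
# Input dimensionality, horizon, and hidden size are illustrative.
import torch
import torch.nn as nn

class ViewportSeq2Seq(nn.Module):
    def __init__(self, feat_dim=2, hidden=64, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat_dim)

    def forward(self, past):                      # past: (N, T_past, feat_dim)
        _, state = self.encoder(past)             # summarize viewing history
        step = past[:, -1:, :]                    # seed with the last position
        outputs = []
        for _ in range(self.horizon):             # roll out future positions
            out, state = self.decoder(step, state)
            step = self.head(out)
            outputs.append(step)
        return torch.cat(outputs, dim=1)          # (N, horizon, feat_dim)

if __name__ == "__main__":
    model = ViewportSeq2Seq()
    history = torch.randn(8, 30, 2)   # 30 past (yaw, pitch) samples
    print(model(history).shape)       # torch.Size([8, 10, 2])
```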
Funding: Fundamental Research Funds for the Central Universities (Grant No. FRF-TP-19-006A3).
Abstract: As a common and high-risk disease, heart disease seriously threatens people's health. In the era of the Internet of Things (IoT), smart medical devices have strong practical significance for medical workers and patients because of their ability to assist in the diagnosis of disease. Research on real-time diagnosis and classification algorithms for arrhythmia can therefore help improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an encoder-decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on the classification results, and it is trained and tested on the MIT-BIH arrhythmia database. In addition, a Generative Adversarial Network (GAN) is adopted for data equalization to address the data imbalance problem. Simulation results show that for inter-patient arrhythmia classification, the hybrid model combining the CNN and encoder-decoder achieves the best classification accuracy, reaching 94.05%. In particular, it performs better on supraventricular ectopic beats (class S) and fusion beats (class F).
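A minimal sketch of a CNN front end feeding an LSTM for beat classification is given below, assuming PyTorch. The five-class output, filter sizes, and segment length are illustrative, and the GAN-based data balancing is not shown.

```python
# Sketch of a 1-D CNN feeding an LSTM for ECG beat classification,
# assuming PyTorch. Class count and layer sizes are illustrative.
import torch
import torch.nn as nn

class CnnLstmClassifier(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        # Convolutional feature extractor over the raw ECG waveform.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM models the temporal ordering of the extracted features.
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, ecg):                 # ecg: (N, 1, L)
        feats = self.cnn(ecg)               # (N, 32, L/4)
        feats = feats.transpose(1, 2)       # (N, L/4, 32) for the LSTM
        _, (h, _) = self.lstm(feats)
        return self.fc(h[-1])               # class logits

if __name__ == "__main__":
    model = CnnLstmClassifier()
    beats = torch.randn(16, 1, 256)         # 16 heartbeat segments
    print(model(beats).shape)               # torch.Size([16, 5])
```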
Funding: Ningxia Hui Autonomous Region Key Research and Development Program: Research and demonstration application of key technologies for intelligent monitoring of spatial planning based on high-resolution remote sensing (Project No. 2018YBZD1629).
Abstract: Cultivated land extraction is essential for sustainable development and agriculture. In this paper, we propose a semantic segmentation network based on the encoder-decoder structure that extracts cultivated land from satellite images for agricultural automation solutions. The encoder consists of two parts: the first is a modified Xception, used as the feature extraction network; the second is atrous convolution, used to expand the receptive field and gather context so that richer feature information is extracted. The decoder uses conventional upsampling operations to restore the original resolution. In addition, we use a combination of BCE and Lovász-hinge as the loss function to optimize the Intersection over Union (IoU). Experimental results show that the proposed network structure can solve the cultivated land extraction problem in Yinchuan City.
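The sketch below illustrates the atrous-convolution idea with a block of parallel dilated convolutions fused by a 1x1 convolution, assuming PyTorch. The dilation rates and channel counts are illustrative and are not the configuration used in the paper.

```python
# Minimal sketch of an atrous (dilated) convolution block for context
# aggregation, assuming PyTorch. Rates and channels are illustrative.
import torch
import torch.nn as nn

class AtrousContextBlock(nn.Module):
    """Parallel dilated convolutions enlarge the receptive field without
    reducing spatial resolution; their outputs are fused by a 1x1 conv."""
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(feats)

if __name__ == "__main__":
    block = AtrousContextBlock()
    x = torch.randn(1, 256, 32, 32)   # encoder feature map
    print(block(x).shape)             # torch.Size([1, 256, 32, 32])
```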
Funding: National Key Research and Development Program of China (Grant 2020YFB1708900) and the Fundamental Research Funds for the Central Universities (Grant No. B220201044).
Abstract: Medical image segmentation has witnessed rapid advancements with the emergence of encoder-decoder based methods. In the encoder-decoder structure, the goal of the decoding phase is not only to restore feature map resolution but also to mitigate the loss of feature information incurred during encoding. However, this approach gives rise to a challenge: the multiple up-sampling operations in the decoder themselves lose feature information. To address this challenge, we propose a novel network that removes the decoding structure to reduce feature information loss (CBL-Net). In particular, we introduce a Parallel Pooling Module (PPM) to counteract the feature information loss caused by conventional convolution and pooling operations during encoding. Furthermore, we incorporate a Multiplexed Dilation Convolution (MDC) module to expand the network's receptive field. Although the decoding stage is removed, the feature map resolution still needs to be recovered, so we introduce a Global Feature Recovery (GFR) module that uses an attention mechanism to recover the resolution of image feature maps, which effectively reduces the loss of feature information. We conduct extensive experimental evaluations on three publicly available medical image segmentation datasets: DRIVE, CHASEDB, and MoNuSeg. Experimental results show that the proposed network outperforms state-of-the-art methods in medical image segmentation. In addition, by eliminating the decoding component, it achieves higher efficiency than current encoder-decoder networks.
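As one plausible reading of the parallel pooling idea, the sketch below runs max and average pooling side by side and fuses them, so that downsampling discards less information than a single pooling path. It assumes PyTorch and is not the authors' exact PPM design.

```python
# Illustrative sketch of a parallel pooling block, assuming PyTorch.
# This is a plausible reading of the PPM described above, not the
# authors' exact module.
import torch
import torch.nn as nn

class ParallelPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
        # 1x1 convolution fuses the two pooled views back to `channels`.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x):
        pooled = torch.cat([self.max_pool(x), self.avg_pool(x)], dim=1)
        return self.fuse(pooled)

if __name__ == "__main__":
    ppm = ParallelPooling(channels=64)
    x = torch.randn(1, 64, 128, 128)
    print(ppm(x).shape)   # torch.Size([1, 64, 64, 64])
```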
Funding: This work was funded by the Austrian COMET Program (project InTribology, No. 872176) via the Austrian Research Promotion Agency (FFG) and the provinces of Niederösterreich and Vorarlberg, and was carried out within the Austrian Excellence Centre of Tribology (AC2T research GmbH).
Abstract: The existing knowledge regarding the interfacial forces, lubrication, and wear of bearings in real-world operation has significantly improved their designs over time, allowing for prolonged service life. As a result, self-lubricating bearings have become a viable alternative to traditional bearing designs in industrial machines. However, wear mechanisms are still inevitable and occur progressively in self-lubricating bearings, as characterized by the loss of the lubrication film and seizure. Therefore, monitoring the stages of the wear states in these components will help to impart the necessary countermeasures to reduce machine maintenance downtime. This article proposes a methodology for using a long short-term memory (LSTM)-based encoder-decoder architecture on interfacial force signatures to detect abnormal regimes, aiming to provide early predictions of failure in self-lubricating sliding contacts even before they occur. Reciprocating sliding experiments were performed using a self-lubricating bronze bushing and steel shaft journal in a custom-built transversally oscillating tribometer setup. The force signatures corresponding to each cycle of the reciprocating sliding motion in the normal regime were used as inputs to train the encoder-decoder architecture, so as to reconstruct any new signal of the normal regime with minimum error. With this semi-supervised training exercise, the force signatures corresponding to the abnormal regime could be differentiated from the normal regime, as their reconstruction errors would be very high. During the validation procedure for the proposed LSTM-based encoder-decoder model, the model predicted the force signals corresponding to the normal and abnormal regimes with an accuracy of 97%. In addition, a visualization of the reconstruction error across the entire force signature showed noticeable patterns in the reconstruction error, temporally decoded before the actual critical failure point, making it possible to use the model for early predictions of failure.
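The sketch below shows the general shape of such an LSTM encoder-decoder and its reconstruction-error score, assuming PyTorch. The signal length, hidden size, and scoring rule are illustrative assumptions; in practice the model would be trained on normal-regime cycles before scoring.

```python
# Sketch of an LSTM encoder-decoder used for reconstruction-based anomaly
# detection, assuming PyTorch. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LstmAutoencoder(nn.Module):
    def __init__(self, feat_dim=1, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat_dim)

    def forward(self, x):                         # x: (N, T, feat_dim)
        _, (h, _) = self.encoder(x)               # compress one sliding cycle
        ctx = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        out, _ = self.decoder(ctx)                # expand back to T steps
        return self.head(out)                     # reconstructed force signal

def anomaly_score(model, cycle):
    """Mean squared reconstruction error; large values flag abnormal regimes."""
    with torch.no_grad():
        recon = model(cycle)
    return torch.mean((recon - cycle) ** 2).item()

if __name__ == "__main__":
    model = LstmAutoencoder()
    normal_cycle = torch.sin(torch.linspace(0, 6.28, 200)).view(1, 200, 1)
    print(anomaly_score(model, normal_cycle))   # untrained, so the error is large
```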
Abstract: Accurate pedestrian trajectory prediction is critical in self-driving systems, as it is fundamental to the response and decision-making of the ego vehicle. In this study, we focus on the problem of predicting the future trajectory of pedestrians from a first-person perspective. Most existing trajectory prediction methods for the first-person view simply copy the bird's-eye-view formulation, neglecting the differences between the two. To this end, we clarify the differences between the two views and highlight the importance of action-aware trajectory prediction in the first-person view. We propose a new action-aware network based on an encoder-decoder framework, with an action prediction branch and a goal estimation branch at the end of the encoder. In the decoder, bidirectional long short-term memory (Bi-LSTM) blocks are adopted to generate the final prediction of pedestrians' future trajectories. Our method was evaluated on a public dataset and achieved competitive performance compared with other approaches. An ablation study demonstrates the effectiveness of the action prediction branch.
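A minimal sketch of a Bi-LSTM decoder head of the kind described is shown below, assuming PyTorch. The feature dimensions, prediction horizon, and the way the encoded context (history, action, and goal features) is injected are illustrative assumptions.

```python
# Minimal sketch of a Bi-LSTM decoder head that turns an encoded context
# into a future trajectory, assuming PyTorch. Dimensions are illustrative.
import torch
import torch.nn as nn

class BiLstmTrajectoryDecoder(nn.Module):
    def __init__(self, ctx_dim=64, hidden=64, horizon=15, out_dim=2):
        super().__init__()
        self.horizon = horizon
        self.bilstm = nn.LSTM(ctx_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, out_dim)   # (x, y) per future step

    def forward(self, context):                      # context: (N, ctx_dim)
        # Repeat the encoded context across the prediction horizon.
        seq = context.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.bilstm(seq)
        return self.head(out)                        # (N, horizon, 2)

if __name__ == "__main__":
    decoder = BiLstmTrajectoryDecoder()
    encoded = torch.randn(4, 64)       # e.g. fused history, action, goal features
    print(decoder(encoded).shape)      # torch.Size([4, 15, 2])
```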
Funding: Supported by the Business Integration and Data Sharing Service Technology Based on Through Information of Operation and Distribution project (2016 State Grid Technology Project).
Abstract: Anomaly detection in the smart grid is critical to enhancing the reliability of power systems. Excessive manpower has to be devoted to analyzing the measurement data collected from intelligent monitoring devices, yet the performance of anomaly detection is still not satisfactory. This is mainly because the inherent spatio-temporality and multi-dimensionality of the measurement data cannot be easily captured. In this paper, we propose an anomaly detection model based on an encoder-decoder framework with a recurrent neural network (RNN). In the model, an input time series is reconstructed, and an anomaly is detected by an unexpectedly high reconstruction error. Both the Manhattan distance and the edit distance are used to evaluate the difference between an input time series and its reconstruction. Finally, we validate the proposed model using power demand data from the University of California, Riverside (UCR) time series classification archive and IEEE 39-bus system simulation data. Results from the analysis demonstrate that the proposed encoder-decoder framework successfully captures anomalies with a precision higher than 95%.
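The sketch below illustrates the scoring step only: the Manhattan (L1) distance between a window and its reconstruction is compared against a threshold calibrated on normal data. It assumes NumPy; the mean-plus-three-sigma threshold is an illustrative choice, not the paper's criterion, and the RNN reconstruction model itself is omitted.

```python
# Sketch of reconstruction-error scoring with the Manhattan distance,
# assuming NumPy. The threshold rule is an illustrative assumption.
import numpy as np

def manhattan_error(window, reconstruction):
    return np.abs(window - reconstruction).sum()

def calibrate_threshold(normal_errors, n_sigma=3.0):
    errors = np.asarray(normal_errors)
    return errors.mean() + n_sigma * errors.std()

def is_anomalous(window, reconstruction, threshold):
    return manhattan_error(window, reconstruction) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal_errors = rng.normal(1.0, 0.1, size=1000)   # errors seen on normal data
    threshold = calibrate_threshold(normal_errors)
    window = rng.normal(0.0, 1.0, size=96)            # one power-demand window
    poor_reconstruction = window + rng.normal(0.0, 0.5, size=96)
    print(is_anomalous(window, poor_reconstruction, threshold))
```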
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61873277, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2020JQ-758, and in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446.
Abstract: In video captioning methods based on an encoder-decoder, limited visual features are extracted by an encoder, and a natural sentence describing the video content is generated by a decoder. However, this kind of method depends on a single video input source and few visual labels, and there is a problem of semantic alignment between video content and the generated natural sentences, so it is not suitable for accurately comprehending and describing video content. To address this issue, this paper proposes a video captioning method with semantic topic-guided generation. First, a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during encoding. Then, the semantic topics of the video data are extracted using visual labels retrieved from similar video data. In decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 (GPT-2) deep neural network, which decreases the influence of "deviation" in the semantic mapping between videos and texts by jointly decoding a baseline and the semantic topics of the video content. During this process, the designed Enhance-TopK sampling algorithm alleviates the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on two public datasets, Microsoft Research Video Description and Microsoft Research-Video to Text. The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the Bilingual Evaluation Understudy, Metric for Evaluation of Translation with Explicit Ordering, Recall-Oriented Understudy for Gisting Evaluation-longest common subsequence, and Consensus-based Image Description Evaluation scores of the proposed method are improved by 1.2%, 0.1%, 0.3%, and 2.4% on the Microsoft Research Video Description dataset, and by 0.1%, 1.0%, 0.1%, and 2.8% on the Microsoft Research-Video to Text dataset, respectively, compared with existing video captioning methods. As a result, the proposed method generates video captions that are more closely aligned with natural human language expression habits.
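For reference, the sketch below shows plain top-k sampling over next-token logits, the standard baseline that Enhance-TopK modifies; the paper's dynamic adjustment of the retained distribution is not reproduced. It assumes PyTorch, and the temperature knob is an illustrative addition.

```python
# Sketch of plain top-k sampling over next-word logits, assuming PyTorch.
# This is the baseline that Enhance-TopK builds on, not the paper's algorithm.
import torch

def top_k_sample(logits, k=10, temperature=1.0):
    """logits: (vocab_size,) unnormalized scores for the next token."""
    scaled = logits / temperature                 # >1 flattens, <1 sharpens
    topk_vals, topk_idx = torch.topk(scaled, k)   # keep the k best candidates
    probs = torch.softmax(topk_vals, dim=-1)      # renormalize over the top k
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx[choice].item()                # vocabulary index of the sample

if __name__ == "__main__":
    vocab_logits = torch.randn(50257)             # GPT-2-sized vocabulary
    print(top_k_sample(vocab_logits, k=10))
```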
Abstract: As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial computer vision task. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset: it achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and can reach 90.3 Frames Per Second (FPS).
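As an illustration of combining several upsampling routes cheaply, the sketch below fuses bilinear interpolation with a transposed convolution in a small decoder head, assuming PyTorch. The channel sizes, 19-class output, and 4x upsampling factor are assumptions, not the TBANet configuration.

```python
# Illustrative sketch of a lightweight decoder head mixing two upsampling
# routes, assuming PyTorch. Sizes are assumptions, not TBANet's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightUpsampler(nn.Module):
    def __init__(self, in_ch=128, n_classes=19, scale=4):
        super().__init__()
        self.scale = scale
        # Route 1: parameter-free bilinear interpolation after a 1x1 conv.
        self.proj = nn.Conv2d(in_ch, n_classes, kernel_size=1)
        # Route 2: learned upsampling via a transposed convolution.
        self.deconv = nn.ConvTranspose2d(in_ch, n_classes, kernel_size=scale,
                                         stride=scale)

    def forward(self, x):
        route1 = F.interpolate(self.proj(x), scale_factor=self.scale,
                               mode="bilinear", align_corners=False)
        route2 = self.deconv(x)
        return route1 + route2          # fuse the two upsampled predictions

if __name__ == "__main__":
    head = LightweightUpsampler()
    feats = torch.randn(1, 128, 64, 128)        # encoder output feature map
    print(head(feats).shape)                    # torch.Size([1, 19, 256, 512])
```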