The number of blogs and other forms of opinionated online content has increased dramatically in recent years. Many fields, including academia and national security, place an emphasis on automated detection of the political orientation of articles. Political articles (especially in the Arab world) differ from other articles in their subjectivity: the author's beliefs and political affiliation may strongly influence a political article. With categories representing the main political ideologies, this problem may be viewed as a subset of text categorization (classification). In general, the performance of machine learning models for text classification is sensitive to hyperparameter settings. Furthermore, the feature vector used to represent a document must capture, to some extent, the complex semantics of natural language. To this end, this paper presents an intelligent system for detecting the orientation of political Arabic articles that adapts the categorical boosting (CatBoost) method combined with a multi-level feature concept. Extracting features at multiple levels can enhance the model's ability to discriminate between different classes or patterns; each level may capture different aspects of the input data, contributing to a more comprehensive representation. CatBoost, a robust and efficient gradient-boosting algorithm, is used to learn and predict the complex relationships between these features and the political orientation labels of the articles. A dataset of political Arabic texts collected from diverse sources, including postings and articles, is used to assess the suggested technique. The opinions fall into three subcategories: conservative, reform, and revolutionary. The results of this study demonstrate that, compared with other frequently used machine learning models for text classification, the CatBoost method using multi-level features performs better, with an accuracy of 98.14%.
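The multi-level feature idea can be made concrete with a minimal sketch: word-level and character-level counts are extracted separately and concatenated into one document vector. The vocabulary, trigram set, and sample text below are invented for illustration, and the CatBoost classifier that would consume these vectors is omitted.

```python
from collections import Counter

def word_features(text, vocab):
    """Word-level features: term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def char_ngram_features(text, n, ngrams):
    """Character-level features: counts of fixed character n-grams."""
    s = text.lower()
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    return [counts[g] for g in ngrams]

def multi_level_vector(text, vocab, ngrams):
    """Concatenate both levels into one document representation."""
    return word_features(text, vocab) + char_ngram_features(text, 3, ngrams)

# Hypothetical vocabulary and trigram set; a real system would learn these
# from the corpus (e.g. via TF-IDF) before training the boosted trees.
vocab = ["reform", "conservative", "revolution"]
ngrams = ["ref", "con", "rev"]
vec = multi_level_vector("Reform and conservative views on revolution", vocab, ngrams)
```

Each level contributes its own slice of the vector, so the classifier can exploit word identity and sub-word morphology at the same time, which matters for morphologically rich languages such as Arabic.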
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
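The aggregation idea can be sketched on a toy problem: states sharing a feature value form one aggregate state, the aggregate problem is solved by value iteration, and the aggregate costs are disaggregated back as a piecewise-constant approximation of the original cost function. The four-state chain, its costs, and the uniform disaggregation weights below are illustrative assumptions, not from the paper.

```python
# Hard aggregation for a discounted cost evaluation problem (one policy,
# deterministic transitions, discount factor GAMMA).
GAMMA = 0.9

g = [1.0, 1.0, 0.0, 0.0]   # per-state cost under the fixed policy
nxt = [1, 2, 3, 3]         # deterministic successor of each state

# Feature map: states 0,1 share feature 0; states 2,3 share feature 1.
phi = [0, 0, 1, 1]
n_agg = 2

def aggregate_value_iteration(iters=500):
    """Value iteration on the aggregate problem, with uniform averaging
    over the member states of each aggregate state."""
    members = [[s for s in range(4) if phi[s] == x] for x in range(n_agg)]
    r = [0.0] * n_agg
    for _ in range(iters):
        r_new = []
        for x in range(n_agg):
            vals = [g[s] + GAMMA * r[phi[nxt[s]]] for s in members[x]]
            r_new.append(sum(vals) / len(vals))
        r = r_new
    return r

r = aggregate_value_iteration()
J_approx = [r[phi[s]] for s in range(4)]  # piecewise-constant cost approximation
```

Here the fixed point is r = [1/0.55, 0]: the two "costly" states are lumped together, which is exactly the kind of nonlinear, feature-constant approximation the aggregation framework provides.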
Recently, there have been some attempts to apply Transformers to 3D point cloud classification. To reduce computation, most existing methods focus on local spatial attention, but they ignore point content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based): it clusters sampled points with similar features into the same class and computes self-attention within each class, enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in separate branches. Extensive experiments show that our PointConT model achieves remarkable performance on point cloud shape classification. In particular, our method achieves 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT.
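A minimal sketch of the content-based attention idea, with scalar "features" and fixed centroids standing in for learned feature vectors and clusters (all values invented): points are grouped by feature similarity rather than spatial proximity, and self-attention is computed only within each group, so distant but similar points can still attend to each other.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def content_based_attention(features, centroids):
    """Group points by nearest centroid in *feature* space, then compute
    dot-product self-attention restricted to each group."""
    groups = {}
    for i, f in enumerate(features):
        c = min(range(len(centroids)), key=lambda j: abs(f - centroids[j]))
        groups.setdefault(c, []).append(i)

    out = [0.0] * len(features)
    for idx in groups.values():
        for i in idx:
            scores = [features[i] * features[j] for j in idx]
            w = softmax(scores)
            # Attended value: convex combination within the cluster only.
            out[i] = sum(wk * features[j] for wk, j in zip(w, idx))
    return out

features = [0.10, 0.90, 0.12, 0.95]
attended = content_based_attention(features, centroids=[0.1, 0.9])
```

Because attention is restricted to each cluster, cost grows with cluster size rather than with the full point count, which is the complexity trade-off described above.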
Estimation of the velocity profile within the mud depth is a long-standing and essential problem in debris flow dynamics. Until now, various velocity profiles have been proposed based on fitting analyses of experimental measurements, but these are often limited by the observation conditions, such as the number of configured sensors. As a result, the fitted linear velocity profiles usually fail to reproduce the time-varying, nonlinear behavior of the debris flow process. In this study, we present a novel approach to explore the debris flow velocity profile in detail, building on our previous 3D-HBPSPH numerical model, i.e., a three-dimensional smoothed particle hydrodynamics model incorporating the Herschel-Bulkley-Papanastasiou rheology. Specifically, we propose a stratification aggregation algorithm for interpreting the SPH particles, which enables the recording of temporal debris flow velocities at different mud depths. To analyze the velocity profile, we introduce a logarithm-based nonlinear model with two key parameters: a, controlling the shape of the velocity profile, and b, governing its temporal evolution. We verify the proposed velocity profile and explore its sensitivity using 34 sets of velocity data from three individual flume experiments reported in the literature. Our results demonstrate that the proposed time-varying nonlinear velocity profile outperforms the previous linear profiles.
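The abstract does not give the paper's exact equation, but a logarithm-based two-parameter profile of the kind described might take the following form, where a controls the curvature of the profile and b its magnitude at the surface. This specific formula is an assumption for illustration only.

```python
import math

def velocity_profile(z, h, a, b):
    """Illustrative logarithmic velocity profile within the mud depth.

    z: height above the bed, h: total mud depth,
    a: shape parameter (a -> 0 recovers a nearly linear profile),
    b: velocity scale at the surface z = h.
    """
    return b * math.log(1.0 + a * z / h) / math.log(1.0 + a)
```

By construction the profile vanishes at the bed (z = 0), increases monotonically with depth fraction z/h, and reaches b at the free surface, matching the qualitative behavior a nonlinear debris flow profile must reproduce.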
This article introduces the concept of load aggregation, which involves a comprehensive analysis of loads to acquire their external characteristics for the purpose of modeling and analyzing power systems. Online identification is a computer-based approach to data collection, processing, and system identification, commonly used for adaptive control and prediction. This paper proposes a method for dynamically aggregating large-scale adjustable loads to support high proportions of new energy integration, aiming to study the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction methods. The experiment selected 300 central air conditioners as the research subject and analyzed their regulation characteristics, economic efficiency, and comfort. The experimental results show that as the adjustment time of the air conditioners increases from 5 minutes to 35 minutes, the stable adjustment quantity during the adjustment period decreases from 28.46 to 3.57, indicating that air conditioning loads can be controlled over long periods and have better adjustment effects in the short term. Overall, the experimental results demonstrate that analyzing the aggregation characteristics of regional large-scale adjustable loads using online identification techniques and feature extraction algorithms is effective.
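The online identification step can be illustrated with a standard recursive least squares (RLS) estimator; the scalar gain model and the data below are invented stand-ins for the paper's aggregate load model, which is not specified in the abstract.

```python
def rls_identify(inputs, outputs, lam=0.99):
    """Scalar recursive least squares: online estimate of the gain theta
    in y = theta * u, with forgetting factor lam to track slow drift."""
    theta, P = 0.0, 1e6  # initial estimate and (large) initial covariance
    for u, y in zip(inputs, outputs):
        k = P * u / (lam + u * P * u)   # Kalman-style gain
        theta += k * (y - theta * u)    # correct with the prediction error
        P = (P - k * u * P) / lam       # covariance update
    return theta

# Synthetic measurements generated by theta = 2 (control signal u,
# aggregate load response y); the estimator recovers the gain online.
inputs = [1.0, 2.0, 3.0, 4.0]
outputs = [2.0, 4.0, 6.0, 8.0]
theta = rls_identify(inputs, outputs)
```

Because the estimate is refreshed with each new sample, the same loop can run continuously against streaming measurements, which is what makes the approach suitable for adaptive control of an aggregated load.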
Stance detection is the task of identifying attitude toward a standpoint. Previous work on stance detection has focused on feature extraction but ignored the fact that irrelevant features act as noise during higher-level abstraction. Moreover, because the target is not always mentioned in the text, most methods have ignored target information. To solve these problems, we propose a neural network ensemble method that combines the temporal dependence modeling of long short-term memory (LSTM) networks with the excellent feature extraction of convolutional neural networks (CNNs). The method obtains multi-level features that consider both local and global information. We also introduce attention mechanisms to magnify target-related features. Furthermore, we employ sparse coding to remove noise and obtain characteristic features. Performance is improved by applying sparse coding on top of the attention mechanism and feature extraction. We evaluate our approach on the public SemEval-2016 Task 6-A dataset, achieving performance that exceeds the benchmark and those of the participating teams.
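The two post-processing steps, attention on target-related features and sparse-coding-style denoising, can be sketched as follows. The feature values, relevance scores, and the use of soft thresholding as a proxy for sparse coding are all illustrative assumptions, not the paper's implementation.

```python
import math

def attention_weights(relevance_scores):
    """Softmax over target-relevance scores: features similar to the
    target embedding receive larger weights."""
    m = max(relevance_scores)
    e = [math.exp(s - m) for s in relevance_scores]
    z = sum(e)
    return [v / z for v in e]

def soft_threshold(features, t):
    """Sparse-coding-style denoising: shrink small activations to zero."""
    return [math.copysign(max(abs(f) - t, 0.0), f) for f in features]

# Toy pipeline: magnify target-related features, then sparsify the rest.
feats = [0.9, 0.05, -0.4, 0.02]    # extracted multi-level features
relevance = [2.0, 0.1, 1.5, 0.1]   # hypothetical similarity to the target
w = attention_weights(relevance)
weighted = [f * wi * len(feats) for f, wi in zip(feats, w)]
clean = soft_threshold(weighted, 0.05)
```

Features with low target relevance are first down-weighted by attention and then zeroed out by the threshold, which is the noise-removal effect the method relies on.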
As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of healthcare systems. Fundus photography equipment can be connected to a cloud platform through the IoT to realize real-time uploading of fundus images and rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged: the data uploaded to the cloud platform involves personal attributes, health status, and medical application data of patients, and personal information security is violated if it is leaked, abused, or improperly disclosed. It is therefore important to address the security and privacy issues raised by the massive numbers of medical and healthcare devices connecting to IoT healthcare infrastructure. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network that aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while keeping the network's computational complexity low enough for mobile terminals. In this way, users do not need to upload data to the cloud platform and can analyze and process fundus images on their own mobile terminals, eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connections between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE_DB1, the proposed network achieves accuracy/F1-scores of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to state-of-the-art methods.
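The abstract does not specify the hybrid loss. A common hybrid for vessel segmentation, shown here as an assumed stand-in, combines binary cross-entropy with a Dice term; the Dice term directly counteracts the foreground/background imbalance of thin vessel pixels.

```python
import math

def hybrid_loss(pred, target, alpha=0.5, eps=1e-7):
    """Illustrative hybrid segmentation loss on flattened pixel lists:
    alpha * binary cross-entropy + (1 - alpha) * Dice loss."""
    bce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
               for p, t in zip(pred, target)) / len(pred)
    inter = sum(p * t for p, t in zip(pred, target))
    dice = 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)
    return alpha * bce + (1 - alpha) * dice
```

A perfect prediction drives both terms to (near) zero, while a completely wrong one is penalized by both, and the Dice term stays informative even when foreground pixels are rare.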
Traffic scene captioning technology automatically generates one or more sentences describing the content of traffic scenes by analyzing input traffic scene images, supporting road safety while providing an important decision-making function for sustainable transportation. To provide comprehensive and reasonable descriptions of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. The model follows an encoder-decoder structure. First, multi-level granularity visual features are used for feature enhancement during encoding, which enables the model to learn more detailed content in the traffic scene image. Second, a scene knowledge graph is applied during decoding, and the semantic features it provides are used to enhance the features learned by the decoder once more, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic evaluation metrics. The results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, notably achieving a score of 129.0 on the CIDEr-D metric, which indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
As an indispensable part of identity authentication, offline writer identification plays a notable role in biometrics, forensics, and historical document analysis. However, identifying handwriting efficiently, stably, and quickly remains challenging because of how handwriting features are extracted and processed. In this paper, we propose an efficient system for identifying writers from handwritten images that integrates local and global features from similar handwritten images. The local features are modeled by effective aggregate processing, and the global features are extracted through transfer learning. Specifically, the proposed system employs a pre-trained residual network to mine the relationship between large image sets and specific handwritten images, while the vector of locally aggregated descriptors with double power normalization is employed to aggregate the local and global features. Moreover, handwritten image segmentation, preprocessing, enhancement, optimization of the neural network architecture, and normalization of local and global features are exploited, significantly improving system performance. The proposed system is evaluated on the Computer Vision Lab (CVL) dataset and the International Conference on Document Analysis and Recognition (ICDAR) 2013 dataset. The results show that it generalizes well and achieves state-of-the-art performance. Furthermore, the system performs better when trained on complete handwriting patches with the normalization method. The experimental results indicate that it is important to segment handwriting appropriately when dealing with handwriting overlap, as this reduces visual burstiness.
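The aggregation step can be sketched with standard VLAD: each local descriptor is assigned to its nearest codebook centroid, the residuals are accumulated per centroid, and the result is power-normalized (signed square root) and L2-normalized. This is a single power normalization on toy 2-D descriptors; the paper's double power normalization and descriptor choice are not reproduced.

```python
import math

def vlad(descriptors, centroids, p=0.5):
    """Vector of locally aggregated descriptors with power normalization."""
    dim = len(centroids[0])
    acc = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Assign the descriptor to the nearest centroid, accumulate residual.
        k = min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centroids[j])))
        for i in range(dim):
            acc[k][i] += d[i] - centroids[k][i]
    flat = [v for row in acc for v in row]
    # Power normalization dampens bursty components ("visual burstiness").
    flat = [math.copysign(abs(v) ** p, v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

v = vlad([[2.0, 0.0]], centroids=[[1.0, 0.0], [10.0, 10.0]])
```

The signed square root shrinks large components relative to small ones, which is precisely the burstiness suppression the abstract attributes to the normalization method.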
Existing glass segmentation networks have high computational complexity and large memory occupation, leading to high hardware requirements and long inference times, which is not conducive to efficiency-seeking real-time tasks such as autonomous driving. The inefficiency of these models stems mainly from employing homogeneous modules to process features at different layers: such modules require computationally intensive convolutions and weight-calculation branches with numerous parameters to accommodate the differences in information across layers. We propose an efficient glass segmentation network (EGSNet) based on a multi-level heterogeneous architecture and boundary awareness to balance model performance and efficiency. EGSNet divides the feature layers from different stages into low-level understanding, semantic-level understanding, and global understanding with boundary guidance. Based on the information differences among these layers, we further propose a multi-angle collaborative enhancement (MCE) module, which extracts detailed information from shallow features, and a large-scale contextual feature extraction (LCFE) module, which captures semantic logic from deep features. The models are trained and evaluated on the glass segmentation datasets HSO (Home-Scene-Oriented) and Trans10k-stuff, respectively, and EGSNet achieves the best efficiency and performance compared with advanced methods. On the HSO test set, the IoU, Fβ, MAE (mean absolute error), and BER (balance error rate) of EGSNet are 0.804, 0.847, 0.084, and 0.085, at a cost of only 27.15 GFLOPs (giga floating-point operations). Experimental results show that EGSNet significantly improves the efficiency of the glass segmentation task with better performance.
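Of the metrics reported above, BER is the least standard, so a short definition helps: it is the mean of the false-positive and false-negative rates, weighting glass and non-glass pixels equally regardless of how imbalanced the two regions are.

```python
def balance_error_rate(tp, tn, fp, fn):
    """Balance Error Rate from binary pixel counts:
    the average of the false-positive and false-negative rates."""
    fpr = fp / (fp + tn)  # fraction of non-glass pixels marked as glass
    fnr = fn / (fn + tp)  # fraction of glass pixels missed
    return 0.5 * (fpr + fnr)
```

For example, counts of tp=90, tn=50, fp=50, fn=10 give a BER of 0.3 even though overall pixel accuracy is 0.7, showing why BER is preferred when one class dominates the image.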
As an important component of the high-speed railway traction power supply system, the catenary system performs the vital function of transmitting electrical energy to trainsets. Practical engineering operation shows that, under the continuous impacts of pantograph-catenary interaction and the influence of the external environment, catenary support components may develop defects such as loosening, detachment, breakage, and cracking, reducing the structural reliability of the catenary and seriously affecting the stable operation of the catenary system. Timely and accurate localization of catenary support components (CSCs) is therefore of great significance for ensuring the safe operation of high-speed railways and improving catenary inspection and maintenance strategies. However, CSC detection typically faces the problems of many component types, large scale differences, and the very small size of some components. To address these problems, this paper proposes a catenary component detection algorithm based on a multi-scale fusion pyramid focal network, which combines a balance module with a feature pyramid module to improve detection performance on small targets. First, a separable residual pyramid aggregation module (SRPAM) is designed to optimize the model's multi-scale feature extraction ability, enlarge the receptive field, and alleviate the multi-scale problem in CSC detection. Second, a path aggregation network based on a balanced feature pyramid (PA-BFPN) is designed to improve cross-layer feature fusion efficiency and small-target detection performance. Finally, comparison, visualization, and ablation experiments demonstrate the effectiveness and superiority of the proposed method. The proposed MFPFCOS achieves a detection accuracy (mAP) of 48.6% on the CSC dataset at a computational cost of 30 GFLOPs (giga floating-point operations), showing that the method maintains a good balance between detection accuracy and detection speed.
Scene perception and trajectory forecasting are two fundamental challenges crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of the two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye-view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes dataset demonstrate that ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach thus outperforms SOTA in model generalization and robustness and is more feasible for deployment in real-world AD scenarios.
Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches use end-to-end deep learning models: they first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the crucial feature interpolation phase, they capture spatial-temporal information only from the most adjacent frame features, neglecting the long-term spatial-temporal correlations between multiple neighboring frames that are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights obtained from several gating networks. Next, we perform local and global feature refinement using a locally-temporal feature comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, demonstrate the effectiveness and superiority of our approach over the state of the art.
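The gated combination at the heart of a mixture-of-experts module can be sketched as follows, with scalars standing in for the frame-feature tensors and a single gate standing in for the several gating networks; this is illustrative of the LTMoE idea, not the paper's implementation.

```python
import math

def moe_combine(expert_outputs, gate_logits):
    """Mixture-of-experts combination: a softmax gate turns the logits
    into weights, and the experts' outputs are blended accordingly."""
    m = max(gate_logits)
    e = [math.exp(g - m) for g in gate_logits]
    z = sum(e)
    weights = [v / z for v in e]
    return sum(w * out for w, out in zip(weights, expert_outputs))
```

With equal logits the gate averages the experts; as one logit grows, the output smoothly approaches that expert's interpolation, which lets the network pick the temporal context best suited to each frame.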
Funding (PointConT paper): supported in part by the National Natural Science Foundation of China (61876011), the National Key Research and Development Program of China (2022YFB4703700), the Key Research and Development Program 2020 of Guangzhou (202007050002), and the Key-Area Research and Development Program of Guangdong Province (2020B090921003).
Funding (debris flow velocity profile paper): supported by the National Natural Science Foundation of China (Grant No. 52078493), the Natural Science Foundation of Hunan Province (Grant No. 2022JJ30700), the Natural Science Foundation for Excellent Young Scholars of Hunan (Grant No. 2021JJ20057), the Science and Technology Plan Project of Changsha (Grant No. kq2305006), and the Innovation Driven Program of Central South University (Grant No. 2023CXQD033).
Funding (load aggregation paper): supported by the State Grid Science & Technology Project (5100-202114296A-0-0-00).
Funding (stance detection paper): supported by the Fundamental Research Funds for the Central Universities (Grant No. 2572019BH03).
Funding (MIA-UNet paper): supported in part by the National Natural Science Foundation of China (Nos. 62072074, 62076054, 62027827, 61902054), the Frontier Science and Technology Innovation Projects of the National Key R&D Program (No. 2019QY1405), the Sichuan Science and Technology Innovation Platform and Talent Plan (No. 2020JDJQ0020), the Sichuan Science and Technology Support Plan (No. 2020YFSY0010), and the Natural Science Foundation of Guangdong Province (No. 2018A030313354).
Abstract: As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of healthcare systems. Fundus photography equipment is connected to the cloud platform through the IoT, so as to realize real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged. The data uploaded to the cloud platform involve personal attributes, health status, and medical application data of patients. Once leaked, abused, or improperly disclosed, personal information security will be violated. Therefore, it is important to address the security and privacy issues raised by connecting massive numbers of medical and healthcare devices to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network that aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while keeping the network's computational complexity low enough for mobile terminals. In this way, users do not need to upload their data to the cloud platform and can analyze and process fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connections between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with the U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE_DB1, the proposed network achieves accuracy/F1-scores of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
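The abstract does not give the exact form of the hybrid loss; a minimal sketch of one common formulation for foreground/background imbalance, a weighted sum of binary cross-entropy and Dice loss, might look like the following (NumPy; the weight `alpha` and the combination itself are assumptions, not the paper's definition):

```python
import numpy as np

def hybrid_loss(pred, target, alpha=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss.

    pred:   predicted vessel probabilities in (0, 1), any shape
    target: binary ground-truth mask, same shape
    alpha:  BCE/Dice trade-off (hypothetical; the paper's exact
            formulation and weighting are not given in the abstract)
    """
    pred = np.clip(pred, eps, 1 - eps)
    # Binary cross-entropy, averaged over all pixels
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Dice loss is insensitive to the foreground/background ratio
    inter = np.sum(pred * target)
    dice = 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return alpha * bce + (1 - alpha) * dice
```

The Dice term keeps the gradient meaningful even when vessel pixels are a small fraction of the image, which is the imbalance the abstract refers to.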
Fund: Funded by (i) the National Natural Science Foundation of China (NSFC) under Grant Nos. 61402397, 61263043, 61562093, and 61663046; (ii) the Open Foundation of the Key Laboratory in Software Engineering of Yunnan Province, No. 2020SE304; and (iii) the Practical Innovation Project of Yunnan University, Project Nos. 2021z34, 2021y128, and 2021y129.
Abstract: Traffic scene captioning technology automatically generates one or more sentences to describe the content of traffic scenes by analyzing input traffic scene images, ensuring road safety while providing an important decision-making function for sustainable transportation. To provide a comprehensive and reasonable description of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. In general, the model follows an encoder-decoder structure. First, multi-level granularity visual features are used for feature enhancement during the encoding process, which enables the model to learn more detailed content in the traffic scene image. Second, a scene knowledge graph is applied to the decoding process, and the semantic features it provides are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated by five standard automatic evaluation metrics. The results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, notably achieving a score of 129.0 on the CIDEr-D metric, which indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
Fund: Supported in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant KYCX20_0758; in part by the Science and Technology Research Project of Jiangsu Public Security Department under Grant 2020KX005; in part by the General Project of Philosophy and Social Science Research in Colleges and Universities in Jiangsu Province under Grant 2022SJYB0473; and in part by the "Cyberspace Security" Construction Project of Jiangsu Provincial Key Discipline during the "14th Five-Year Plan".
Abstract: As an indispensable part of identity authentication, offline writer identification plays a notable role in biology, forensics, and historical document analysis. However, identifying handwriting efficiently, stably, and quickly remains challenging because of how handwriting features are extracted and processed. In this paper, we propose an efficient system to identify writers from handwritten images, which integrates local and global features from similar handwritten images. The local features are modeled by effective aggregate processing, and the global features are extracted through transfer learning. Specifically, the proposed system employs a pre-trained Residual Network to mine the relationship between large image sets and specific handwritten images, while the vector of locally aggregated descriptors with double power normalization is employed to aggregate the local and global features. Moreover, handwritten image segmentation, preprocessing, enhancement, optimization of the neural network architecture, and normalization of local and global features are exploited, significantly improving system performance. The proposed system is evaluated on the Computer Vision Lab (CVL) datasets and the International Conference on Document Analysis and Recognition (ICDAR) 2013 datasets. The results show that it generalizes well and achieves state-of-the-art performance. Furthermore, the system performs better when trained on complete handwriting patches with the normalization method. The experimental results indicate that it is important to segment handwriting reasonably when dealing with handwriting overlap, which reduces visual burstiness.
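The vector of locally aggregated descriptors (VLAD) step with power normalization can be sketched as follows; the codebook size, descriptor dimension, and the exponent used here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def vlad(descriptors, centers, p=0.5, eps=1e-12):
    """Aggregate local descriptors into a VLAD vector with power
    and L2 normalization. The exponent p=0.5 is a common default,
    not necessarily the paper's "double power" choice.

    descriptors: (n, d) local features from one handwriting image
    centers:     (k, d) codebook (e.g. learned by k-means)
    """
    # Assign each descriptor to its nearest codebook center
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    k, d = centers.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Accumulate residuals relative to the assigned center
            v[i] = (members - centers[i]).sum(axis=0)
    v = v.ravel()
    # Power normalization suppresses bursty components
    v = np.sign(v) * np.abs(v) ** p
    return v / (np.linalg.norm(v) + eps)
```

Power normalization is the standard remedy for the "visual burstiness" the abstract mentions: a few locally repeated patterns would otherwise dominate the aggregated vector.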
Abstract: Existing glass segmentation networks have high computational complexity and large memory occupation, leading to high hardware requirements and long inference times, which is not conducive to efficiency-critical real-time tasks such as autonomous driving. The inefficiency of these models is mainly due to employing homogeneous modules to process features of different layers. These modules require computationally intensive convolutions and weight-calculation branches with numerous parameters to accommodate the differences in information across layers. We propose an efficient glass segmentation network (EGSNet) based on a multi-level heterogeneous architecture and boundary awareness to balance model performance and efficiency. EGSNet divides the feature layers from different stages into low-level understanding, semantic-level understanding, and global understanding with boundary guidance. Based on the information differences among the layers, we further propose the multi-angle collaborative enhancement (MCE) module, which extracts detailed information from shallow features, and the large-scale contextual feature extraction (LCFE) module, which captures semantic logic from deep features. The models are trained and evaluated on the glass segmentation datasets HSO (Home-Scene-Oriented) and Trans10k-stuff, respectively, and EGSNet achieves the best efficiency and performance compared with advanced methods. On the HSO test set, the IoU, Fβ, MAE (mean absolute error), and BER (balance error rate) of EGSNet are 0.804, 0.847, 0.084, and 0.085, at a cost of only 27.15 GFLOPs (giga floating-point operations). Experimental results show that EGSNet significantly improves the efficiency of the glass segmentation task with better performance.
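The evaluation metrics quoted above can be computed from a predicted mask and its ground truth roughly as follows (a sketch; the binarization threshold and per-image vs. per-dataset averaging conventions are assumptions that vary between benchmarks):

```python
import numpy as np

def glass_seg_metrics(pred, gt, thr=0.5, eps=1e-7):
    """IoU, MAE, and BER for a single binary segmentation.

    pred: predicted glass probabilities in [0, 1]
    gt:   binary ground-truth mask, same shape
    """
    p = (pred >= thr).astype(float)
    tp = np.sum(p * gt)            # true positives
    fp = np.sum(p * (1 - gt))      # false positives
    fn = np.sum((1 - p) * gt)      # false negatives
    tn = np.sum((1 - p) * (1 - gt))
    iou = tp / (tp + fp + fn + eps)
    mae = np.mean(np.abs(pred - gt))
    # BER averages the error rates on positives and negatives,
    # so it is robust to foreground/background imbalance
    ber = 0.5 * (fn / (tp + fn + eps) + fp / (tn + fp + eps))
    return iou, mae, ber
```

Note that MAE is computed on the raw probabilities while IoU and BER use the thresholded mask, which is why a model can score well on one and poorly on another.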
Abstract: As an important component of the high-speed railway traction power supply system, the catenary system performs the critical function of transmitting electric power to train sets. Practical engineering operation shows that, under the continuous impact of pantograph-catenary interaction and the influence of the external environment, catenary support components may develop defects such as loosening, detachment, breakage, and cracking, reducing the structural reliability of the catenary and seriously affecting the stable operation of the catenary system. Therefore, timely and accurate localization of catenary support components (CSCs) is of great significance for ensuring the safe operation of high-speed railways and improving catenary inspection and maintenance strategies. However, CSC detection typically faces the problems of numerous component categories, large scale differences, and the small size of some components. To address these problems, this paper proposes a catenary component detection algorithm based on a multi-scale fusion pyramid focal network, which combines a balance module with a feature pyramid module to improve detection performance on small targets. First, a separable residual pyramid aggregation module (SRPAM) is designed to strengthen the model's multi-scale feature extraction, enlarge the receptive field, and alleviate the multi-scale problem in CSC detection. Second, a path aggregation network based on a balanced feature pyramid (PA-BFPN) is designed to improve cross-layer feature fusion efficiency and small-target detection performance. Finally, comparative, visualization, and ablation experiments demonstrate the effectiveness and superiority of the proposed method. The proposed MFPFCOS achieves a detection accuracy (mAP) of 48.6% on the CSC dataset while requiring only 30 GFLOPs (giga floating-point operations), showing that the proposed method maintains a good balance between detection accuracy and detection speed.
Fund: Basic and Advanced Research Projects of CSTC, Grant/Award Number: cstc2019jcyj-zdxmX0008; Science and Technology Research Program of Chongqing Municipal Education Commission, Grant/Award Numbers: KJQN202100634 and KJZDK201900605; National Natural Science Foundation of China, Grant/Award Number: 62006065.
Abstract: Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of the two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes dataset demonstrate that ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA methods in model generalization and robustness, and is therefore more feasible for deployment in real-world AD scenarios.
Abstract: Space-time video super-resolution (STVSR) serves the purpose of reconstructing high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches utilize end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolutions of these features. However, in the most important feature interpolation phase, they only capture spatial-temporal information from the most adjacent frame features, ignoring the long-term spatial-temporal correlations between multiple neighbouring frames needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights, produced by several gating nets, to obtain the interpolation results. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.
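The gating step in a mixture-of-experts interpolation, combining several expert feature estimates with softmax weights, can be sketched as follows; the expert count, feature dimension, and the fact that gate scores are passed in directly (rather than computed by a learned gating net) are illustrative simplifications, not the LTMoE configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def moe_interpolate(expert_feats, gate_logits):
    """Combine per-expert interpolated frame features with
    softmax gating weights.

    expert_feats: (n_experts, d) feature estimates for the
                  intermediate frame, one per expert
    gate_logits:  (n_experts,) scores; in a real model these
                  would come from a small learned gating net
    """
    w = softmax(gate_logits)   # weights sum to 1
    return w @ expert_feats    # convex combination of experts
```

Because the weights form a convex combination, experts that specialize in different motion patterns (e.g. fast vs. slow movement) can each dominate the interpolation where they are most reliable.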