Journal Articles
8 articles found
1. Point Cloud Classification Using Content-Based Transformer via Clustering in Feature Space (Cited by 2)
Authors: Yahui Liu, Bin Tian, Yisheng Lv, Lingxi Li, Fei-Yue Wang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 1, pp. 231-239 (9 pages)
Recently, there have been several attempts to apply Transformers to 3D point cloud classification. To reduce computation, most existing methods focus on local spatial attention, but they ignore point content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based), clustering sampled points with similar features into the same class and computing self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves remarkable performance on point cloud shape classification. In particular, our method reaches 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT.
Keywords: content-based Transformer; deep learning; feature aggregator; local attention; point cloud classification
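To make the clustering-then-attention idea concrete, here is a minimal PyTorch sketch of content-based local attention: sampled point features are grouped by feature similarity, and self-attention runs only within each group. The class name, single-iteration centroid assignment, and all dimensions are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Sketch only: content-based attention via feature-space clustering.
import torch
import torch.nn as nn

class ContentClusterAttention(nn.Module):
    def __init__(self, dim: int, num_clusters: int = 8, heads: int = 4):
        super().__init__()
        self.num_clusters = num_clusters
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C) features of the sampled points of one cloud
        out = feats.clone()
        # One k-means-style step: nearest of `num_clusters` randomly
        # chosen feature centroids (a single iteration, for brevity).
        centroids = feats[torch.randperm(feats.size(0))[: self.num_clusters]]
        assign = torch.cdist(feats, centroids).argmin(dim=1)  # (N,)
        for k in range(self.num_clusters):
            idx = (assign == k).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            group = feats[idx].unsqueeze(0)        # (1, n_k, C)
            attended, _ = self.attn(group, group, group)
            out[idx] = attended.squeeze(0)         # attention stays inside the cluster
        return out
```

Because attention is computed per cluster, its cost scales with the largest cluster size rather than with all N points at once, which is the trade-off the abstract describes.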
2. Adequate alignment and interaction for cross-modal retrieval
Authors: Mingkang Wang, Min Meng, Jigang Liu, Jigang Wu. Virtual Reality & Intelligent Hardware (EI), 2023, No. 6, pp. 509-522 (14 pages)
Background: Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements in image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features and then use complex methods to contextualize and aggregate these features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) without considering intermediate interactions and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of representations; and (2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and affect the quality of the final aggregated features. Methods: To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn adequate alignment and interaction of aggregated features to effectively bridge the modality gap. Results: Experiments on the Microsoft COCO and Flickr30k datasets demonstrated the superiority of our model over state-of-the-art methods.
Keywords: cross-modal retrieval; visual semantic embedding; feature aggregation; Transformer
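The alignment objective behind VSE models of this kind is typically a bidirectional hinge loss with hard negatives (as popularized by VSE++). The sketch below shows that standard objective, assuming L2-normalized embeddings; it is not the paper's specific alignment module or fusion encoder.

```python
# Standard VSE-style alignment loss (hardest-negative hinge), a sketch.
import torch

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     margin: float = 0.2) -> torch.Tensor:
    # img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs
    scores = img_emb @ txt_emb.t()               # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)              # matched-pair scores
    cost_txt = (margin + scores - pos).clamp(min=0)      # image -> text
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # text -> image
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)     # ignore the positives themselves
    cost_img = cost_img.masked_fill(mask, 0)
    # penalize only the hardest negative in each direction
    return cost_txt.max(dim=1).values.mean() + cost_img.max(dim=0).values.mean()
```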
3. MIA-UNet: Multi-Scale Iterative Aggregation U-Network for Retinal Vessel Segmentation (Cited by 2)
Authors: Linfang Yu, Zhen Qin, Yi Ding, Zhiguang Qin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 11, pp. 805-828 (24 pages)
As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of healthcare systems. Fundus photography equipment is connected to the cloud platform through the IoT, enabling real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged: the data uploaded to the cloud platform involve patients' personal attributes, health status, and medical application data, and once leaked, abused, or improperly disclosed, personal information security will be violated. It is therefore important to address the security and privacy issues of the massive medical and healthcare devices connecting to the IoT healthcare infrastructure. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while keeping the computational complexity low enough for mobile terminals. In this way, users do not need to upload data to the cloud platform and can analyze and process fundus images on their own mobile terminals, eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connections between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE_DB1, the proposed network achieves accuracy/F1-scores of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to state-of-the-art methods.
Keywords: retinal vessel segmentation; security and privacy; redesigned skip connection; feature map aggregation; hybrid loss function
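The abstract does not spell out the hybrid loss, so the following sketch shows one common hybrid for foreground/background imbalance in vessel segmentation: binary cross-entropy blended with Dice loss. The weighting `alpha` and the tensor shapes are assumptions, not the paper's exact formulation.

```python
# Sketch of a common BCE + Dice hybrid loss for imbalanced segmentation.
import torch
import torch.nn.functional as F

def hybrid_loss(logits: torch.Tensor, target: torch.Tensor,
                alpha: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    # logits, target: (B, 1, H, W); target is a float mask in {0., 1.}
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))          # per-sample overlap
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1 - (2 * inter + eps) / (union + eps)        # Dice handles imbalance
    return alpha * bce + (1 - alpha) * dice.mean()      # BCE keeps gradients smooth
```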
4. A Study of Ca-Mg Silicate Crystalline Glazes: An Analysis on Forms of Crystals
Authors: LIU Pei-de, YU Ping-li, WU Ji-huai. Chemical Research in Chinese Universities (SCIE, CAS, CSCD), 2004, No. 2, pp. 200-204 (5 pages)
In the study of Ca-Mg silicate crystalline glazes, we found several disequilibrated crystallization phenomena, such as non-crystallographic small-angle forking and spheroidal growth, parasitism and wedging of crystals, dendritic growth, and secondary nucleation. These phenomena possibly result from two factors: (1) a partial temperature gradient, caused by heat asymmetry in the electrical resistance furnace, as crystals grow from the silicate melt; and (2) constitutional supercooling near the crystal surfaces. The disparity of these disequilibrated crystallization phenomena across the main crystalline phases produces the varied morphological features of the crystal aggregates. At the same time, disequilibrated crystallization leaves great stress retained in the crystals, which results in cracks in the glazes as the temperature drops. Based on these results, the authors analyze the phenomena and present the relevant figures and data.
Keywords: crystalline glaze; constitutional supercooling; heat dynamical condition; disequilibrated crystallization; morphological features of crystal aggregates
5. Space-time video super-resolution using long-term temporal feature aggregation
Authors: Kuanhao Chen, Zijie Yue, Miaojing Shi. Autonomous Intelligent Systems (EI), 2023, No. 1, pp. 75-83 (9 pages)
Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches utilize end-to-end deep learning models: they first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the crucial feature interpolation phase, they capture spatial-temporal information only from the most adjacent frame features, failing to model long-term spatial-temporal correlations between multiple neighbouring frames that are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights, produced by several gating nets, to obtain the interpolation results. Next, we perform local and global feature refinement using the locally-temporal feature comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.
Keywords: space-time video super-resolution; mixture of experts; deformable convolutional layer; long-term temporal feature aggregation
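As a rough illustration of the gating idea described for LTMoE, the sketch below weights several expert projections of a window of consecutive frame features with a softmax gate. The expert count, window size, and plain linear experts are placeholder assumptions; the real module operates on spatial feature maps rather than flat vectors.

```python
# Sketch of mixture-of-experts feature interpolation with a gating net.
import torch
import torch.nn as nn

class FrameMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, window: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(window * dim, dim) for _ in range(num_experts)
        )
        self.gate = nn.Linear(window * dim, num_experts)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, window, C) features of consecutive neighbouring frames
        x = frame_feats.flatten(1)                     # (B, window*C)
        weights = self.gate(x).softmax(dim=-1)         # (B, num_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        # gate-weighted sum of expert outputs -> interpolated frame feature
        return (weights.unsqueeze(-1) * outs).sum(dim=1)
```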
6. Specificity-preserving RGB-D saliency detection (Cited by 1)
Authors: Tao Zhou, Deng-Ping Fan, Geng Chen, Yi Zhou, Huazhu Fu. Computational Visual Media (SCIE, EI, CSCD), 2023, No. 2, pp. 297-317 (21 pages)
Salient object detection (SOD) in RGB and depth images has attracted increasing research interest. Existing RGB-D SOD models usually adopt fusion strategies to learn a shared representation from the RGB and depth modalities, while few methods explicitly consider how to preserve modality-specific characteristics. In this study, we propose a novel framework, the specificity-preserving network (SPNet), which improves SOD performance by exploring both shared information and modality-specific properties. Specifically, we use two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps. To effectively fuse cross-modal features in the shared learning network, we propose a cross-enhanced integration module (CIM) and propagate the fused feature to the next layer to integrate cross-level information. Moreover, to capture rich complementary multi-modal information and boost SOD performance, we use a multi-modal feature aggregation (MFA) module to integrate the modality-specific features from each individual decoder into the shared decoder. With skip connections between the encoder and decoder layers, hierarchical features can be fully combined. Extensive experiments demonstrate that our SPNet outperforms cutting-edge approaches on six popular RGB-D SOD benchmarks and three camouflaged object detection benchmarks. The project is publicly available at https://github.com/taozh2017/SPNet.
Keywords: salient object detection (SOD); RGB-D; cross-enhanced integration module (CIM); multi-modal feature aggregation (MFA)
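To give a feel for what a cross-enhanced integration step might look like, here is a hedged sketch in which each modality's feature map is gated by the other before a 1x1 fusion convolution. The enhancement recipe and layer shapes are assumptions; the actual CIM is defined in the linked SPNet repository.

```python
# Sketch of cross-enhanced RGB/depth feature fusion (assumed recipe).
import torch
import torch.nn as nn

class CrossEnhance(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.dep_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb: torch.Tensor, dep: torch.Tensor) -> torch.Tensor:
        # rgb, dep: (B, C, H, W); each modality is gated by the other
        rgb_e = rgb + rgb * torch.sigmoid(self.dep_conv(dep))
        dep_e = dep + dep * torch.sigmoid(self.rgb_conv(rgb))
        # concatenate the enhanced maps and project back to C channels
        return self.fuse(torch.cat([rgb_e, dep_e], dim=1))
```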
7. Automatic modelling of urban subsurface with ground-penetrating radar using multi-agent classification method (Cited by 2)
Authors: Tess Xianghuan Luo, Pengpeng Yuan, Song Zhu. Geo-Spatial Information Science (SCIE, EI, CSCD), 2022, No. 4, pp. 588-599 (12 pages)
The subsurface of urban cities is becoming increasingly congested, and up-to-date records of subsurface structures are of vital importance for the maintenance and management of urban infrastructure below and above the ground. Ground-penetrating radar (GPR) is a nondestructive testing method that can survey and image the subsurface without excavation; however, GPR interpretation relies on the operator's experience. We propose an automatic workflow for recognizing and classifying subsurface structures in GPR data using computer vision and machine learning techniques. The workflow comprises three stages: first, full-cover GPR measurements are processed to form C-scans; second, abnormal areas are extracted from the full-cover C-scans with a coefficient of variation-active contour model (CV-ACM); finally, the extracted segments are recognized and classified from the corresponding B-scans with aggregate channel features (ACF) to produce a semantic map. The selected computer vision methods were validated in a controlled laboratory test, and the entire workflow was evaluated in a real on-site case study. The results of both the controlled and on-site cases were promising. This study establishes the necessity of a full-cover 3D GPR survey, illustrates the feasibility of integrating advanced computer vision techniques to analyze large amounts of 3D GPR survey data, and paves the way for automating subsurface modeling with GPR.
Keywords: subsurface modeling; ground-penetrating radar; computer vision; active contour model; aggregate channel feature
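The second stage keys on a coefficient-of-variation measure to flag abnormal areas before active-contour refinement. As a small illustration of that statistic only (not the paper's CV-ACM pipeline), the sketch below computes a per-pixel coefficient-of-variation map over a stack of C-scan slices; the thresholding step is an assumption.

```python
# Sketch: per-pixel coefficient of variation over a GPR C-scan stack.
import numpy as np

def coefficient_of_variation_map(c_scans: np.ndarray,
                                 eps: float = 1e-8) -> np.ndarray:
    # c_scans: (depth_slices, H, W) amplitude volume from a full-cover survey
    mean = c_scans.mean(axis=0)
    std = c_scans.std(axis=0)
    # high relative variation across depth slices hints at buried anomalies
    return std / (np.abs(mean) + eps)

# usage (threshold is a hypothetical tuning parameter):
# anomaly_mask = coefficient_of_variation_map(volume) > 0.5
```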
8. Deeper Attention-Based Network for Structured Data
Authors: Xiaohua Wu, Youping Fan, Wanwan Peng, Hong Pang, Yu Luo. 《国际计算机前沿大会会议论文集》, 2020, No. 1, pp. 259-267 (9 pages)
Deep learning methods are applied to structured data, and in typical methods low-order features are discarded after being combined with high-order features for prediction tasks. However, in structured data, ignoring low-order features may lower the prediction rate. To address this issue, this paper proposes a deeper attention-based network (DAN). To keep both low- and high-order features, DAN uses an attention average pooling layer to aggregate the features of each order. Furthermore, with shortcut connections from each layer to the attention average pooling layer, DAN can be built extremely deep to obtain sufficient capacity. Experimental results show that DAN performs well and works effectively.
Keywords: structured data; deep learning; feature aggregation
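A plausible reading of the attention average pooling layer, sketched below, scores the feature vector of each order with a small linear head and takes the softmax-weighted average, so low- and high-order features both contribute to the final representation. This is an assumption based on the abstract, not the authors' published code.

```python
# Sketch: attention-weighted average pooling over per-order features.
import torch
import torch.nn as nn

class AttentionAveragePooling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scalar relevance score per order

    def forward(self, order_feats: torch.Tensor) -> torch.Tensor:
        # order_feats: (B, num_orders, C), one feature vector per order,
        # gathered via shortcut connections from each layer
        weights = self.score(order_feats).softmax(dim=1)  # (B, num_orders, 1)
        return (weights * order_feats).sum(dim=1)         # (B, C) pooled feature
```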