90 articles found
Light field imaging for computer vision: a survey
1
Authors: Chen JIA, Fan SHI, Meng ZHAO, Shengyong CHEN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, No. 7, pp. 1077-1097 (21 pages)
Light field (LF) imaging has attracted attention because of its ability to solve computer vision problems. In this paper we briefly review the research progress in computer vision in recent years. For most factors that affect computer vision development, the richness and accuracy of visual information acquisition are decisive. LF imaging technology has made great contributions to computer vision because it uses cameras or microlens arrays to record the position and direction information of light rays, acquiring complete three-dimensional (3D) scene information. LF imaging technology improves the accuracy of depth estimation, image segmentation, blending, fusion, and 3D reconstruction. LF has also been innovatively applied to iris and face recognition, identification of materials and fake pedestrians, acquisition of epipolar plane images, shape recovery, and LF microscopy. Here, we further summarize the existing problems and the development trends of LF imaging in computer vision, including the establishment and evaluation of the LF dataset, applications under high dynamic range (HDR) conditions, LF image enhancement, virtual reality, 3D display, and 3D movies, military optical camouflage technology, image recognition at micro-scale, image processing method based on HDR, and the optimal relationship between spatial resolution and four-dimensional (4D) LF information acquisition. LF imaging has achieved great success in various studies. Over the past 25 years, more than 180 publications have reported the capability of LF imaging in solving computer vision problems. We summarize these reports to make it easier for researchers to search the detailed methods for specific solutions.
Keywords: Light field imaging; Camera array; Microlens array; Epipolar plane image; Computer vision
Computer-aided texture analysis combined with experts' knowledge: Improving endoscopic celiac disease diagnosis (Cited by: 1)
2
Authors: Michael Gadermayr, Hubert Kogler, Maximilian Karla, Dorit Merhof, Andreas Uhl, Andreas Vécsei. World Journal of Gastroenterology (SCIE, CAS), 2016, No. 31, pp. 7124-7134 (11 pages)
AIM: To further improve the endoscopic detection of intestinal mucosa alterations due to celiac disease (CD). METHODS: We assessed a hybrid approach based on the integration of expert knowledge into the computer-based classification pipeline. A total of 2835 endoscopic images from the duodenum were recorded in 290 children using the modified immersion technique (MIT). These children underwent routine upper endoscopy for suspected CD or non-celiac upper abdominal symptoms between August 2008 and December 2014. Blinded to the clinical data and biopsy results, three medical experts visually classified each image as normal mucosa (Marsh-0) or villous atrophy (Marsh-3). The experts' decisions were further integrated into state-of-the-art texture recognition systems. Using the biopsy results as the reference standard, the classification accuracies of this hybrid approach were compared to the experts' diagnoses in 27 different settings. RESULTS: Compared to the experts' diagnoses, in 24 of 27 classification settings (consisting of three imaging modalities, three endoscopists and three classification approaches), the best overall classification accuracies were obtained with the new hybrid approach. In 17 of 24 classification settings, the improvements achieved with the hybrid approach were statistically significant (P < 0.05). Using the hybrid approach, classification accuracies between 94% and 100% were obtained. Whereas the improvements are only moderate in the case of the most experienced expert, the results of the less experienced expert could be improved significantly in 17 out of 18 classification settings. Furthermore, the lowest classification accuracy, based on the combination of one database and one specific expert, could be improved from 80% to 95% (P < 0.001). CONCLUSION: The overall classification performance of medical experts, especially less experienced experts, can be boosted significantly by integrating expert knowledge into computer-aided diagnosis systems.
Keywords: Celiac disease; Diagnosis; Endoscopy; Computer-aided texture analysis; Biopsy; Pattern recognition
Vision Transformers with Hierarchical Attention (Cited by: 1)
3
Authors: Yun Liu, Yu-Huan Wu, Guolei Sun, Le Zhang, Ajad Chhatkuli, Luc Van Gool. Machine Intelligence Research (EI, CSCD), 2024, No. 4, pp. 670-683 (14 pages)
This paper tackles the high computational/space complexity associated with multi-head self-attention (MHSA) in vanilla vision transformers. To this end, we propose hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token. Then, the proposed H-MHSA learns token relationships within local patches, serving as local relationship modeling. Then, the small patches are merged into larger ones, and H-MHSA models the global dependencies for the small number of the merged tokens. At last, the local and global attentive features are aggregated to obtain features with powerful representation capacity. Since we only calculate attention for a limited number of tokens at each step, the computational load is reduced dramatically. Hence, H-MHSA can efficiently model global relationships among tokens without sacrificing fine-grained information. With the H-MHSA module incorporated, we build a family of hierarchical-attention-based transformer networks, namely HAT-Net. To demonstrate the superiority of HAT-Net in scene understanding, we conduct extensive experiments on fundamental vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Therefore, HAT-Net provides a new perspective for vision transformers. Code and pretrained models are available at https://github.com/yun-liu/HAT-Net.
Keywords: Vision transformer; hierarchical attention; global attention; local attention; scene understanding
On‐device audio‐visual multi‐person wake word spotting
4
Authors: Yidi Li, Guoquan Wang, Zhan Chen, Hao Tang, Hong Liu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 4, pp. 1578-1589 (12 pages)
Audio‐visual wake word spotting is a challenging multi‐modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection performance. However, most audio‐visual wake word spotting models are only suitable for simple single‐speaker scenarios and require high computational complexity. Further development is hindered by complex multi‐person scenarios and computational limitations in mobile environments. In this paper, a novel audio‐visual model is proposed for on‐device multi‐person wake word spotting. Firstly, an attention‐based audio‐visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker representation. Secondly, the knowledge distillation method is introduced to transfer knowledge from the large model to the on‐device model to control the size of our model. Moreover, a new audio‐visual dataset, PKU‐KWS, is collected for sentence‐level multi‐person wake word spotting. Experimental results on the PKU‐KWS dataset show that this approach outperforms the previous state‐of‐the‐art methods.
Keywords: audio‐visual fusion; human‐computer interfacing; speech processing
Dual Branch PnP Based Network for Monocular 6D Pose Estimation
5
Authors: Jia-Yu Liang, Hong-Bo Zhang, Qing Lei, Ji-Xiang Du, Tian-Liang Lin. Intelligent Automation & Soft Computing (SCIE), 2023, No. 6, pp. 3243-3256 (14 pages)
Monocular 6D pose estimation is a functional task in the field of computer vision and robotics. In recent years, 2D-3D correspondence-based methods have achieved improved performance in multiview and depth data-based scenes. However, for monocular 6D pose estimation, these methods are affected by the prediction results of the 2D-3D correspondences and the robustness of the perspective-n-point (PnP) algorithm. There is still a gap between the achieved and the expected estimation performance. To obtain a more effective feature representation result, edge enhancement is proposed to increase the shape information of the object by analyzing the influence of inaccurate 2D-3D matching on 6D pose regression and comparing the effectiveness of the intermediate representation. Furthermore, although the transformation matrix is composed of rotation and translation matrices from 3D model points to 2D pixel points, the two variables are essentially different and the same network cannot be used for both variables in the regression process. Therefore, to improve the effectiveness of the PnP algorithm, this paper designs a dual-branch PnP network to predict rotation and translation information. Finally, the proposed method is verified on the public LM, LM-O and YCB-Video datasets. The ADD(-S) values of the proposed method are 94.2 and 62.84 on the LM and LM-O datasets, respectively. The AUC of ADD(-S) value on YCB-Video is 81.1. These experimental results show that the performance of the proposed method is superior to that of similar methods.
Keywords: 6D pose; monocular RGB; edge enhancement; dual-branch PnP; 2D-3D correspondence
TextFormer: A Query-based End-to-end Text Spotter with Mixed Supervision
6
Authors: Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen. Machine Intelligence Research (EI, CSCD), 2024, No. 4, pp. 704-717 (14 pages)
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework. Typical methods heavily rely on region-of-interest (RoI) operations to extract local features and complex post-processing steps to produce final predictions. To address these limitations, we propose TextFormer, a query-based end-to-end text spotter with a transformer architecture. Specifically, using query embedding per text instance, TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multitask modeling. It allows for mutual training and optimization of classification, segmentation and recognition branches, resulting in deeper feature sharing without sacrificing flexibility or simplicity. Additionally, we design an adaptive global aggregation (AGG) module to transfer global features into sequential features for reading arbitrarily shaped texts, which overcomes the suboptimization problem of RoI operations. Furthermore, potential corpus information is utilized from weak annotations to full labels through mixed supervision, further improving text detection and end-to-end text spotting results. Extensive experiments on various bilingual (i.e., English and Chinese) benchmarks demonstrate the superiority of our method. Especially on the TDA-ReCTS dataset, TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.
Keywords: End-to-end text spotting; arbitrarily-shaped texts; transformer; mixed supervision; multitask modeling
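The 1-NED figure quoted in the TextFormer abstract is a normalized-edit-distance score. The sketch below shows how such a score is commonly computed, assuming the usual definition: one minus the Levenshtein distance divided by the length of the longer string, averaged over all prediction/reference pairs. The function names `levenshtein` and `one_minus_ned` are illustrative and are not taken from the TextFormer code; the benchmark's official scorer may normalize text differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def one_minus_ned(predictions, references):
    """Mean of 1 - dist / max(len) over all pairs (assumed definition of 1-NED)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, ref in zip(predictions, references):
        longest = max(len(pred), len(ref))
        ned = levenshtein(pred, ref) / longest if longest else 0.0
        total += 1.0 - ned
    return total / len(predictions)
```

A higher value means the recognized strings are closer to the ground truth; a perfect transcription of every instance gives 1.0.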
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (Cited by: 1)
7
Authors: Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao. CAAI Artificial Intelligence Research, 2023, No. 1, pp. 1-15 (15 pages)
Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, leading to two key issues when exchanging information between the encoder and decoder: (1) taking into account the differences in contribution between different-level features, and (2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
Keywords: polyp segmentation; pyramid vision transformer; colonoscopy; computer vision
Simulation of thermal field induced by concave spherical transducer in multi-layer media (Cited by: 5)
8
Authors: 丁亚军, 钱盛友, 廖志远. Journal of Central South University (SCIE, EI, CAS), 2013, No. 11, pp. 3166-3170 (5 pages)
High intensity focused ultrasound (HIFU) therapy is an effective method in the clinical treatment of tumors. To explore the bio-heat conduction mechanism in multi-layer media heated by a concave spherical transducer, the temperature field induced by this kind of transducer in multi-layer media is simulated by solving the Pennes equation with the finite difference method, and the influence of initial sound pressure, absorption coefficient, and thickness of different layers of biological tissue, as well as the thermal conductivity parameter, on sound focus and temperature distribution is analyzed. The results show that the temperature in the focus area increases faster as the initial sound pressure and thermal conductivity increase. The smaller the absorption coefficient, the greater the ultrasound intensity in the focus area and the larger the focus area. When the thicknesses of different layers of tissue change, the focus position changes slightly, but the sound intensity of the focus area changes obviously. The temperature in the focus area rises quickly before reaching a threshold and then remains within the threshold range.
Keywords: multi-layer media; concave spherical transducer; high intensity focused ultrasound; thermal field
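The abstract above solves the Pennes bioheat equation with a finite difference method. As a rough illustration of that idea only, here is a one-dimensional explicit (forward-time, centered-space) sketch of ρc ∂T/∂t = k ∂²T/∂x² − w_b c_b (T − T_a) + Q. Everything specific in it is an assumption made for illustration: the single homogeneous layer, the tissue constants, the Gaussian heat source standing in for absorbed ultrasound power, and the function name `simulate_pennes_1d`. It is not the paper's multi-layer model.

```python
import math


def simulate_pennes_1d(q_peak=5.0e6, focus=0.02, sigma=0.002,
                       length=0.04, nx=81, dt=0.1, steps=50):
    """Explicit 1D Pennes solver; returns the temperature profile in deg C."""
    rho, c, k = 1050.0, 3600.0, 0.5   # assumed density, heat capacity, conductivity
    wb_cb = 1800.0                    # assumed perfusion term w_b * c_b, W/(m^3 K)
    t_art = 37.0                      # arterial and initial tissue temperature
    dx = length / (nx - 1)
    alpha = k / (rho * c)
    if dt > dx * dx / (2.0 * alpha):  # stability limit of the explicit scheme
        raise ValueError("time step too large for the explicit scheme")

    # Gaussian volumetric heat source centered on the focus (W/m^3), an assumption.
    q = [q_peak * math.exp(-((i * dx - focus) ** 2) / (2.0 * sigma ** 2))
         for i in range(nx)]

    temp = [t_art] * nx
    for _ in range(steps):
        new = temp[:]                 # both end points stay fixed at t_art
        for i in range(1, nx - 1):
            lap = (temp[i - 1] - 2.0 * temp[i] + temp[i + 1]) / (dx * dx)
            rate = (k * lap - wb_cb * (temp[i] - t_art) + q[i]) / (rho * c)
            new[i] = temp[i] + dt * rate
        temp = new
    return temp
```

Calling `simulate_pennes_1d()` with the defaults simulates 5 s of heating on a 4 cm segment; the warmest node is the one at the assumed focus. Raising `q_peak` (a proxy for higher initial sound pressure) makes the focal temperature rise faster, which matches the qualitative trend reported in the abstract.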
Automated flare forecasting using a statistical learning technique (Cited by: 10)
9
Authors: Yuan Yuan, Frank Y. Shih, Ju Jing, Hai-Min Wang. Research in Astronomy and Astrophysics (SCIE, CAS, CSCD), 2010, No. 8, pp. 785-796 (12 pages)
We present a new method for automatically forecasting the occurrence of solar flares based on photospheric magnetic measurements. The method is a cascading combination of an ordinal logistic regression model and a support vector machine classifier. The predictive variables are three photospheric magnetic parameters, i.e., the total unsigned magnetic flux, length of the strong-gradient magnetic polarity inversion line, and total magnetic energy dissipation. The output is true or false for the occurrence of a certain level of flares within 24 hours. Experimental results, from a sample of 230 active regions between 1996 and 2005, show the accuracies of a 24-hour flare forecast to be 0.86, 0.72, 0.65 and 0.84 respectively for the four different levels. Comparison shows an improvement in the accuracy of X-class flare forecasting.
Keywords: Sun: flares; Sun: magnetic fields
Improved pedestrian detection with peer AdaBoost cascade (Cited by: 4)
10
Authors: FU Hong-pu, ZOU Bei-ji, ZHU Cheng-zhang, DAI Yu-lan, JIANG Ling-zi, CHANG Zhe. Journal of Central South University (SCIE, EI, CAS, CSCD), 2020, No. 8, pp. 2269-2279 (11 pages)
Focusing on data imbalance and intraclass variation, an improved pedestrian detection with a cascade of complex peer AdaBoost classifiers is proposed. The series of the AdaBoost classifiers are learned greedily, along with negative example mining. The complexity of classifiers in the cascade is not limited, so more negative examples are used for training. Furthermore, the cascade becomes an ensemble of strong peer classifiers, which treats intraclass variation. To locally train the AdaBoost classifiers with a high detection rate, a refining strategy is used to discard the hardest negative training examples rather than decreasing their thresholds. Using the aggregate channel feature (ACF), the method achieves miss rates of 35% and 14% on the Caltech pedestrian benchmark and Inria pedestrian dataset, respectively, which are lower than that of increasingly complex AdaBoost classifiers, i.e., 44% and 17%, respectively. Using deep features extracted by the region proposal network (RPN), the method achieves a miss rate of 10.06% on the Caltech pedestrian benchmark, which is also lower than 10.53% from the increasingly complex cascade. This study shows that the proposed method can use more negative examples to train the pedestrian detector. It outperforms the existing cascade of increasingly complex classifiers.
Keywords: peer classifier; hard negative refining; pedestrian detection; cascade
Evolution of Cooperation in Public Goods Games (Cited by: 1)
11
Authors: 夏承遗, 张娟娟, 王祎玲, 王劲松. Communications in Theoretical Physics (SCIE, CAS, CSCD), 2011, No. 10, pp. 638-644 (7 pages)
We investigate the evolution of cooperation with evolutionary public goods games based on finite populations, where four pure strategies: cooperators, defectors, punishers and loners who are unwilling to participate are considered. By adopting approximate best response dynamics, we show that the magnitude of rationality not only quantitatively explains the experiment results in [Nature (London) 425 (2003) 390], but also it will heavily influence the evolution of cooperation. Compared with previous results of infinite populations, which result in two equilibriums, we show that there merely exists a special equilibrium cooperation. In addition, we characterize that loner's and the relevant high value of bounded rationality will sustain payoff plays an active role in the maintenance of cooperation, which will only be warranted for the low and moderate values of loner's payoff. It thus indicates the effects of rationality and loner's payoff will influence the cooperation. Finally, we highlight the important result that the introduction of voluntary participation and punishment will facilitate cooperation greatly.
Keywords: public goods games; magnitude of rationality; voluntary participation; punishment
Wavelength dependent loss of splice of single-mode fibers (Cited by: 1)
12
Authors: YANG Bo, DUAN Ji-an, XIE Zheng, XIAO Hong-feng. Journal of Central South University (SCIE, EI, CAS), 2013, No. 7, pp. 1832-1837 (6 pages)
After reviewing three different definitions of mode field diameter of single-mode fibers, coupled efficiency calculation methods associated with lateral offset, longitude separation and wavelength, the effects produced by them, and the influences of splicing defects were discussed in detail. The regularities of the effects were studied according to the first order derivation of couple efficiency formula, and a simplified formula for couple efficiency calculation was presented under the circumstance of slight misalignment, with respect to wavelength, λ, and in good agreement with the theoretical model. The simplified formula provides a new but simple approach to evaluate wavelength dependent couple efficiency of single-mode fibers. Theoretical analyses and numerical calculations show that, when those defects exist, the wavelength produces additional effects on the couple loss: growth of wavelength causes an increase in the couple efficiency for the lateral offset or longitude separation, whereas it lessens the couple efficiency due to angular misalignment or mode fields mismatching. The wavelength degrades the couple efficiency distinctly when λ ≥ 2.5 μm, whereas it distorts the couple slightly in the range of λ ≤ 2 μm.
Keywords: wavelength dependent loss; couple efficiency; single-mode fiber; cut-off wavelength
Multi-gradient-direction based deep learning model for arecanut disease identification (Cited by: 2)
13
Authors: S.B. Mallikarjuna, Palaiahnakote Shivakumara, Vijeta Khare, M. Basavanna, Umapada Pal, B. Poornima. CAAI Transactions on Intelligence Technology (SCIE, EI), 2022, No. 2, pp. 156-166 (11 pages)
Arecanut disease identification is a challenging problem in the field of image processing. In this work, we present a new combination of multi-gradient-direction and deep convolutional neural networks for arecanut disease identification, namely, rot, split and rot-split. Due to the effect of the disease, there are chances of losing vital details in the images. To enhance the fine details in the images affected by diseases, we explore multi-Sobel directional masks for convolving with the input image, which results in enhanced images. The proposed method extracts arecanut as foreground from the enhanced images using Otsu thresholding. Further, the features are extracted for foreground information for disease identification by exploring the ResNet architecture. The advantage of the proposed approach is that it identifies the diseased images from the healthy arecanut images. Experimental results on the dataset of four classes (healthy, rot, split and rot-split) show that the proposed model is superior in terms of classification rate.
Keywords: deep learning; image analysis; pattern recognition
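The arecanut abstract separates foreground from background with Otsu thresholding. A minimal sketch of that single step follows, assuming 8-bit grayscale input supplied as a flat list of integers in 0..255. The function name `otsu_threshold` is illustrative; the Sobel enhancement and ResNet stages from the abstract are not shown.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the background class, pixels > t the foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0        # number of pixels at or below the candidate threshold
    sum_bg = 0.0    # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t
```

For a clearly bimodal image the returned level falls between the two intensity clusters, so a mask such as `[p > t for p in pixels]` keeps the brighter object.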
Action Recognition with Temporal Scale-Invariant Deep Learning Framework (Cited by: 1)
14
Authors: Huafeng Chen, Jun Chen, Ruimin Hu, Chen Chen, Zhongyuan Wang. China Communications (SCIE, CSCD), 2017, No. 2, pp. 163-172 (10 pages)
Recognizing actions according to video features is an important problem in a wide scope of applications. In this paper, we propose a temporal scale-invariant deep learning framework for action recognition, which is robust to the change of action speed. Specifically, a video is firstly split into several sub-action clips and a keyframe is selected from each sub-action clip. The spatial and motion features of the keyframe are extracted separately by two Convolutional Neural Networks (CNN) and combined in the convolutional fusion layer for learning the relationship between the features. Then, Long Short Term Memory (LSTM) networks are applied to the fused features to formulate long-term temporal clues. Finally, the action prediction scores of the LSTM network are combined by linear weighted summation. Extensive experiments are conducted on two popular and challenging benchmarks, namely, the UCF-101 and the HMDB51 Human Actions. On both benchmarks, our framework achieves superior results over the state-of-the-art methods by 93.7% on UCF-101 and 69.5% on HMDB51, respectively.
Keywords: action recognition; CNN; LSTM
CNN-RNN based method for license plate recognition (Cited by: 5)
15
Authors: Palaiahnakote Shivakumara, Dongqi Tang, Maryam Asadzadehkaljahi, Tong Lu, Umapada Pal, Mohammad Hossein Anisi. CAAI Transactions on Intelligence Technology, 2018, No. 3, pp. 169-175 (7 pages)
Achieving good recognition results for license plates is challenging due to multiple adverse factors. For instance, in Malaysia, private vehicles (e.g., cars) have numbers with a dark background, while public vehicles (taxis/cabs) have numbers with a white background. To reduce the complexity of the problem, we propose to classify the above two types of images such that one can choose an appropriate method to achieve better results. Therefore, in this work, we explore the combination of Convolutional Neural Networks (CNN) and Recurrent Neural Networks, namely BLSTM (Bi-Directional Long Short Term Memory), for recognition. The CNN has been used for feature extraction as it has high discriminative ability; at the same time, BLSTM has the ability to extract context information based on the past information. For classification, we propose Dense Cluster based Voting (DCV), which separates foreground and background for successful classification of private and public. Experimental results on live data given by MIMOS, which is funded by the Malaysian Government, and on the standard dataset UCSD show that the proposed classification outperforms the existing methods. In addition, the recognition results show that the recognition performance improves significantly after classification compared to before classification.
Keywords: license plate recognition; recognition rate; state of development; artificial intelligence
On Constructing Approximate Convex Hull (Cited by: 1)
16
Authors: M. Zahid Hossain, M. Ashraful Amin. American Journal of Computational Mathematics, 2013, No. 1, pp. 11-17 (7 pages)
The algorithms of convex hull have been extensively studied in literature, principally because of their wide range of applications in different areas. This article presents an efficient algorithm to construct approximate convex hull from a set of n points in the plane in O(n+k) time, where k is the approximation error control parameter. The proposed algorithm is suitable for applications that prefer to reduce the computation time in exchange for accuracy, such as animation and interaction in computer graphics, where rapid and real-time graphics rendering is indispensable.
Keywords: Convex hull; Approximation algorithm; Computational geometry; Linear time
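To make the trade-off between k and accuracy concrete, here is a sketch of a classic strip-sampling approximation: split the x-range into k vertical strips, keep only the lowest and highest point of each strip plus the leftmost and rightmost points, and take the exact hull of that small sample. This is an illustration of the general idea and is not claimed to be the algorithm of the article above. Because the sample is sorted inside `convex_hull`, this version runs in O(n + k log k) rather than O(n + k). The names `approximate_hull`, `convex_hull` and `cross` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Exact hull via Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def approximate_hull(points, k):
    """Hull of per-strip extreme points; larger k means a smaller error."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not points:
        return []
    x_min = min(p[0] for p in points)
    x_max = max(p[0] for p in points)
    if x_min == x_max:  # all points share one x: keep only the two y-extremes
        return convex_hull([min(points, key=lambda p: p[1]),
                            max(points, key=lambda p: p[1])])
    width = (x_max - x_min) / k
    lowest, highest = [None] * k, [None] * k
    left = right = points[0]
    for p in points:    # single O(n) pass over the input
        idx = min(int((p[0] - x_min) / width), k - 1)
        if lowest[idx] is None or p[1] < lowest[idx][1]:
            lowest[idx] = p
        if highest[idx] is None or p[1] > highest[idx][1]:
            highest[idx] = p
        if p[0] < left[0]:
            left = p
        if p[0] > right[0]:
            right = p
    samples = [left, right] + [p for p in lowest + highest if p is not None]
    return convex_hull(samples)
```

Every returned vertex is an input point, so the approximate hull always lies inside the true hull; points it misses can only sit in a narrow band whose width shrinks as k grows.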
Detection of Contamination Defect on Ice Cream Bar Based on Fuzzy Rule and Absolute Neighborhood (Cited by: 2)
17
Authors: LI Shaoli, YUAN Weiqi. Instrumentation, 2017, No. 3, pp. 24-34 (11 pages)
The contamination addressed in this paper is a defect on the surface of an ice cream bar, which is a serious security threat, so it is essential to detect this defect before the product is launched on the market. A detection method for contamination defects on the ice cream bar surface is proposed, which is based on fuzzy rule and absolute neighborhood feature. Firstly, the ice cream bar surface is divided into several sub-regions via the defined adjacent gray level clustering method. Then the alternative contamination regions are extracted from the sub-regions via the defined fuzzy rule. At last, the real contamination regions are recognized via the relationship between absolute neighborhood gray feature and default threshold. The algorithm was tested in the self-built image database SUT-D. The results show that the accuracy of the method proposed in this paper is 97.32 percent, which is at least 2.68 percent higher than that of the other typical algorithms. This indicates the superiority of the proposed method, which is of practical value.
Keywords: Fuzzy rule; Absolute neighborhood; Ice cream bar; Contamination; Adjacent gray level clustering
Masked Vision-language Transformer in Fashion (Cited by: 1)
18
Authors: Ge-Peng Ji, Mingchen Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool. Machine Intelligence Research (EI, CSCD), 2023, No. 3, pp. 421-434 (14 pages)
We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply utilize the vision transformer architecture for replacing the bidirectional encoder representations from Transformers (BERT) in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. Besides, we designed masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that admits raw multimodal inputs without extra pre-processing models (e.g., ResNet), implicitly modeling the vision-language alignments. More importantly, MVLT can easily generalize to various matching and generative tasks. Experimental results show obvious improvements in retrieval (rank@5: 17%) and recognition (accuracy: 3%) tasks over the Fashion-Gen 2018 winner, Kaleido-BERT. The code is available at https://github.com/GewelsJI/MVLT.
Keywords: Vision-language; masked image reconstruction; transformer; fashion; e-commercial
Traffic System Reliability Comparison Between Digital Driving and Conventional Driving
19
Authors: 王武宏, 沈中杰, 刘皓, 姚丽亚, 池内克史. Journal of Beijing Institute of Technology (EI, CAS), 2009, No. 4, pp. 412-415 (4 pages)
Driver behavior modeling is becoming increasingly important in the study of traffic safety and development of cognitive vehicles. An algorithm for dealing with reliability for both digital driving and conventional driving has been developed in this paper. Problems of digital driving error classification, digital driving error probability quantification and digital driving reliability simulation have been addressed using a comparison research method. Simulation results show that driving reliability analysis discussed here is capable of identifying digital driving behavior characteristics and achieving safety assessment of intelligent transportation system.
Keywords: driving behavior; digital driving characteristics; cognitive vehicle; intelligent transportation system
Multimodality image registration and fusion using neural network
20
Authors: Mostafa G Mostafa, Aly A Farag, Edward Essock. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2003, No. 3, pp. 235-240 (6 pages)
Multimodality image registration and fusion are essential steps in building 3-D models from remote sensing data. We present in this paper a neural network technique for the registration and fusion of multimodality remote sensing data for the reconstruction of 3-D models of terrain regions. A FeedForward neural network is used to fuse the intensity data sets with the spatial data set after learning its geometry. Results on real data are presented. Human performance evaluation is assessed on several perceptual tests in order to evaluate the fusion results.
Keywords: data fusion; image registration; image interpolation; neural network; 3-D model building