Abstract In recent years, video-based face recognition has attracted increasing attention; at the same time, the Bag-of-Visual-Words (BoWs) representation has been successfully applied to image retrieval and object recognition. This paper proposes a video-based face recognition approach that uses visual words. In the classic visual-words pipeline, Scale Invariant Feature Transform (SIFT) descriptors are first extracted at interest points detected by the difference of Gaussians (DoG); a visual vocabulary is then generated by k-means, and each descriptor is replaced by the index of its closest visual word. In facial images, however, SIFT descriptors are not robust enough to facial pose distortion, facial expression, and lighting variation. We therefore use Affine-SIFT (ASIFT) descriptors as the facial image representation. Experimental results on the Yale and ORL face databases suggest that visual words built on ASIFT descriptors achieve lower error rates in the face recognition task.
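The visual-words pipeline summarized above (cluster local descriptors with k-means, then replace each descriptor with the index of its closest visual word and count occurrences) can be sketched as follows. This is a minimal illustration with plain NumPy, not the authors' implementation: the hand-rolled `build_vocabulary` k-means and the `bow_histogram` helper are hypothetical names, and real SIFT/ASIFT descriptors (128-dimensional) would be substituted for the toy arrays.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster all local descriptors into k visual words via basic k-means."""
    rng = np.random.default_rng(seed)
    # initialize centers from k randomly chosen descriptors
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each descriptor to its nearest center (Euclidean distance)
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, vocabulary):
    """Quantize an image's descriptors to visual-word indexes, return a
    normalized word-count histogram (the fixed-length BoW representation)."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    indexes = dists.argmin(axis=1)
    hist = np.bincount(indexes, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

Images are then compared by the distance between their histograms; swapping SIFT for ASIFT changes only the descriptors fed in, not this quantization step.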
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2013, No. 20, pp. 5988-5992 (5 pages)