Abstract
As the Internet has become ubiquitous, e-mail, one of the most widely used tools, is increasingly exploited to spread advertising, reactionary, pornographic, and other unwanted content, producing large volumes of spam. Mainstream spam filters today are text-based, so spammers often evade them by converting text into images or embedding text in images, which has produced large numbers of spam images. Because most advertisement spam images are text images, this paper proposes an edge-feature-based method for filtering advertisement spam images that exploits the distribution of text edges in such images. The method first detects the vertical edges of the image, then extracts text-line regions according to the distribution of those vertical edges, and finally denoises the text-line regions to determine the final text area. Experiments show that the method performs well.
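The three steps named in the abstract (vertical edge detection, text-line extraction from the edge distribution, denoising) can be illustrated with a minimal sketch. The abstract does not give the paper's operators or thresholds, so everything here is an assumption made for illustration: the central-difference gradient, the row-density projection, the minimum band height used as a crude denoising step, and the function names and parameter values `thresh`, `min_density`, `min_height`, and `min_ratio`.

```python
import numpy as np

def vertical_edges(gray, thresh=80.0):
    """Binary map of vertical edges from a horizontal central difference.

    Assumption: a simple gradient stands in for whatever edge operator
    the paper actually uses.
    """
    g = gray.astype(float)  # avoid uint8 wrap-around on subtraction
    grad = np.zeros_like(g)
    grad[:, 1:-1] = np.abs(g[:, 2:] - g[:, :-2])
    return grad > thresh

def text_rows(edges, min_density=0.05, min_height=3):
    """Return (top, bottom) row bands whose edge density suggests text.

    Bands shorter than min_height are discarded as noise (a stand-in
    for the paper's denoising step, which the abstract does not detail).
    """
    dense = edges.mean(axis=1) > min_density
    bands, start = [], None
    for y, flag in enumerate(dense):
        if flag and start is None:
            start = y                      # a dense band begins
        elif not flag and start is not None:
            if y - start >= min_height:    # keep only tall enough bands
                bands.append((start, y))
            start = None
    if start is not None and len(dense) - start >= min_height:
        bands.append((start, len(dense)))  # band running to the last row
    return bands

def looks_like_text_image(gray, min_ratio=0.2):
    """Hypothetical decision rule: flag images where text rows dominate."""
    bands = text_rows(vertical_edges(gray))
    covered = sum(bottom - top for top, bottom in bands)
    return covered / gray.shape[0] >= min_ratio
```

Text strokes produce many closely spaced vertical edges, so rows crossing a text line show a much higher edge density than background rows; projecting the edge map onto the vertical axis is one simple way to exploit that. The final thresholding rule is only a placeholder, since the abstract does not say how the detected text area is turned into a filtering decision.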
Source
《计算机应用与软件》
CSCD
PKU Core Journal
2008, Issue 10, pp. 49-51, 59 (4 pages)
Computer Applications and Software
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), grant 2006AA01Z196
Keywords
Advertisement spam image
Edge detection
Text area