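Web spamming is the deliberate attempt to mislead search engines so that some pages receive a higher ranking than they deserve. In recent years, the sharp increase in web spam has degraded the quality of search engine results. The article first discusses the basic concepts and impact of spam, and then analyzes in detail the spamming techniques currently in use, which fall into three types: term spamming, link spamming, and hiding techniques. We believe this analysis is useful for developing appropriate countermeasures.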
In the global information era, people acquire more and more information from the Internet, but the quality of search results is strongly degraded by the presence of web spam. Web spam is one of the serious problems for search engines, and many methods have been proposed for spam detection. We exploit the content features of non-spam pages in contrast to those of spam pages. The content features of non-spam pages consistently exhibit many statistical regularities, whereas those of spam pages exhibit very few, because spam pages are generated randomly in order to increase their page rank. In this paper, we summarize the regular distributions of content features for non-spam pages and propose probability formulae for the entropy and for independent n-grams, respectively. Furthermore, we put forward formulae for multi-feature correlation. Among these, the notable content features may be used as auxiliary information for spam detection.
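The abstract does not reproduce the paper's formulae, so the following is only a minimal sketch of what two such content features could look like, assuming a common textbook formulation: Shannon entropy of a page's token distribution, and an independent n-gram likelihood computed as the average negative log-probability of the page's n-grams. The function names `token_entropy` and `independent_ngram_likelihood` are hypothetical, and estimating probabilities from the page itself is an assumption made to keep the example self-contained.

```python
import math
from collections import Counter


def token_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution of a page."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def independent_ngram_likelihood(tokens, n=2):
    """Average negative natural-log probability of the page's n-grams.

    Each n-gram is treated as independent of the others, and its probability
    is estimated from its relative frequency on the same page (an assumption
    of this sketch, not something stated in the abstract).
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum(math.log(counts[g] / total) for g in ngrams) / total


# A highly repetitive page versus a page with no repeated bigrams.
repetitive = ["buy", "now"] * 3
varied = ["search", "engines", "rank", "pages", "by", "relevance"]
print(token_entropy(repetitive), token_entropy(varied))
print(independent_ngram_likelihood(repetitive), independent_ngram_likelihood(varied))
```

Under these assumptions, a page that repeats the same few phrases scores lower on both measures than a page with varied wording, which is the kind of statistical signal a detector could use as auxiliary evidence.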
Most spam filtering techniques are based on objective methods such as content filtering and DNS/reverse DNS checks. Recently, some cooperative subjective spam filtering techniques have been proposed. Objective methods suffer from false positive and false negative classifications. Objective methods based on content filtering are time-consuming and resource-demanding; they are also inaccurate and require continuous updates to cope with newly invented spammers' tricks. On the other hand, the existing subjective proposals have drawbacks of their own, such as privacy concerns and attacks from malicious users that make them unreliable. In this paper, we propose an efficient spam filtering system that combines a smart cooperative subjective technique for content filtering with the fastest and most reliable non-content-based objective methods. The system consists of several applications. The first is a web-based system that we have developed based on the proposed technique. The second is a server application with extra features suitable for enterprises and closed work groups. The third is a set of standard web services that allow any existing email server or email client to interact with the system. These services allow email servers to query the system for email filtering, and they also allow users to participate in the subjective spam filtering through their mail user agents.
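The abstract does not say which DNS/reverse DNS check the system uses. As an illustration only, one widely used non-content-based check is forward-confirmed reverse DNS: look up the PTR name of the connecting IP address, resolve that name forward, and accept only if the original address is among the results. The sketch below assumes IPv4 and uses the standard-library calls `socket.gethostbyaddr` and `socket.gethostbyname_ex`; the function name `forward_confirmed_rdns` and the injectable `reverse`/`forward` parameters are hypothetical, added so the logic can be exercised without network access.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return True if the PTR name of `ip` resolves back to `ip` (IPv4 only).

    `reverse` and `forward` default to the real resolver calls; both return
    a (hostname, aliases, addresses) tuple.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
    return ip in addresses
```

A mail server could run such a check on the connecting address before any content filtering, since it needs only two DNS lookups and no message parsing.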
Funding: supported by the National Science Foundation of China (No. 61170145, 61373081), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20113704110001), the Technology and Development Project of Shandong (No. 2013GGX10125), and the Taishan Scholar Project of Shandong, China.