Description

This is a rewrite of Jeremy Andrews's earlier spam module efforts, released under a modified BSD license. (At this time, the Drupal project page does not support the inclusion of BSD modules, so the new module is hosted here. For more information on including BSD modules with Drupal, read this message.)

The Bayesian filter performs statistical analysis on content, learning from the spam and non-spam it sees to determine the likelihood that new content is or is not spam. The filter starts out knowing nothing and has to be trained every time it makes a mistake. You train it by marking spam content on your site as spam when you see it. Each word of the spam content is remembered and assigned a probability. The more often a word shows up in spam content, the higher the probability that future content containing the same word is also spam. Because most comment spam contains links back to the spammer's websites (for example, to sell Prozac), the Bayesian filter provides a special option to quickly learn and block content that contains links to known spammer websites.
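
The word-probability idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the module's actual PHP code or formula: the class name `BayesianSketch`, the tokenizer, the add-one smoothing and the log-odds combination are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class BayesianSketch:
    """Toy word-level Bayesian spam scorer (hypothetical, for illustration)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam items containing it
        self.ham_words = Counter()   # word -> number of non-spam items containing it
        self.spam_docs = 0
        self.ham_docs = 0

    @staticmethod
    def tokenize(text):
        # Lowercased words; a set, so each word counts once per item.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_docs += 1
        else:
            self.ham_words.update(words)
            self.ham_docs += 1

    def word_probability(self, word):
        # Add-one smoothing keeps the result strictly between 0 and 1;
        # a word that was never seen scores a neutral 0.5.
        p_in_spam = (self.spam_words[word] + 1) / (self.spam_docs + 2)
        p_in_ham = (self.ham_words[word] + 1) / (self.ham_docs + 2)
        return p_in_spam / (p_in_spam + p_in_ham)

    def score(self, text):
        # Combine the per-word probabilities in log-odds space.
        log_odds = 0.0
        for word in self.tokenize(text):
            p = self.word_probability(word)
            log_odds += math.log(p) - math.log(1 - p)
        log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp() overflow
        return 1 / (1 + math.exp(-log_odds))


bayes = BayesianSketch()
bayes.train("buy cheap prozac now", is_spam=True)
bayes.train("cheap prozac online", is_spam=True)
bayes.train("the kernel patch looks good", is_spam=False)
bayes.train("thanks for the patch review", is_spam=False)
```

After this training, `bayes.score("cheap prozac")` comes out above 0.5 and `bayes.score("kernel patch")` below it, while content made only of unseen words stays at 0.5.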

The custom filtering functionality can blacklist, whitelist or greylist content based on matching words, phrases and regular expressions. For example, a custom filter can be defined to always mark content as spam if it contains the word 'Viagra'. Alternatively, a custom filter can be defined to increase the probability that content is spam if it matches the case-insensitive regular expression /free/i.
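
As a rough sketch of how such rules might be evaluated, the Python below models the two examples from the paragraph above. The `CustomFilter` structure, the `weight` field and the rule that the first matching blacklist or whitelist entry decides the outcome are assumptions for illustration; the module's real configuration and precedence may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomFilter:
    pattern: str               # regular expression to search for
    action: str                # "blacklist", "whitelist" or "greylist"
    weight: float = 0.0        # probability adjustment, used by greylist only
    ignore_case: bool = False  # True behaves like the /i modifier


def apply_custom_filters(text, filters, base_probability):
    """Return an adjusted spam probability in the range 0.0 to 1.0."""
    probability = base_probability
    for f in filters:
        flags = re.IGNORECASE if f.ignore_case else 0
        if not re.search(f.pattern, text, flags):
            continue
        if f.action == "blacklist":
            return 1.0  # always spam
        if f.action == "whitelist":
            return 0.0  # never spam
        if f.action == "greylist":
            probability += f.weight  # nudge the score up (or down)
    return max(0.0, min(1.0, probability))


filters = [
    CustomFilter(r"\bViagra\b", "blacklist"),
    CustomFilter(r"free", "greylist", weight=0.2, ignore_case=True),
]
```

With these two filters, any text containing the word 'Viagra' is scored 1.0, and text matching "free" in any letter case has 0.2 added to whatever probability it started with.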

The spam module can also limit the total number of URLs allowed in comments and other content, as well as the number of times the same URL can be repeated within a single piece of content. These limits can be set differently for comments and for other types of content. For example, if the module is set to allow the same exact URL to appear at most twice in a comment and "http://kerneltrap.org/" shows up three or more times, the comment is considered spam.
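
The two limits can be illustrated with a short Python sketch. The URL pattern, the function name `exceeds_url_limits` and its parameters are hypothetical and chosen for the example; they are not taken from the module.

```python
import re
from collections import Counter

# Simplified pattern for http and https links (an assumption for this sketch).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def find_urls(text):
    # Drop trailing punctuation that usually belongs to the sentence.
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


def exceeds_url_limits(text, max_total=None, max_repeats=None):
    """True if the text has too many URLs or repeats one URL too often."""
    urls = find_urls(text)
    if max_total is not None and len(urls) > max_total:
        return True
    if max_repeats is not None:
        counts = Counter(urls)  # exact string match, no normalization
        return any(count > max_repeats for count in counts.values())
    return False


twice = "See http://kerneltrap.org/ and again http://kerneltrap.org/"
thrice = twice + " and once more http://kerneltrap.org/"
```

With `max_repeats=2`, `exceeds_url_limits(twice, max_repeats=2)` is false and `exceeds_url_limits(thrice, max_repeats=2)` is true, matching the example in the text. Separate limits for comments and other content could be modeled by calling the function with different arguments per content type.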

The fourth tool for detecting spam looks up the poster's IP address in the Distributed Server Boycott List (http://dsbl.org/). If the address is listed, it is known to belong to an untrusted email server, such as an open relay, and the content is marked as spam. The theory is that most comment spammers are also email spammers.
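
DNS-based blocklists are conventionally queried by reversing the octets of the IPv4 address, appending the list's DNS zone, and checking whether that name resolves. The Python sketch below shows that convention only; the zone name `list.dsbl.org`, the function names and the injectable `resolver` parameter are assumptions for illustration, and the module's own lookup code may work differently.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="list.dsbl.org", resolver=socket.gethostbyname):
    """Return True if the blocklist zone has an entry for this address.

    The zone default is an assumption; `resolver` can be swapped out so the
    function can be exercised without network access.
    """
    try:
        resolver(dnsbl_query_name(ip, zone))
    except OSError:
        # Resolution failed: the address is not listed (or the DNS query failed).
        return False
    return True
```

Making the zone a parameter means the same lookup could be pointed at a different blocklist without changing the logic.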

As a Drupal administrator, you can enable any or all of the above tools, as best suits your needs.

Features

  • Written in PHP specifically for Drupal.
  • Highly configurable.
  • Automatically detects and unpublishes spam comments and other spam content.
  • Automatically learns to detect spam in any language using Bayesian logic.
  • Automatically learns and blocks spammer URLs.
  • Automatically blacklists IPs of learned spammers, preventing them from posting additional spam and wasting database resources.
  • Detects repeated postings of the same identical content.
  • Detects content containing too many links, or the same link over and over.
  • Supports the creation of custom filters using powerful regular expressions.
  • Can notify users that their content was determined to be spam, preventing confusion over why it doesn't show up.
  • Can notify the site administrator in an email when spam is detected.
  • Provides simple administrative interfaces for reviewing spam content.
  • Provides comprehensive logging to offer an understanding as to how and why content is determined to be or not to be spam.

Add-ons

  • Spam SURBL, which supports six Spam URI Realtime Blocklists.
  • Trackback Blackhole, to block trackback spammers on sites that don't use the trackback module.
Author: Jeremy Andrews

This article is brought to you by the Hiveminds Magazine staff.