I've been fighting a lot of spam over at ASCIIBin. To minimize it, I wrote a class that attempts to detect spam and reject the content before it is submitted. It uses:
- Akismet - Checks against Content, URL, Name, Email
- SURBL - All URLs in content are parsed and checked against this spam database. This database contains URLs submitted via spam emails.
- Spamhaus - Checks IP for known spammers
- SpamCop - Checks IP for known spammers
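The two IP checks work over plain DNS: the octets of the address are reversed and looked up as a host name under the blacklist's zone, and a listed address resolves while an unlisted one does not. A minimal sketch of how that query name is built, assuming IPv4 only (dnsblQueryName() is a hypothetical helper for illustration, not part of Net_DNSBL):

```php
<?php
/**
 * Build the host name a DNSBL lookup would query for an IPv4 address.
 * Hypothetical helper for illustration; not part of Net_DNSBL.
 */
function dnsblQueryName($ip, $zone)
{
    // Only IPv4 is handled in this sketch
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // Reverse the octets: 41.110.2.2 becomes 2.2.110.41
    $reversed = implode('.', array_reverse(explode('.', $ip)));
    return $reversed . '.' . $zone;
}

echo dnsblQueryName('41.110.2.2', 'bl.spamcop.net'), "\n";
// prints 2.2.110.41.bl.spamcop.net
```

Net_DNSBL takes care of this step internally, so the sketch is only meant to show what goes over the wire.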
Prerequisites
This class is based on two other classes, which you'll need to download and install:
- PEAR::Net_DNSBL - Use pear to install (pear install Net_DNSBL)
- PHP5Akismet - Download and extract archive. Rename folder to Akismet
- You will need to obtain a WordPress API key from http://en.wordpress.com/api-keys/
The PEAR class reads its nameserver list from /etc/resolv.conf. If PHP's open_basedir restriction is in effect, put a file named '.resolv.conf' in the directory executing this script instead. On Windows you can create \etc\resolv.conf or place .resolv.conf in the executing directory. resolv.conf contains the list of nameservers that Net_DNSBL needs in order to send its DNS queries over TCP/UDP.
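A resolv.conf file is just a short text file with one "nameserver" line per DNS server. As a rough sanity check on a hand-made .resolv.conf, something like the sketch below lists the servers it contains. parseNameservers() is a hypothetical helper written for this post (Net_DNSBL does its own parsing), and the 192.0.2.x addresses are documentation placeholders, not real nameservers:

```php
<?php
/**
 * Extract nameserver addresses from resolv.conf-style text.
 * Hypothetical helper for illustration; Net_DNSBL does its own parsing.
 */
function parseNameservers($contents)
{
    $servers = array();
    foreach (preg_split('/\R/', $contents) as $line) {
        // Match lines such as "nameserver 192.0.2.53"; commented lines do not match
        if (preg_match('/^\s*nameserver\s+(\S+)/', $line, $m)) {
            $servers[] = $m[1];
        }
    }
    return $servers;
}

// Placeholder addresses; replace them with your own DNS servers
$sample = "# example resolv.conf\nnameserver 192.0.2.53\nnameserver 192.0.2.54\n";
print_r(parseNameservers($sample));
```

To check a real file, pass file_get_contents('.resolv.conf') in place of $sample. An empty result means the lookups will have no server to talk to.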
The Script
Code:
<?php
// {{{ Header
/**
* ASCII Post/Comment Spam Checker
*
* PHP versions 5
*
* LICENSE:
*
* Copyright (c) 2008 CodeCall.net
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted under the terms of the BSD License.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
*
* @category Copy and Paste
* @author Jordan (CodeCall.net)
* @date 9-30-2008
* @version 1.0
* @link http://www.codecall.net
* @copyright 2008 CodeCall.net
* @uses PEAR::Net_DNSBL
* @uses Akismet (http://www.achingbrain.net/stuff/php/akismet)
*
*
*/
// }}}
// {{{ Includes
/**
* Include our spam checking third-party
* classes.
*
* Akismet is local
* Net/* is PEAR
*
*/
require('Akismet/Akismet.class.php');
require('Net/DNSBL.php');
require('Net/DNSBL/SURBL.php');
// }}}
// {{{ Class
class SpamChecker {
//{{{ Members
/*
* The actual text of the comment or
* submission data
*/
private $__comment;
/*
* The "used" name of the submitter.
* This value may be blank.
*/
private $__name;
/*
* The email used to submit content, if
* included.
*/
private $__email;
/*
* The URL used to submit, if included
*/
private $__url;
/*
* WordPress API key
* Needed for Akismet, can be
* obtained from http://en.wordpress.com/api-keys/
*/
private $__wordPressApiKey;
/*
* The site running the spam test
* Your site URL (http://www.you.com)
*/
private $__ownerSiteUrl;
//}}}
// {{{ methods
/**
* Constructor for SpamChecker
*
* @param string $comment
* @param string $name
* @param string $email
* @param string $url
* @param string $wordPressApiKey
* @param string $ownerSiteUrl
* @return SpamChecker
*/
public function __construct($comment, $name="", $email="",
$url="", $wordPressApiKey="", $ownerSiteUrl="" ) {
/*
* Apply our local variables to the class
* members
*/
$this->__comment = $comment;
$this->__name = $name;
$this->__email = $email;
$this->__url = $url;
$this->__wordPressApiKey = $wordPressApiKey;
$this->__ownerSiteUrl = $ownerSiteUrl;
}
/**
* Check for spam using different methods.
* This function is more of a controller
* that executes other, private functions. An
* associative array of bool values is returned:
* one entry per check ('Akismet', 'BlackLists',
* 'SpamURLs') plus an overall 'Spam' flag.
*
* true = detected spam
* false = did not detect spam
*
* @return array
*/
public function isSpam() {
/*
* Create generic array
*/
$spamResults = array();
/*
* Check against Akismet
*/
$spamResults['Akismet'] = $this->checkAkismet();
/*
* Check against IP Black
* Lists
*/
$spamResults['BlackLists'] = $this->checkBlackLists();
/*
* Scan content URLs against previously submitted
* URL spam database
*/
$spamResults['SpamURLs'] = $this->scanContentUrls();
/*
* Set global Spam flag
*/
$spamResults['Spam'] = ($spamResults['Akismet'] || $spamResults['BlackLists'] || $spamResults['SpamURLs']) ? true : false;
/*
* Return array
*/
return $spamResults;
}
/**
* Check the comment for spam against
* Akismet. Akismet is the popular WordPress
* comment spam checker. It works extremely
* well but may be less reliable if not all
* of the data is provided.
*
* @return bool
*/
private function checkAkismet() {
/*
* Create the class and add
* parameters
*/
$akismet = new Akismet($this->__ownerSiteUrl ,$this->__wordPressApiKey);
$akismet->setCommentAuthor($this->__name);
$akismet->setCommentAuthorEmail($this->__email);
$akismet->setCommentAuthorURL($this->__url);
$akismet->setCommentContent($this->__comment);
//$akismet->setPermalink('http://www.example.com/blog/alex/someurl/');
/*
* Run the test
*/
if($akismet->isCommentSpam()) {
// Found spam
return true;
} else {
return false;
}
}
/**
* Check the IP address against IP
* black lists. If the IP is found in
* the database, the user has already
* been turned in for email or content
* spam by another user.
*
* Uses two well known services:
* spamcop.net
* spamhaus.org
*
* @uses PEAR::Net_DNSBL
* @return bool
*/
private function checkBlackLists() {
/*
* Create class
*/
$dnsbl = new Net_DNSBL();
/*
* Obtain the IP address of the person
* submitting content
*/
$remoteIp = $_SERVER['REMOTE_ADDR'];
/*
* Set the black lists to check
* against
*/
$dnsbl->setBlacklists(array('sbl-xbl.spamhaus.org', 'bl.spamcop.net'));
/*
* Run the Check
*/
if ($dnsbl->isListed($remoteIp)) {
// Found Spam
return true;
}
/*
* Nothing found, return
* false
*/
return false;
}
/**
* Take the content of the submitted
* comment/data and extract all URLs.
* Send each URL to checkUrlForSpam()
* to receive a response.
*
* @return bool
*/
private function scanContentUrls() {
/*
* Match all URLs
*/
preg_match_all("((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)",
$this->__comment, $urlArray, PREG_SET_ORDER);
/*
* Cycle through and submit
*/
foreach ($urlArray as $url) {
if ($this->checkUrlForSpam(trim($url[0]))) {
// Found spam so exit
return true;
}
}
/*
* No spam links found
*/
return false;
}
/**
* Check a URL against the SPAM
* database to determine if it is
* a SPAM submitted URL
*
* @param string $url
* @return bool
*/
private function checkUrlForSpam($url) {
/*
* Create a new DNS URL class
* and check it against the URL
* database
*/
$surbl = new Net_DNSBL_SURBL();
if ($surbl->isListed($url)) {
// Spam
return true;
}
/*
* Nothing found, return
* false
*/
return false;
}
// }}}
}
// }}}
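scanContentUrls() sends every matched URL to the lookup, so a comment that repeats the same link many times triggers the same DNS query many times. One possible refinement, shown here only as a sketch, is to reduce the URLs to unique host names first. extractUrlHosts() is a hypothetical helper that is not part of the class above, and its pattern is deliberately simpler than the one in the class (http, https and ftp only):

```php
<?php
/**
 * Collect the unique host names of all http/https/ftp URLs in a text.
 * Hypothetical helper for illustration; not part of the SpamChecker class.
 */
function extractUrlHosts($text)
{
    $hosts = array();
    // Simplified pattern: scheme, "://", then everything up to whitespace,
    // a quote or an angle bracket
    if (preg_match_all('~\b(?:https?|ftp)://[^\s<>"\']+~i', $text, $matches)) {
        foreach ($matches[0] as $url) {
            $host = parse_url($url, PHP_URL_HOST);
            if (is_string($host) && $host !== '') {
                // Lower-cased host as array key, so duplicates collapse
                $hosts[strtolower($host)] = true;
            }
        }
    }
    return array_keys($hosts);
}

$text = 'Visit http://www.example.com/a and http://WWW.EXAMPLE.com/b or https://example.org';
print_r(extractUrlHosts($text));
// lists www.example.com and example.org, each once
```

Whether this is worth adding depends on how link-heavy your spam is; for short comments the original loop is fine as it stands.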
Example Usage:
All values are of known spammers at the time of posting.
Code:
<?php
// {{{ Header
/**
* ASCII Post/Comment Spam Checker Test
*
* PHP versions 5
*
* LICENSE:
*
* Copyright (c) 2008 CodeCall.net
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted under the terms of the BSD License.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
*
* @category Copy and Paste
* @author Jordan (CodeCall.net)
* @date 9-30-2008
* @version 1.0
* @link http://www.codecall.net
* @copyright 2008 CodeCall.net
* @uses PEAR::Net_DNSBL
* @uses Akismet (http://www.achingbrain.net/stuff/php/akismet)
*
*
*/
error_reporting(E_ALL);
// }}}
// {{{ Implementation
/*
* Include the class file
*/
include "SpamChecker.php";
/*
* Needed data
*/
$wordPressApiKey = 'APIKEY';
$ownerUrl = 'http://www.jordandelozier.com';
/*
* Create known spam Akismet variables for
* testing purposes. We expect Akismet to
* flag this comment as spam.
*/
$akismetSpam = array('comment'=>'What charming message http://www.zulucutie.com',
'name' =>'lanellgiz',
'email' =>'latesha@buyclialis.info',
'url' =>'df3gd.com',
'ip' =>'89.28.114.111'
);
/*
* Now we want the black list check to flag
* spam too. With the real visitor IP it would
* pass and print blank (false), so we override
* the server setting that the class reads.
*/
$_SERVER['REMOTE_ADDR'] = '41.110.2.2';
/*
* Create a new instance of the class
*/
$spamChecker = new SpamChecker($akismetSpam['comment'], $akismetSpam['name'],
$akismetSpam['email'], $akismetSpam['url'],
$wordPressApiKey, $ownerUrl);
/*
* Run it and print the results
*/
echo "<pre>";
print_r($spamChecker->isSpam());
echo "</pre>";
// }}}
Output:
The class returns an associative array containing a boolean value for each test. If any test is true, the 'Spam' key will be true. print_r shows true as 1 and false as an empty value.
Array
(
[Akismet] => 1
[BlackLists] => 1
[SpamURLs] => 1
[Spam] => 1
)
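In a real submission handler you would act on the array rather than print it. A minimal sketch of one way to do that: triggeredChecks() is a hypothetical helper, not part of the class, and the $results array below is a hand-written stand-in for the return value of isSpam().

```php
<?php
/**
 * Return the names of the individual checks that flagged the submission.
 * Hypothetical helper for illustration; not part of the SpamChecker class.
 */
function triggeredChecks(array $results)
{
    // 'Spam' is the overall flag, so leave it out of the per-check list
    unset($results['Spam']);
    return array_keys(array_filter($results));
}

// Stand-in for the array returned by $spamChecker->isSpam()
$results = array(
    'Akismet'    => false,
    'BlackLists' => true,
    'SpamURLs'   => true,
    'Spam'       => true,
);

if ($results['Spam']) {
    echo 'Rejected by: ' . implode(', ', triggeredChecks($results)) . "\n";
} else {
    echo "Accepted\n";
}
// prints "Rejected by: BlackLists, SpamURLs"
```

Logging which checks fired makes it easier to spot a blacklist that is producing false positives for your visitors.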
See it in Action!
Visit ASCIIBin and submit any content. This class is executed before the content is accepted. If you are a known spammer, or are submitting spam, the submission will be rejected.