Pesky spambots are enough to ruin anyone's fun!

Hey, we all know it's a nightmare to be spammed by these incredibly annoying bots that patrol the internet and flood websites with huge amounts of spam. Well, we have solutions for that. If you are working with Drupal then the likely choice is Mollom, which is of course free for small sites and blogs. Bonus!

But when you think of protecting your forms, which ones come to mind: comments, registration, contact? Likely these do, and they are certainly the focus for most bots since they are easily accessed and promoted around a site. But what about your login and lost-password forms? You wouldn't believe me if I told you how many thousands of vulnerable sites are online. The simple truth is that developers just don't think about this sort of thing until, well... it's too late.

Did you know that these forms are a prime way to DDoS a website, simply because their submissions have to hit the database directly to be validated? Roh roh! Companies spend fortunes developing caching strategies to protect backend PHP processes, yet these forms remain an Achilles' heel.

Fortunately there are a good handful of modules, including Mollom, that allow you to protect these forms out of the box. In Mollom's case it's just not enabled by default, so you'd better go do that now if you haven't already.

As I was saying, there are other modules that allow you to thwart pesky spambots for added protection; one that comes straight to mind is Honeypot. This great module does a couple of simple things to trip up bots, including adding hidden fields to the form that, if completed, invalidate the submission. Genius.
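To illustrate the trick, here is a minimal hook_form_alter() sketch of the technique (not the Honeypot module's actual code; mymodule, the form ID and the field name are hypothetical):

/**
 * Implements hook_form_alter().
 *
 * Sketch of the honeypot technique: add a field humans never see,
 * then reject any submission that fills it in.
 */
function mymodule_form_alter(&$form, &$form_state, $form_id) {
  if ($form_id == 'comment_node_article_form') {
    // Hidden with CSS rather than #type 'hidden' so that naive bots
    // still "see" the field and fill it in.
    $form['homepage_url'] = array(
      '#type' => 'textfield',
      '#title' => t('Leave this field blank'),
      '#attributes' => array('style' => 'display: none;'),
    );
    $form['#validate'][] = 'mymodule_honeypot_validate';
  }
}

/**
 * Validation callback: a filled trap field means a bot.
 */
function mymodule_honeypot_validate($form, &$form_state) {
  if (!empty($form_state['values']['homepage_url'])) {
    form_set_error('', t('Your submission could not be processed.'));
  }
}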

So you see there isn't really anything to be worried about if you've protected your forms. The question is, have you?

Now that you are interested in protecting your forms, here is a list of modules and services you can employ:

  • Mollom - Offers spam protection through their freemium service.
  • CAPTCHA - A simple image-based CAPTCHA builder.
  • reCAPTCHA - Implements reCAPTCHA for image-based CAPTCHAs.
  • BOTCHA - Has many different bot-defeating recipes.
  • Hidden CAPTCHA - Similar to Honeypot in its effectiveness.

It's like collecting $200 for every 404 called

Did you know that a 404 could cripple your website? Well, believe it or not, misconfigured or ignored 404s could be killing your website's performance. Let's see how.

First things first: what do 404s look like in Drupal 7 out of the box? If you view the headers of a 404 page you will see the following:

Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

This header tells every cache along the way not to cache the page. What you have, then, is a potential hole for DDoSing yourself: not only does the request bypass the cache, it triggers a full Drupal bootstrap, and depending on what your 404 page contains you could have any number of blocks/panes/views loading, all of which can be expensive to generate. A flurry of 404s in quick succession could put you in real trouble, so it is important to trace through your application for the 404 case.

fast_404

So the clever people in the Drupal community identified that this was a big deal, initially rolled a simple fast_404 check into core, and then developed the fast_404 module. As the project page states, it aims to return 404 errors using less than 1MB of memory on your server, while also allowing some pretty advanced and comprehensive configuration.

Drupal 7 core includes a basic fast_404 check early in the bootstrap process, which can be enabled by uncommenting the line drupal_fast_404(); in the default settings.php. Drupal then checks the path before bootstrapping the database, returning most 404 responses faster and lowering the overall memory usage of the site.
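For reference, the relevant block ships commented out near the bottom of Drupal 7's default.settings.php and looks roughly like this (copied from a stock install; the exact defaults vary between point releases, so check your own copy):

$conf['404_fast_paths_exclude'] = '/\/(?:styles)\//';
$conf['404_fast_paths'] = '/\.(?:txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i';
$conf['404_fast_html'] = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>404 Not Found</title></head><body><h1>Not Found</h1><p>The requested URL "@path" was not found on this server.</p></body></html>';

drupal_fast_404();

Uncomment all four lines: the $conf entries control which paths may be rejected early, and drupal_fast_404() performs the check before the database is bootstrapped.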

In summary, it's worthwhile checking your Apache logs regularly to keep track of 404s hitting your website, so you can fix broken links or add redirects as needed. The other piece is to check how your site performs when a 404 is requested; you won't regret double-checking. If you are using a cache upstream, it is also worth adding a short TTL for 404 responses so they are caught upstream and your backends are protected.

A little too often I come across a cache_form table that is bursting at the seams, not from anything bizarre (although that actually happens a lot), but simply because developers lose track of its intended purpose and, I'd like to think, inadvertently abuse it. With a bit of initiative I've witnessed a 79GB cache_form table reduced to 16GB, saving the business from having to upsize disks and threaten stability.

But why is this a problem in the first place? Good question. The cache_form table isn't really a cache table at all when you compare it with the other cache tables Drupal utilises. It is poorly named, since its purpose is to preserve the form state as it was when the user last submitted the form. As you can imagine, on most modern websites forms are used all over the place, and, you guessed it, they all have their respective cache_form entries.
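To see where those entries come from: any form that opts in to caching writes its entire $form and $form_state into cache_form on each build, and AJAX-enabled forms do this implicitly. A minimal sketch of an explicit opt-in (mymodule and the form are hypothetical):

function mymodule_example_form($form, &$form_state) {
  // Opting in to form caching: every build of this form stores the full
  // $form and $form_state in cache_form for up to 6 hours.
  $form_state['cache'] = TRUE;
  $form['search'] = array(
    '#type' => 'textfield',
    '#title' => t('Search'),
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Go'),
  );
  return $form;
}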

Ugh, what makes this a real issue is that Drupal does not manage this table at all! It just keeps pushing content into the table like there is no tomorrow. This problem is not new; it's been around since Drupal 6 (http://drupal.org/node/226728), where the cache_form table isn't cleared when items expire, and, what's more incredible, simply running cron doesn't help either!

You can truncate the cache_form table, but this has the possible side effect of dropping forms that are in the process of being submitted. To remediate the issue, first look at the areas of your site whose forms are contributing to the problem. The second part is to regularly purge stale entries; something along the lines of the following should suffice:

DELETE FROM {cache_form} WHERE expire < UNIX_TIMESTAMP();

A more elegant solution is to implement a hook_cron() that does the heavy lifting for you, like so:

/**
 * Implements hook_cron().
 */
function mymodule_cron() {
  // Passing a NULL cid garbage-collects expired entries from the bin.
  cache_clear_all(NULL, 'cache_form');
}

During this purge, entries that are not yet older than 6 hours will not be deleted, which is ideal. But 6 hours is a long time; where is this defined? We can find the culprit in the source of form_set_cache(): https://api.drupal.org/api/drupal/includes!form.inc/function/form_set_cache/7

function form_set_cache($form_build_id, $form, $form_state) {
  // 6 hours cache life time for forms should be plenty.
  $expire = 21600;

As the comment reads, the authors assume 6 hours should be plenty, but in your case it is likely far too long. So the trick is to clear the cache_form table more frequently and patch the value of $expire down. Remember, patching core is not cool, but it's perfectly acceptable if done correctly and responsibly.
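If you would rather avoid the core patch entirely, a hedged alternative is to cap the effective lifetime from your own cron hook. Since form_set_cache() stamps each row with creation time + 21600, a row created more than an hour ago satisfies expire < REQUEST_TIME + 18000 (the one-hour figure is an assumption; tune it for your forms):

/**
 * Implements hook_cron().
 *
 * Caps the effective cache_form lifetime at roughly one hour
 * without patching core.
 */
function mymodule_cron() {
  // expire = creation time + 21600, so entries created more than
  // one hour ago have expire < REQUEST_TIME + 18000.
  db_delete('cache_form')
    ->condition('expire', REQUEST_TIME + 18000, '<')
    ->execute();
}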

I find over and over that many performance issues come down to one particular function: variable_set(). In every case it is a module invoking this function, even under minimal load, that proceeds to cause a database stampede, backend 503s and outages. How can this be? Let me show you. This gets deep quickly, so do your best to follow along; it is really important to understand.

function variable_set($name, $value) {
  global $conf;

  db_merge('variable')
    ->key(array('name' => $name))
    ->fields(array('value' => serialize($value)))
    ->execute();

  cache_clear_all('variables', 'cache_bootstrap');

  $conf[$name] = $value;
}

Let's step through what this function does in Drupal 7. First it pulls in the global $conf variable, which is, ironically, an array representation of the variable table that Drupal uses throughout the lifetime of the page execution. On every bootstrap this key-value array is loaded from the cache_bootstrap table if an entry exists there, and from the variable table if not.
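For context, here is the loading side, condensed from variable_initialize() in includes/bootstrap.inc (paraphrased; the real function also handles locking to avoid a rebuild stampede):

function variable_initialize($conf = array()) {
  // Try the serialised variables array in the cache_bootstrap bin first.
  if ($cached = cache_get('variables', 'cache_bootstrap')) {
    $variables = $cached->data;
  }
  else {
    // Cache miss: rebuild the whole array from the variable table
    // and re-cache it.
    $variables = array_map('unserialize', db_query('SELECT name, value FROM {variable}')->fetchAllKeyed());
    cache_set('variables', $variables, 'cache_bootstrap');
  }
  // Overrides from settings.php win.
  foreach ($conf as $name => $value) {
    $variables[$name] = $value;
  }
  return $variables;
}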

The next statement performs an update on the variable table, although you wouldn't know it just by looking at the code. Through the database abstraction layer, db_merge() generates a query that looks like this to the database:

SELECT 1 AS expression
FROM variable variable
WHERE ( (name = 'drupal_css_cache_files') )
FOR UPDATE

So at this point we have updated the database with the latest version of the variable; next comes the crippling part. Because there is a new value, the entire variable cache is cleared by invoking cache_clear_all(), wiping the serialised variable array from the cache_bootstrap bin.

The final line of code innocently sets the global $conf key to the new value and the call is complete. So what is the problem here? Well, now that the function has exited and the page has (hopefully) rendered successfully, we have been left with an empty cache_bootstrap bin.

If this happens on every single page request, our variable (bootstrap) caching is rendered completely useless. Cue the Achilles' heel of Drupal caching: if this event occurs while multiple servers are running many PHP processes, your database is going to get slammed with those special queries above. It is best practice to use InnoDB for your Drupal databases, but InnoDB uses row-level locks. The worst part is that because the SELECT ... FOR UPDATE takes a lock on the variable table, every subsequent query has to wait for the lock to release; eventually you end up with a deadlock and a website returning 503s. To understand how deadlocks occur, read through this comment by Heine.

So variable_set() is really quite a destructive function; it should only be called from a controlled operation, such as one triggered from the administration console, or another event that can absorb the fallout that follows.
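One cheap defensive habit in the meantime: never call variable_set() with a value that has not actually changed, because even a no-op write still clears the cached variables array. A sketch ('mymodule_setting' is a hypothetical variable name):

// Only write, and therefore only clear cache_bootstrap, when the
// stored value is actually different.
if (variable_get('mymodule_setting') !== $new_value) {
  variable_set('mymodule_setting', $new_value);
}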

Remediation options to follow...
