In my free time, I help administer a machine hosting several websites that generate copious amounts of traffic, and at times the load on the server can be significant. These websites require many tasks running on the back end to keep content fresh for incoming visitors. These tasks include sending out emails, polling blogs for new content, encoding video files, organizing and moving files around, updating caches, etc.
Minimizing server overloads
One of the ways to squeeze more performance out of a busy server is to break up all these tasks into individual jobs. The jobs are placed into queues with different levels of priority. Additionally, there are running worker processes that constantly check the queues for new jobs to take on.
During surges of visitor activity, there’s less worrying about the server load shooting through the roof and slowing down everything. The jobs are simply inserted into the queues and the limited number of worker processes will eventually get to them all. Rather than having the CPU constantly near 100% utilization, the work is spread out over time so other services will not be starved. The websites will still be snappy fast, even during the periods of activity surges.
In short, being able to slightly delay tasks that don’t absolutely need to be run immediately is an excellent way to keep a system running fast during periods of high activity.
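To make the idea concrete, here is a minimal sketch of prioritized queues and a single worker in plain shell. It is only an illustration: the queues are text files standing in for Redis lists, and the names `enqueue`, `dequeue`, `run_worker`, and the `high`/`low` queues are made up for this example rather than anything Resque provides.

```shell
# Queues are plain text files, one job per line (a stand-in for Redis lists).
QUEUE_DIR=$(mktemp -d)

# usage: enqueue <queue-name> <job description>
enqueue() {
  printf '%s\n' "$2" >> "$QUEUE_DIR/$1"
}

# Print and remove the oldest job, checking the queues in the order given.
# Returns 1 when every listed queue is empty.
dequeue() {
  for q in "$@"; do
    f="$QUEUE_DIR/$q"
    if [ -s "$f" ]; then
      head -n 1 "$f"
      tail -n +2 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
      return 0
    fi
  done
  return 1
}

# One worker: always take from "high" before touching "low".
run_worker() {
  while job=$(dequeue high low); do
    echo "processing: $job"
  done
}

enqueue low  "update cache"
enqueue high "send email"
enqueue low  "encode video"
run_worker
```

The worker handles "send email" first even though it was queued second, then works through the low-priority jobs in the order they arrived. A real setup would run several such workers and keep them polling instead of exiting when the queues are empty.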
Discovering Redis and Resque
After some research to find a job/queue management system that’s flexible enough to run any type of job I could toss at it, I decided upon Resque. It makes use of an excellent data structure server (not database!) called Redis. Redis is ideal since its capabilities are almost brain-dead simple to use, it can be easily scaled from one machine to many machines, and with all the data stored in RAM, it’s blindingly *fast* [more details on Redis here]. Resque sets up and manages the queues on Redis.
A good analogy for Redis and Resque is a warehouse and its inventory manager. Redis is the huge warehouse where many boxes (jobs) can be stored. Resque is the inventory manager who brings in the boxes, knows where they should go in the warehouse, monitors the boxes, makes sure the boxes are taken care of, and takes the boxes out when they no longer need to be stored in the warehouse.
The number of worker processes can be fine-tuned for your server. If the server starts to experience growing pains and there aren’t enough workers to take care of the jobs quickly enough, the number of workers can be increased. If the server starts to max out its internal resources, it’s a trivial task to offload the workers to other servers to relieve the load on the primary server. Once applications are created with Resque in mind, they will have a built-in ability to scale easily in the future.
…But my web developers use PHP, not Ruby
Resque is written in Ruby, which is a great language in its own right and is easily extended using many third-party libraries. However, for many web developers, the preferred language (or the one they have no say in) is PHP. Fortunately, Chris Boulton did the hard work of porting Resque to PHP, and the port is called php-resque.
Installing Redis and php-resque on Ubuntu 12.04
To simplify the installation and maintenance of a Redis server on Ubuntu, check out the excellent dotdeb.org repository, which offers up-to-date versions of several popular packages such as PHP, PHP extensions, MySQL, and… Redis server. Another reason to have this repository on a server is that APT can be used to easily keep Redis server up to date in the future, avoiding the need to compile.
The packages in the dotdeb.org repository are targeted at Debian systems but are usable on Ubuntu. It’s highly recommended to try dotdeb packages on a test server first to ensure that nothing else will break. Be sure you are comfortable with the APT packaging tools (and have good backups) before heading down this path! You have been forewarned!
Back up the php.ini for cli and apache2
sudo cp /etc/php5/apache2/php.ini /etc/php5/apache2/php.ini.bak
sudo cp /etc/php5/cli/php.ini /etc/php5/cli/php.ini.bak
Set up the dotdeb repository and get the system up to date:
deb http://packages.dotdeb.org stable all
deb-src http://packages.dotdeb.org stable all
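The post does not say which file these two lines go into. One common choice, and purely an assumption here, is a separate file under /etc/apt/sources.list.d/, which APT reads alongside the main sources.list. A small sketch (the function name `write_dotdeb_list` and the file name dotdeb.list are made up for illustration):

```shell
# Write the two dotdeb source lines to the file given as $1.
write_dotdeb_list() {
  cat > "$1" <<'EOF'
deb http://packages.dotdeb.org stable all
deb-src http://packages.dotdeb.org stable all
EOF
}

# Build the file somewhere writable first, then install it with sudo:
#   write_dotdeb_list /tmp/dotdeb.list
#   sudo mv /tmp/dotdeb.list /etc/apt/sources.list.d/dotdeb.list
```

Keeping the repository in its own file makes it easy to remove again if the dotdeb packages cause trouble.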
Approve dotdeb’s GnuPG key:
wget -q -O- http://www.dotdeb.org/dotdeb.gpg | sudo apt-key add -
Special note: if you want only the redis-server package from dotdeb.org repository and not upgrade PHP5 or mysql-server, read this follow-up post.
Update the system:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install php5 [Run this even if PHP5 is already installed on the system]
Before PHP5 update:
$ php -v
PHP 5.3.10-1ubuntu3.2 with Suhosin-Patch (cli) (built: Jun 13 2012 17:19:58)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2012 Zend Technologies
After PHP5 update:
$ php -v
PHP 5.3.15-1~dotdeb.0 with Suhosin-Patch (cli) (built: Jul 23 2012 12:25:58)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2012 Zend Technologies
    with Suhosin v0.9.33, Copyright (c) 2007-2012, by SektionEins GmbH
Make sure this line is not commented out in /etc/php5/apache2/php.ini
session.save_path = "/tmp"
$ sudo service apache2 restart
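A quick way to confirm that the session.save_path line really is active, rather than eyeballing the file, is a small grep. This is only a sketch; the function names `has_active_save_path` and `check_ini` are made up here, and the check assumes that php.ini comments start with a semicolon.

```shell
# Succeeds if the given php.ini has an uncommented session.save_path line.
has_active_save_path() {
  grep -Eq '^[[:space:]]*session\.save_path[[:space:]]*=' "$1"
}

# Report the state of one php.ini file.
check_ini() {
  if [ ! -f "$1" ]; then
    echo "$1: not found"
  elif has_active_save_path "$1"; then
    echo "$1: session.save_path is set"
  else
    echo "$1: session.save_path is commented out or missing"
  fi
}

check_ini /etc/php5/apache2/php.ini
```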
Install redis-server
sudo apt-get install redis-server
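Before moving on, it is worth checking that the Redis server is actually answering. The redis-cli tool replies PONG to a ping when the server is up. A minimal sketch, where the `redis_ready` function and the `REDIS_CLI` variable are my own names for illustration:

```shell
# Allow the client binary to be overridden; defaults to redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Succeeds only if the server answers PING with PONG.
redis_ready() {
  [ "$("$REDIS_CLI" ping 2>/dev/null)" = "PONG" ]
}

if redis_ready; then
  echo "Redis is answering"
else
  echo "Redis is not reachable"
fi
```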
Install php-resque and run the demo
Note: php-resque uses PHP Composer to install packages, so:
In another directory:
curl -sS https://getcomposer.org/installer | php
sudo mv composer.phar /usr/local/bin/composer
If you don’t have git on your system yet:
sudo apt-get install git
git clone https://github.com/chrisboulton/php-resque.git
cd php-resque
composer install [install dependency packages listed in composer.json]
cd demo [edit job.php and change sleep(120) to sleep(5), makes the demo run faster]
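If you would rather not open an editor for the sleep(120) change, the same edit can be scripted. This is just a sketch: the function name `shorten_demo_sleep` is made up, and it assumes the demo's job.php contains the literal text sleep(120) as described above.

```shell
# Replace sleep(120) with sleep(5) in the given file, keeping a .bak copy.
shorten_demo_sleep() {
  cp "$1" "$1.bak" &&
  sed 's/sleep(120)/sleep(5)/' "$1.bak" > "$1"
}

# Run from inside the demo directory; does nothing if job.php is absent.
if [ -f job.php ]; then
  shorten_demo_sleep job.php
fi
```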
Start the workers
bash -c "VVERBOSE=1 QUEUE=* COUNT=2 php resque.php"
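QUEUE=* tells the workers to listen on every queue. As far as I know, QUEUE also accepts a comma-separated list, and the workers check those queues in the order given, which is how the different priority levels mentioned earlier come about. The sketch below only prints the command by default so it can be tried safely; the queue names `email` and `video`, the `start_workers` function, and the `RUNNER` variable are all hypothetical.

```shell
# RUNNER defaults to "echo" (dry run: print the command).
# Set RUNNER="" before running this to actually start the workers.
RUNNER=${RUNNER-echo}

# usage: start_workers <comma-separated queues, highest priority first> <count>
start_workers() {
  $RUNNER env VVERBOSE=1 QUEUE="$1" COUNT="$2" php resque.php
}

start_workers "email,video,default" 2
```

With this list, jobs in the `email` queue would be picked up before anything waiting in `video` or `default`.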
Submit a few jobs
php queue.php PHP_Job
php queue.php PHP_Job
php queue.php PHP_Job
php queue.php PHP_Job
Watch the workers perform the work. When done, kill all workers at once (note that this stops every running php process on the machine, not just the workers):
killall php
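To peek inside the "warehouse" and see what Resque has stored, for example jobs queued while no workers are running, redis-cli can be used directly. The key names below are an assumption on my part: I believe php-resque keeps each queue as a Redis list named resque:queue:<name> and the set of known queues under resque:queues, but check your own Redis instance before relying on that. The function names and the `REDIS_CLI` variable are made up for this sketch.

```shell
# Allow the client binary to be overridden; defaults to redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Print the number of jobs waiting in a queue.
# Assumes the key layout resque:queue:<name>.
queue_length() {
  "$REDIS_CLI" llen "resque:queue:$1"
}

# List the queue names Resque has registered (assumed key: resque:queues).
list_queues() {
  "$REDIS_CLI" smembers "resque:queues"
}

# Example usage once Redis is running:
#   list_queues
#   queue_length default
```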
Running your own custom jobs
It’s now a simple matter to create custom jobs inside another class in job.php and queue multiple runs of the job class.
Now we’ll start over and use PHP Composer to get things set up for a new project:
$ cd my_proj
Create a file called composer.json:
{
    "require": {
        "chrisboulton/php-resque": "1.2.x"
    }
}
$ composer install
$ mkdir files ; cd files [your custom code will go into this directory]
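To pass values to your custom job class, put them in the arguments array given to Resque::enqueue(). For example, to pass the value after the job name (‘google.com’, which PHP exposes as $argv[2]) from the command line to your job, i.e.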
$ php queue.php Ping_Job google.com
resque.php :
<?php
date_default_timezone_set('GMT');
require 'job.php';
require __DIR__ . '/../vendor/chrisboulton/php-resque/resque.php';
?>
queue.php :
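<?php
// Usage: php queue.php <job class> <host>
if (empty($argv[1]) || empty($argv[2])) {
    die("Specify the name of a job to add and a host, e.g. php queue.php Ping_Job google.com\n");
}

require __DIR__ . '/../vendor/autoload.php';
date_default_timezone_set('GMT');
Resque::setBackend('127.0.0.1:6379');

// These values are available inside the job as $this->args
$args = array(
    'host' => $argv[2],
);

// The final "true" asks php-resque to track the job's status
$jobId = Resque::enqueue('default', $argv[1], $args, true);
echo "Queued job ".$jobId."\n\n";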
job.php :
<?php
class Ping_Job
{
    public function perform()
    {
        // Arguments passed to Resque::enqueue() arrive in $this->args
        $host_pinged = $this->args['host'];
        echo "\n ==== \n pinging $host_pinged \n";
        // ... rest of custom code
    }
}
Start the workers and submit several jobs
$ bash -c "VVERBOSE=1 QUEUE=* COUNT=2 php resque.php"
$ php queue.php Ping_Job google.com
$ php queue.php Ping_Job yahoo.com
$ php queue.php Ping_Job techmeme.com
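When there are more than a handful of hosts, typing one command per job gets tedious, and a small loop can do the submitting. Again, this is only a sketch: `submit_ping_jobs` and `RUNNER` are names I made up, and by default the loop just prints the commands so you can check them first.

```shell
# RUNNER defaults to "echo" (dry run: print each command).
# Set RUNNER="" before running this to actually submit the jobs.
RUNNER=${RUNNER-echo}

# usage: submit_ping_jobs <host> [<host> ...]
submit_ping_jobs() {
  for host in "$@"; do
    $RUNNER php queue.php Ping_Job "$host"
  done
}

submit_ping_jobs google.com yahoo.com techmeme.com
```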