Methods to reduce the load of your webserver by caching content: using lighttpd, MySQL UDF, LUA and speed everything up.

The method I would like to describe is based on the webserver
lighttpd.

Lighttpd is a single process webserver written for high traffic sites.
It supports fast-cgi out of the box which makes it ideal for hosting PHP applications.
There are lots of nice modules for the daily
work like mod_access or mod_rewrite.
For more infos see the internals

There are also some benchmarks there.
Lighty´s home is always worth having a look at.

Assume a web application written in php (take C if you want,
it’s not important) compiles some layout and content from different
sources and fires it to a web client. In the scripting world,
a common way to optimize the response time of an application
is often as follows:
the application by itself saves every page result (or fragments)
into a cache file before delivering.
The next time a routine has to check whether it could output cached
content or not. The cache has to be rebuild periodicaly or when
the application updates some data entries – it deletes cache files
– and generates them again.

But there is a bottleneck: every time the webserver gets a
request it has to start the application – even there are no changes made on the data in the background.
That’s slow because you need to initialize a script processor, a script
or something else in the world of application logic.

It would be nice if cached files could delivered as static content.
But the question is: how can a webserver decide if a cache file
is outdated without involving the application!

Let’s take a look on a possible solution.

Before you start to experiment by yourself you would need some software from the following shopping list
You should have installed:

  • MySQL 5
  • libevent (needs by memcached)
  • memcached (the memcache demon)
  • libmemcache (C API to memcached)
  • lighttpd (the webserver)
  • LUA (needed by lighttpd’s mod_cml)
  • PHP as fast-cgi (lighttpd supports natively fast-cgi, PHP5 is good)
  • PHP’s memcache module (install it with pecl)

For building and configuring lighttpd read the documentation, it’s fairly easy.

After that, you need to compile and install some user defined functions for MySQL to communicate with memcached.

If there is anything I have forgotten, please don’t ask ;-).

The idea

A little bit lighttpd config (remove the comment which disables mod_cml from server.modules):

server.modules = ( 
  ..., 
  "mod_cml", 
  ... 
)

For more lighttpd configs see the docs and the living manual.
The server configuration is nice. You can do a lot with regex btw.

Now configure a virtual host:

$HTTP["host"] == "cml.testhost.org" {
    server.document-root = "/home/test/htdocs/"
    cml.power-magnet     = "/home/test/htdocs/index.cml"
}

Every request should now pass directly to the „power-magnet“. Lighttpd passes
requests (if installed) after rewriting url’s and so on to the cml module which
will be controlled by this power-magnet controlled file; code is in LUA.
Here is such a .cml file which we need for our little project:

output_contenttype = "text/html"

-- find the content_key:
content_key        = md5(request["REQUEST_URI"])

-- get version from memcache if there is one:
version            = memcache_get_long(content_key)

cwd                = request["CWD"]

-- index.php will be called, if no cache file is found
trigger_handler    = "index.php"

-- cache decision:

if version >= 0 then
  output_include = { cwd .. content_key .. "-" .. version .. ".html" }
  return 0
else
  return 1
end

The first time a request comes in and the version number (see listing above)
is available the CML script/LUA determined the content_key (md5 hash of
REQUEST_URI) and current version of the page. This version is
allways the newest, so take it and deliver it.
If no version is available the script returns 1 – what lets the mod_cml
pass the request to the trigger_handler, here „index.php“.

When data entry has been changed (e.g. in the table tbl_content) the version number
of that entry always has to be incremented.

A simple PHP application

To simplify matters we say our content is just embedded into the right layout
so we only need to fetch it from a MySQL database table:

CREATE TABLE tbl_content (
  id      VARCHAR(32) NOT NULL,
  content TEXT,
  version INT NOT NULL DEFAULT '1',
  PRIMARY KEY  (<code>id</code>)
)

A request could be handled as follows (not safe but simple!):

&lt?php
    // actions: index.php?id=?&mode=show|edit|save
    switch ($_REQUEST['mode']) {
        case 'save':
            save($_REQUEST['id'], $_REQUEST['content']);
            $location = sprintf("index.php?id=%s&mode=show", $_REQUEST['id']);
            header("Location: $location");
            break;
            
        case 'edit':
            edit();        
            break;
    
        case 'show':
            show($_REQUEST['id']);
            break;        
    }
    echo "Error!";
    exit;
?>

For save() and show() I created two stored functions „get_content“
and „set_content“ because I wanted to hide the memcache related
things from the application:

CREATE FUNCTION get_content(p_id VARCHAR(32)) 
RETURNS TEXT
BEGIN
  DECLARE c TEXT;
  SELECT content INTO c
    FROM tbl_content
   WHERE id = p_id;
  RETURN c;
END;


CREATE FUNCTION set_content(
  p_id      VARCHAR(32), 
  p_content TEXT)
RETURNS INT
BEGIN
  DECLARE ver INT;
  DECLARE s TEXT;

  INSERT tbl_content 
     SET id      = p_id,
         content = p_content,
         version = 1
  ON DUPLICATE KEY 
  UPDATE content = p_content,
         version = version + 1;
  SELECT version 
    INTO ver
    FROM tbl_content
   WHERE id = p_id;
  SELECT memcache_set("127.0.0.1:11211", p_id, ver) INTO s;
  RETURN 1;
END;

Every time a data entry is to be inserted or updated, the application should count up
the version number assigned to content-key.
The only magic in the stored function set_content above is the memcache_set call.
MySQL natively doesn’t have it – its an UDF (user defined function) written in C
which you need to compile by yourself (but it’s easy). Have a look at Jan Kneschke’s
page about MySQL and memcached – there
you can find the C Sources and a little SQL install script for MySQL 5 (and of course
another udf source that enables MySQL to send email, hey!).

Because of using the stored function the save() routine is very simple:

    /**
     * Saves content.
     *
     * @param   string  $id         content id
     * @param   string  $content    the content
     */
    function save($id, $content) {
        $db      = mysql_connect(':/tmp/mysql.sock', 'root', '');
        $query   = sprintf("SELECT my_db.set_content('%s', '%s')", 
                   mysql_escape_string($id), 
                   mysql_escape_string($content));
        mysql_query($query);
        
        return ;
    }

The edit function:

    /**
     * displays content.
     *
     * @param   string  $id     content id
     */
    function show($id) {
        // {{{ fetch content:
        $db      = mysql_connect(':/tmp/mysql.sock', 'root', '');
        $query   = sprintf("SELECT my_db.get_content('%s')", mysql_escape_string($id));
        $result  = mysql_query($query);
        $content = '';
        if ($rs = mysql_fetch_array($result)) {
            $content  = $rs[0];
        }
        // }}}

        // {{{ determine content key
        $content_key = md5($_SERVER["REQUEST_URI"]);
        // }}}
        
        // {{{ determine version
        $memcache    = new Memcache;
        $memcache->connect('localhost', 11211);;
        $version     = $memcache->get($content_key);
        // }}}
        
        // {{{ write cache file, output content to browser and exit
        file_put_contents('/path/to/cache/dir/' . $content_key . '-' . $version . '.html', $content);
        echo $content;
        exit;    
        // }}}
    }

Here we also need to calculate the content_key and determine the version number of the content_key.
With that informations we can write the cache file.

That’s it. The resume: In some environments it makes sence to cache pages or fragments of pages
without loading your application. The use of memcache is very handy – it is quite
fast and there are some APIs around to interface with the demon.

Another important point is that LUA is no substitution for PHP or other great scripting
languages. LUA in lighttpd’s context is extremly fast because it is „embedded“ as a shared
lib and loaded once the server starts.

5 Kommentare

  1. A bad point about your approach is that you mix the APP logic between your PHP script and mySql. I don’t like this as it ties you to mysql. if you wish to change to another DB, eg. Postgre or oracle, in the future, you need to rewrite the stored procedures. If you had placed the app logic sorely in the PHP script, then changing DB is simple (assuming you use a DB abstraction layer class)

  2. Ich danke schon seit einiger Zeit über schnelle Anwenungs-Caches für Internet-Anwendungen nach. Die bevorzugten Zutaten sind der Webserver Lighttpd, die Sprachen LUA und PHP sowie den Server Memcached, mit dem die Verteilung von flüchtigen Daten im Cluste

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert