The method I would like to describe is based on the webserver
lighttpd.
Lighttpd is a single process webserver written for high traffic sites.
It supports fast-cgi out of the box which makes it ideal for hosting PHP applications.
There are lots of nice modules for the daily
work like mod_access or mod_rewrite.
For more infos see the internals
There are also some benchmarks there.
Lighty´s home is always worth having a look at.
Assume a web application written in php (take C if you want,
it’s not important) compiles some layout and content from different
sources and fires it to a web client. In the scripting world,
a common way to optimize the response time of an application
is often as follows:
the application by itself saves every page result (or fragments)
into a cache file before delivering.
The next time a routine has to check whether it could output cached
content or not. The cache has to be rebuild periodicaly or when
the application updates some data entries – it deletes cache files
– and generates them again.
But there is a bottleneck: every time the webserver gets a
request it has to start the application – even there are no changes made on the data in the background.
That’s slow because you need to initialize a script processor, a script
or something else in the world of application logic.
It would be nice if cached files could delivered as static content.
But the question is: how can a webserver decide if a cache file
is outdated without involving the application!
Let’s take a look on a possible solution.
Before you start to experiment by yourself you would need some software from the following shopping list
You should have installed:
- MySQL 5
- libevent (needs by memcached)
- memcached (the memcache demon)
- libmemcache (C API to memcached)
- lighttpd (the webserver)
- LUA (needed by lighttpd’s mod_cml)
- PHP as fast-cgi (lighttpd supports natively fast-cgi, PHP5 is good)
- PHP’s memcache module (install it with pecl)
For building and configuring lighttpd read the documentation, it’s fairly easy.
After that, you need to compile and install some user defined functions for MySQL to communicate with memcached.
- udf_memcache (for accessing memcached from MySQL)
If there is anything I have forgotten, please don’t ask ;-).
The idea
A little bit lighttpd config (remove the comment which disables mod_cml from server.modules):
server.modules = ( ..., "mod_cml", ... )
For more lighttpd configs see the docs and the living manual.
The server configuration is nice. You can do a lot with regex btw.
Now configure a virtual host:
$HTTP["host"] == "cml.testhost.org" { server.document-root = "/home/test/htdocs/" cml.power-magnet = "/home/test/htdocs/index.cml" }
Every request should now pass directly to the „power-magnet“. Lighttpd passes
requests (if installed) after rewriting url’s and so on to the cml module which
will be controlled by this power-magnet controlled file; code is in LUA.
Here is such a .cml file which we need for our little project:
output_contenttype = "text/html" -- find the content_key: content_key = md5(request["REQUEST_URI"]) -- get version from memcache if there is one: version = memcache_get_long(content_key) cwd = request["CWD"] -- index.php will be called, if no cache file is found trigger_handler = "index.php" -- cache decision: if version >= 0 then output_include = { cwd .. content_key .. "-" .. version .. ".html" } return 0 else return 1 end
The first time a request comes in and the version number (see listing above)
is available the CML script/LUA determined the content_key (md5 hash of
REQUEST_URI) and current version of the page. This version is
allways the newest, so take it and deliver it.
If no version is available the script returns 1 – what lets the mod_cml
pass the request to the trigger_handler, here „index.php“.
When data entry has been changed (e.g. in the table tbl_content) the version number
of that entry always has to be incremented.
A simple PHP application
To simplify matters we say our content is just embedded into the right layout
so we only need to fetch it from a MySQL database table:
CREATE TABLE tbl_content ( id VARCHAR(32) NOT NULL, content TEXT, version INT NOT NULL DEFAULT '1', PRIMARY KEY (<code>id</code>) )
A request could be handled as follows (not safe but simple!):
<?php // actions: index.php?id=?&mode=show|edit|save switch ($_REQUEST['mode']) { case 'save': save($_REQUEST['id'], $_REQUEST['content']); $location = sprintf("index.php?id=%s&mode=show", $_REQUEST['id']); header("Location: $location"); break; case 'edit': edit(); break; case 'show': show($_REQUEST['id']); break; } echo "Error!"; exit; ?>
For save() and show() I created two stored functions „get_content“
and „set_content“ because I wanted to hide the memcache related
things from the application:
CREATE FUNCTION get_content(p_id VARCHAR(32)) RETURNS TEXT BEGIN DECLARE c TEXT; SELECT content INTO c FROM tbl_content WHERE id = p_id; RETURN c; END; CREATE FUNCTION set_content( p_id VARCHAR(32), p_content TEXT) RETURNS INT BEGIN DECLARE ver INT; DECLARE s TEXT; INSERT tbl_content SET id = p_id, content = p_content, version = 1 ON DUPLICATE KEY UPDATE content = p_content, version = version + 1; SELECT version INTO ver FROM tbl_content WHERE id = p_id; SELECT memcache_set("127.0.0.1:11211", p_id, ver) INTO s; RETURN 1; END;
Every time a data entry is to be inserted or updated, the application should count up
the version number assigned to content-key.
The only magic in the stored function set_content above is the memcache_set call.
MySQL natively doesn’t have it – its an UDF (user defined function) written in C
which you need to compile by yourself (but it’s easy). Have a look at Jan Kneschke’s
page about MySQL and memcached – there
you can find the C Sources and a little SQL install script for MySQL 5 (and of course
another udf source that enables MySQL to send email, hey!).
Because of using the stored function the save() routine is very simple:
/** * Saves content. * * @param string $id content id * @param string $content the content */ function save($id, $content) { $db = mysql_connect(':/tmp/mysql.sock', 'root', ''); $query = sprintf("SELECT my_db.set_content('%s', '%s')", mysql_escape_string($id), mysql_escape_string($content)); mysql_query($query); return ; }
The edit function:
/** * displays content. * * @param string $id content id */ function show($id) { // {{{ fetch content: $db = mysql_connect(':/tmp/mysql.sock', 'root', ''); $query = sprintf("SELECT my_db.get_content('%s')", mysql_escape_string($id)); $result = mysql_query($query); $content = ''; if ($rs = mysql_fetch_array($result)) { $content = $rs[0]; } // }}} // {{{ determine content key $content_key = md5($_SERVER["REQUEST_URI"]); // }}} // {{{ determine version $memcache = new Memcache; $memcache->connect('localhost', 11211);; $version = $memcache->get($content_key); // }}} // {{{ write cache file, output content to browser and exit file_put_contents('/path/to/cache/dir/' . $content_key . '-' . $version . '.html', $content); echo $content; exit; // }}} }
Here we also need to calculate the content_key and determine the version number of the content_key.
With that informations we can write the cache file.
That’s it. The resume: In some environments it makes sence to cache pages or fragments of pages
without loading your application. The use of memcache is very handy – it is quite
fast and there are some APIs around to interface with the demon.
Another important point is that LUA is no substitution for PHP or other great scripting
languages. LUA in lighttpd’s context is extremly fast because it is „embedded“ as a shared
lib and loaded once the server starts.
Great article!
A bad point about your approach is that you mix the APP logic between your PHP script and mySql. I don’t like this as it ties you to mysql. if you wish to change to another DB, eg. Postgre or oracle, in the future, you need to rewrite the stored procedures. If you had placed the app logic sorely in the PHP script, then changing DB is simple (assuming you use a DB abstraction layer class)
The link to http://jan.kneschke.de/projects/mysql/udf/
is no longer available. Is there another resource for UDF-mysql_udf_lua-0.9.2.tar.gz ?
Ich danke schon seit einiger Zeit über schnelle Anwenungs-Caches für Internet-Anwendungen nach. Die bevorzugten Zutaten sind der Webserver Lighttpd, die Sprachen LUA und PHP sowie den Server Memcached, mit dem die Verteilung von flüchtigen Daten im Cluste
mod_cml was replaced with mod_magnet, so this is the new way to do it
http://gadelkareem.com/2007/09/17/dynamic-content-caching-using-lighty-mod_magnet-lua/