19/01/2007 at 03:29: LAMP optimization

*** Disclaimer: this is not a howto, this is just my personal experience ***

I host several websites on a web server at home on a DSL line: zeRezo.com.
From the beginning, I have been careful about bandwidth saturation, since even today I only have 1 Mbit/s of upload bandwidth.
So I try to keep my websites "light", or at least I avoid hosting big files such as videos, big software packages, etc.
When I need big files, I use external hosting. It is not a big issue, since I still have all my standard web pages hosted at home, so I keep a good idea of how many visitors I have.

After some years of such hosting, and despite some hardware upgrades (CPU, RAM, disks), my server is starting to be overloaded.
The number of visitors is not that large, but it is still higher than before (great news), some websites have gained features, and because of historical data the databases keep getting bigger...
Anyway, the fact is that today my server is sometimes very slow, almost not responding to local requests, so it was really time to investigate!
After some googling, I found a nice post which gives basic steps to identify the "weak" point of your installation.
The article focuses on the 3 main points of congestion for an overloaded server.
I was again amazed to see how much memory SpamAssassin was consuming on my system.
I had 5 SpamAssassin processes running (the default option) and it looks like they do not share their database in memory!
Since my system does not handle that much email, I changed the default Debian options in /etc/default/spamassassin to run only one process:
# NOTE: version 3.0.x has switched to a "preforking" model, so you
# need to make sure --max-children is not set to anything higher than
# 5, unless you know what you're doing.

OPTIONS="--create-prefs --max-children 1 --helper-home-dir"
Apart from this small detail, both RAM and disk usage are OK on my server.
So the problem is CPU, and it is not hard to see: at some times of the day the top command shows a very high load (>>10), with 100% CPU usage of course.

So now we know what to do: optimize the dynamic web pages by writing faster PHP code and better MySQL queries.
The other issue is how to identify which pages are responsible for the CPU usage.
Apache logging has nice options for this: the %T and %D format directives track "the time taken to serve the request".
I still run the old 1.3 version at home, so I only have the %T option (in seconds; %D, in microseconds, only exists in Apache 2), and the result was not so helpful in my case: I sometimes see very high times for static pages, which is strange...
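To make the %T values easier to read, the per-request times can be summed per URL. The sketch below is only an illustration: it assumes a LogFormat that ends with %T (for example `LogFormat "%h %l %u %t \"%r\" %>s %b %T" timed`), and both the function name slowest_requests() and the sample log lines are hypothetical.

```php
<?php
/* Sum the trailing %T value (seconds) per requested path and
   return the paths with the largest totals first.
   Assumes each line ends with: "METHOD /path PROTO" status bytes seconds */
function slowest_requests(array $lines, $limit = 5)
{
    $totals = array();
    foreach ($lines as $line) {
        /* capture the request path and the trailing %T value */
        if (!preg_match('/"[A-Z]+ (\S+)[^"]*" \d{3} \S+ (\d+)\s*$/', $line, $m)) {
            continue; /* skip lines that do not match the assumed format */
        }
        $path = $m[1];
        if (!isset($totals[$path])) {
            $totals[$path] = 0;
        }
        $totals[$path] += (int)$m[2];
    }
    arsort($totals); /* largest total first, keys preserved */
    return array_slice($totals, 0, $limit, true);
}

/* hypothetical sample lines, not taken from a real log */
$lines = array(
    '127.0.0.1 - - [19/Jan/2007:03:29:00 +0100] "GET /forum.php HTTP/1.0" 200 5120 3',
    '127.0.0.1 - - [19/Jan/2007:03:29:05 +0100] "GET /index.html HTTP/1.0" 200 1024 0',
    '127.0.0.1 - - [19/Jan/2007:03:29:09 +0100] "GET /forum.php HTTP/1.0" 200 5120 4',
);

$top = slowest_requests($lines, 1);
foreach ($top as $path => $seconds) {
    print $path.' '.$seconds."\n";
}
```

With a resolution of one second, short requests all show up as 0, so this only helps to spot the really slow pages.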
But I know which are the bad pages on my server, since on many websites I include a little footer with the computation time of the page. Something like this:
function getmicrotime()
{ 
	list($usec,$sec)=explode(" ",microtime());
	return ((float)$usec+(float)$sec);
} 
$time_start=getmicrotime();

/* the ugly code here */

$time_end=getmicrotime();
$time=$time_end-$time_start;
printf('Page generated in %f seconds',$time);
So I often see in my footers that I host ugly slow code :)
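For what it is worth, on PHP 5 and later the helper is not strictly needed, because microtime(true) returns the time directly as a float. A minimal sketch of the same footer, where usleep() is just a stand-in for the real page code:

```php
<?php
/* microtime(true) returns seconds since the epoch as a float (PHP 5+) */
$time_start = microtime(true);

usleep(20000); /* stand-in for the page's real work */

$time = microtime(true) - $time_start;
printf('Page generated in %f seconds', $time);
```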

To begin, I focused on MySQL queries, since they seemed to take the largest share of the time in my slow web pages.
Here again, MySQL has a nice logging option.
In my configuration file, I switched on the log-slow-queries option:
# Here you can see queries with especially long duration
log-slow-queries = /var/log/mysql/mysql-slow.log
When this option is present, all queries that take more than long_query_time seconds (10 by default) to complete are written to the log file.
It is then easy to locate the slowest queries.
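Once the log grows, sorting its entries by duration helps. The following sketch assumes the log layout where each entry has a "# Query_time: N ..." comment line followed by the SQL text; the function name parse_slow_log() and the sample entries are hypothetical, and other lines MySQL may write (such as "use db;") would simply be kept as part of the SQL text.

```php
<?php
/* Parse slow log text into array('time' => float, 'sql' => string)
   entries, slowest first. Assumes "# Query_time: N" header lines. */
function parse_slow_log($text)
{
    $entries = array();
    $current = null;
    foreach (explode("\n", $text) as $line) {
        if (preg_match('/^# Query_time: ([\d.]+)/', $line, $m)) {
            /* a new entry starts: store the previous one */
            if ($current !== null) {
                $entries[] = $current;
            }
            $current = array('time' => (float)$m[1], 'sql' => '');
        } elseif ($current !== null && $line !== '' && $line[0] !== '#') {
            /* SQL text, possibly spread over several lines */
            $current['sql'] .= ($current['sql'] === '' ? '' : ' ').trim($line);
        }
    }
    if ($current !== null) {
        $entries[] = $current;
    }
    usort($entries, function ($a, $b) {
        if ($a['time'] == $b['time']) {
            return 0;
        }
        return ($a['time'] < $b['time']) ? 1 : -1;
    });
    return $entries;
}

/* hypothetical sample, written in the assumed layout */
$log = <<<'LOG'
# Time: 070119  3:29:00
# User@Host: web[web] @ localhost []
# Query_time: 12  Lock_time: 0  Rows_sent: 10  Rows_examined: 250000
SELECT * FROM foo ORDER BY bar;
# Time: 070119  3:31:12
# User@Host: web[web] @ localhost []
# Query_time: 31  Lock_time: 0  Rows_sent: 1  Rows_examined: 900000
SELECT COUNT(*) FROM foo
WHERE bar LIKE '%x%';
LOG;

$entries = parse_slow_log($log);
foreach ($entries as $e) {
    print $e['time'].'s '.$e['sql']."\n";
}
```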

Also, in order to profile SQL queries on a specific page, I use the following quick & dirty technique: I replace all calls to mysql_query() with a custom _mysql_query().
It would be nicer to just override the mysql_query() builtin function, but I don't think I can do this with my PHP version.
So here is what it looks like:
function _mysql_query($string)
{
	global $REMOTE_HOST;

	/* global timer to store total time spent in SQL queries */
	global $time_mysql;

	/* start the timer */
	$t1=getmicrotime();

	/* do the query and save the result to return it to the caller */
	$r=mysql_query($string); 

	/* stop the timer */
	$t2=getmicrotime();

	/* only show the trace for the developer */
	if ($REMOTE_HOST=='my_computer_name')
		print '<div class="mysql" title="'.htmlspecialchars($string).'">'.round(1000*($t2-$t1)).'</div>';

	/* increase total SQL time */
	$time_mysql+=$t2-$t1;

	/* return the results to caller just like mysql_query would */
	return $r;
}

/* ... */

$query = 'SELECT * FROM foo';
$result = _mysql_query($query);
The time I am talking about here is "real" (wall-clock) time, not the CPU time spent by this specific process, so it is not very accurate, but it is still helpful.
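The same idea of accumulating wall-clock time can be sketched without depending on the mysql extension. This is only an illustration under assumptions, not the code running on my server: timed_call() and fake_query() are hypothetical names, microtime(true) needs PHP 5, and fake_query() merely stands in for a real database call.

```php
<?php
/* total wall-clock time spent in timed calls, in seconds */
$time_sql = 0.0;

/* run $callback with $args, add its duration to $time_sql,
   and return whatever the callback returned */
function timed_call($callback, array $args = array())
{
    global $time_sql;

    $t1 = microtime(true);
    $r = call_user_func_array($callback, $args);
    $time_sql += microtime(true) - $t1;

    return $r;
}

/* hypothetical stand-in for a database call */
function fake_query($sql)
{
    usleep(1000); /* pretend the query takes about 1 ms */
    return strtoupper($sql);
}

$result = timed_call('fake_query', array('select 1'));
printf("%s took %d ms in total\n", $result, round(1000 * $time_sql));
```

Passing the function name as a callback avoids one custom wrapper per function to measure, at the cost of a slightly heavier call syntax.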

I style these trace <div> elements so they look like small boxes. Their "title" attribute lets me view the SQL query just by moving the mouse over a box.

On this basic website, since the boxes are written when the query is done, it is easy to understand when and why a specific query was run, just by looking at where its box is located in the page layout.

The grey boxes are another trace I use, to check the time spent in PHP.
To compute this time, I also use timers, but I subtract the SQL time to really focus on the PHP code time:
function _trace($string)
{
	global $REMOTE_HOST;

	/* global timer started at the beginning of the page */
	global $time_start;

	/* total time spent in SQL queries */
	global $time_mysql;

	/* only show the trace for the developer */
	if ($REMOTE_HOST=='my_computer_name')
		print '<div class="php" title="'.$string.'">'.
			round(1000*(getmicrotime()-$time_start-$time_mysql)).'</div>';
}

/* ... */

_trace('start big command');
very_slow_procedure();
_trace('stop big command');
Again, this allowed me to find very nasty things in my website.
For example, instead of using a static array of smilies, I used a PHP loop to parse the local smiley filenames (so if I add a smiley I don't have to update the code).
OK, this is not very nice, but I did not expect it to be so slow ;-)
Now this is fixed!
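One possible middle ground between a static array and a loop over the files is to scan the directory only once per page and reuse the result. The sketch below assumes the slow part was rescanning the directory several times per page; smiley_list() is a hypothetical name, the .gif/.png filter is an assumption, and scandir() needs PHP 5.

```php
<?php
/* Return the smiley filenames found in $dir.
   The directory is scanned only on the first call for a given $dir;
   later calls in the same request reuse the cached list. */
function smiley_list($dir)
{
    static $cache = array();

    if (!isset($cache[$dir])) {
        $files = array();
        foreach (scandir($dir) as $name) {
            /* keep image files only (assumed extensions) */
            if (preg_match('/\.(gif|png)$/i', $name)) {
                $files[] = $name;
            }
        }
        $cache[$dir] = $files;
    }
    return $cache[$dir];
}
```

New smiley files are still picked up without a code change, since the cache only lives for the duration of one request.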

One last tool which can be useful: ab. This is the Apache HTTP server benchmarking tool, which lets you send many requests at a time to your server.
ab -c 10 -n 100 http://google.com/
For example, this command will send 100 requests to http://google.com/, with 10 requests in parallel.
Be careful with this command: it can bring your server to a halt if you use numbers that are too big!
I used it in combination with a custom script to monitor LAMP activity.
I still have CPU issues...
The next step could be another hardware upgrade... or switching to a smarter solution like professional hosting.

Comments

Nono [ 16/02 - 09:48 ] : For barely 1000 euros including tax you can get quad-core Dell servers with 1 GB of RAM :) But still, that is a bit of an investment!

Nono [ 19/01 - 19:28 ] : And MySQL, can it perhaps be configured to do query caching?

Royale [ 19/01 - 17:20 ] : huats : Thank you, I will have a look at it.

Royale [ 19/01 - 17:01 ] : Nono : I am seriously thinking about caching for the dynamic pages that change the least. The second server is going to be difficult ;)

huats [ 19/01 - 13:08 ] : I've seen a terrific keynote by Rasmus Lerdorf himself, about Php optimisation. You can find the slides of the talk here : http://talks.php.net/show/oscon06

Nono [ 19/01 - 11:09 ] : With a load of 10 on your server, whose configuration is still "decent", you can forget about shared hosting. And dedicated hosting on a configuration more powerful than yours is going to cost a lot...

Maybe setting up a caching mechanism... or putting the DB on another machine.. Coming soon at Royale's: a rack of servers :-)