Re: SV: [cobalt-security] amd root?



Hi Kai,

Graeme's explanation is indeed very good. 

> Should not all of this generate som warning message like low memory or high
> cpu or does it happend so quick that the system has no time for that?

Sometimes it happens too quickly and you will receive no warning in advance. 

Let's run some figures first so that you get the idea:

Assume a RaQ3 (300MHz, with 128 MB of Memory) with 45 virtual sites on it, 
most of them mainly visited by people from the US. 

Then throw in a big MySQL database with approx. 140 MB of data and 5000-8000 
emails per day due to many active mailing lists. Now assume a monthly traffic 
of 50 to 75 Gigabytes, mostly stemming from a single highly active PHP 
message board with MySQL backend.

Now assume the server clock is set to GMT+1, as many European ISPs do on 
their machines. 

Based on my own experience I can guarantee you that almost every morning at 
4:50 European time the system will either lock up or be swapping madly, 
just a fraction of an inch away from total lockup. That happened to me often 
enough (until I paid my ISP's outrageous charge for a memory upgrade).

Why does it happen? Well, at 4:50 Central European Time the parsing of the 
logfiles will be at its peak. Webalizer might also run at the same time (if 
launched from cron.daily), along with other maintenance tasks. But 4:50 am 
European time is also prime surfing time for any US visitors the server 
might have. On one such occasion when I was able to log in I had 272 
simultaneous connections to port 80 (!), most of them in the state 
"TIME_WAIT", as Apache was already above and beyond its configured maximum 
for allowed simultaneous connections.
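
A per-state count like the one above can be reproduced with a few lines of 
Python. This is only a minimal sketch: it assumes text in the usual column 
layout printed by "netstat -tan" on Linux, and the function name and the 
sample addresses are made up for illustration.

```python
from collections import Counter


def count_port_states(netstat_text, port=80):
    """Count TCP states for connections whose local port equals `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header line and anything that is not TCP
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample input, not taken from the real server.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:80             192.0.2.10:40312        TIME_WAIT
tcp        0      0 10.0.0.5:80             192.0.2.11:51200        TIME_WAIT
tcp        0      0 10.0.0.5:80             192.0.2.12:33001        ESTABLISHED
tcp        0      0 10.0.0.5:22             192.0.2.13:50022        ESTABLISHED
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
"""

if __name__ == "__main__":
    for state, n in sorted(count_port_states(SAMPLE).items()):
        print(state, n)
```

On a live machine you would feed it the real netstat output instead of the 
sample text.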

I had my server set up to page me when critical messages like overload or 
memory shortage occurred. Additionally I had modified Logcheck so that each 
quarter-hourly report included a load average report and memory usage report 
of the server (output of the commands "w" and "cat /proc/meminfo", echoed 
into the report).
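
For anyone who wants a more compact line than the raw command output, here 
is a rough sketch of such a summary in Python. It is not what I actually 
ran; it assumes the "Key: value kB" lines of /proc/meminfo and the first 
three fields of /proc/loadavg, and the function names and sample numbers 
are invented for the example.

```python
def parse_meminfo(text):
    """Return {key: kB} for lines shaped like 'MemFree:  2104 kB'."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Only accept "<digits> kB"; other line shapes are ignored.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def summarize(loadavg_text, meminfo_text):
    """Build a one-line load and memory summary for a periodic report."""
    load1, load5, load15 = (float(x) for x in loadavg_text.split()[:3])
    mem = parse_meminfo(meminfo_text)
    swap_used = mem["SwapTotal"] - mem["SwapFree"]
    return (
        "load average: %.2f %.2f %.2f | MemFree: %d kB | swap used: %d kB"
        % (load1, load5, load15, mem["MemFree"], swap_used)
    )


# Hypothetical sample values for illustration only.
SAMPLE_LOADAVG = "7.42 5.10 2.31 3/120 4567"
SAMPLE_MEMINFO = """\
MemTotal:    127372 kB
MemFree:       2104 kB
SwapTotal:   262136 kB
SwapFree:     40960 kB
"""

if __name__ == "__main__":
    print(summarize(SAMPLE_LOADAVG, SAMPLE_MEMINFO))
```

On the server itself the two strings would come from reading /proc/loadavg 
and /proc/meminfo. A load average well above 1 together with growing swap 
usage is the pattern to watch for.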

Furthermore an external monitoring service was monitoring the machine for 
outages and unduly long response times.

In most cases the external service reported the server going down between 
4:45 am and 5:30 am, and I didn't receive a single warning in advance from 
the machine - except in one or two cases. But by the time I had jumped out 
of bed the machine was already locked up and not letting me in by SSH.

I then upped the memory to 384MB (like you did) and was then able to push the 
traffic up to 100 Gigabytes per month without further outages.

Like I said before: If you get many visitors from the US, then the server's 
busiest time might be while you're sleeping. When you go to bed you see a 
noticeable but not worrying load average. When you wake up, things will 
be idling along with next to no load at all. But for your machine the hours 
in between might have been hell on wheels thanks to the badly timed 
maintenance tasks. 

FWIW: In my case it helped when I temporarily switched the server clock to US 
Eastern time. The maintenance then ran between 10 and 11 am European time.
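
The arithmetic behind that shift, as a small Python sketch. It assumes 
fixed offsets of GMT+1 for Central European Time and GMT-5 for US Eastern 
time and ignores daylight saving; the date is arbitrary and the helper name 
is made up.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; daylight saving time is deliberately ignored.
CET = timezone(timedelta(hours=1), "CET")
EST = timezone(timedelta(hours=-5), "EST")


def convert(hour, minute, src, dst):
    """Convert a wall-clock time in zone `src` to wall-clock time in `dst`."""
    # The date is arbitrary; only the time of day matters for this example.
    t = datetime(2001, 1, 15, hour, minute, tzinfo=src)
    return t.astimezone(dst).strftime("%H:%M")


if __name__ == "__main__":
    # Clock on GMT+1: the 4:50 maintenance peak seen from the US East Coast.
    print(convert(4, 50, CET, EST))  # 22:50 (previous evening)
    # Clock on US Eastern: the same 4:50 job seen from Central Europe.
    print(convert(4, 50, EST, CET))  # 10:50
```

So with the clock on GMT+1 the maintenance collides with US evening 
traffic, while with the clock on US Eastern time it runs in the middle of 
the US night.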


-- 

With best regards,

Michael Stauber
mstauber@xxxxxxxxxxxxxx
Unix/Linux Support Engineer