[SlugBug] server health

Chris J cej at nightwolf.org.uk
Thu Mar 18 13:22:05 GMT 2004


> 
> > 1:40pm  up 266 days,  4:50,  1 user,  load average: 0.00, 0.00, 0.00
> > 56 processes: 55 sleeping, 1 running, 0 zombie, 0 stopped
> > 
> > CPU0 states: 23.0% user, 76.0% system,  0.0% nice,  0.0% idle
> > CPU1 states: 22.0% user, 77.0% system,  0.0% nice,  0.0% idle
> > 
> > Mem:  1036048K av, 550288K used, 485760K free, 21512K shrd, 185188K buff
> > Swap: 2048248K av,    184K used, 2048064K free            296524K cached
> 
> what concerns me is that the processors are 0.0% idle - i take this 
> to mean that the CPUs are working flat out.

Your processors may be flat out, but the load average is 0, so the
machine isn't actually working all that hard on behalf of your programs.
You'll notice most of the CPU time is in "system" as well, which is time
spent in the kernel. In essence, it looks like the kernel is really busy
trying to do something, whilst normal processes are doing nothing at all.

The load average numbers are basically the number of processes in a
"RUNNABLE" state, averaged over 1, 5 and 15 minutes (Linux also counts
processes in uninterruptible sleep, state 'D'). The higher this number,
the more processes are competing for CPU time at any instant. On a
dual CPU system, a load average of 2 would mean both processors are
working flat out servicing a single process each. On a single CPU system,
it'd show that two processes are competing for the same CPU.
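To put numbers on that, here is a small sketch that divides the load
average by the number of CPUs. The function names and the wording of the
three verdicts are my own invention; os.getloadavg() and os.cpu_count()
are standard Python calls on Unix-like systems.

```python
import os


def load_per_cpu(load, cpus):
    """Load average divided by the number of CPUs (at least 1)."""
    return load / max(cpus, 1)


def describe(load, cpus):
    """Rough verdict on a load average for a given CPU count."""
    ratio = load_per_cpu(load, cpus)
    if ratio < 1.0:
        return "spare capacity"
    if ratio > 1.0:
        return "processes competing for CPU"
    return "every CPU busy with one process"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one_minute = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    print(one_minute, cpus, describe(one_minute, cpus))
```

With the figures from this thread, describe(0.0, 2) lands in the "spare
capacity" case, which is why the 0.0% idle reading looks odd.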

What percentage of CPU were the top processes using, and what are they?

> i wouldn't have thought that there was an issue with the RAM as so 
> little swap is being used.

Correct.

> 
> should i be thinking of upgrading the processors or, dare i say it, 
> should i be thinking of rebooting the machine?
> 
> i looked at ps aux but nothing there gave me heart failure and sorting 
> top with CTRL+M looked ok too.
> 
> i could try stopping and starting the busiest processes but i thought 
> i'd run it past here first.
> 

Unless the system seems unreasonably slow, I'd leave it as is, but keep
an eye on it. If you can get away with killing the CPU-intensive process, 
it could be worth doing - it may have got stuck in a loop, or be blocked
waiting for the kernel to respond (in which case the process will be in 
state 'D', and will be unkillable without a reboot as it's stuck inside
the kernel). Thinking about it, getting a list of processes that are in
the 'D' state may help; they won't use any CPU themselves as they're
waiting, so killing the top processes might not do owt. But they may
point in a suitable direction for more poking.
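One way to get that list of 'D' state processes is to read
/proc/<pid>/stat directly. This is only a sketch and assumes a Linux
/proc layout; parse_stat and uninterruptible are names I've made up, and
the proc_root argument is there purely so it can be pointed at a test
directory.

```python
import os


def parse_stat(text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state).
    The command name sits in parentheses and may itself contain spaces
    or parentheses, so search for the last ')' rather than splitting."""
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """Return [(pid, comm), ...] for processes currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue                 # process exited or line was odd
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in uninterruptible():
        print(pid, comm)
```

An empty result just means nothing was blocked at the instant you
looked, so it's worth running a few times.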

But in short, the kernel is busy doing something. Very busy.

Chris...

-- 
\ Chris Johnson           \
 \ cej at nightwolf.org.uk    ~-----,   
  \ http://cej.nightwolf.org.uk/  ~-----------------------------------, 
   \ Redclaw chat - http://redclaw.org.uk - telnet redclaw.org.uk 2000 \____

