This week, one server had a LOAD of more than 2500% (around 150 on a machine with 6 CPUs), but the CPU usage was only around 5%. This was a sign that the CPU was not the cause of the high LOAD on the server:
top - 15:53:42 up 7 days, 10:01, 2 users, load average: 159,47, 149,89, 160,80
Tasks: 540 total, 1 running, 468 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2,0 us, 2,0 sy, 0,0 ni, 0,0 id, 95,6 wa, 0,0 hi, 0,4 si, 0,0 st
KiB Mem : 12296516 total, 607940 free, 9710388 used, 1978188 buff/cache
KiB Swap: 12578812 total, 7439140 free, 5139672 used. 1752884 avail Mem
  PID USER       PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 5564 user@xyz   20   0   99712  36384   5308 D  4,6  0,3   0:00.22 spamassassin
 1539 root       20   0 2394080  55984   7412 S  2,0  0,5   1365:51 fail2ban-server
 4561 root       20   0   33784   6168   3700 S  1,3  0,1   0:02.39 htop
    8 root       20   0       0      0      0 I  0,7  0,0  11:17.44 rcu_sched
...
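To put the load average in relation to the number of CPUs, you can divide the 1-minute value from /proc/loadavg by the CPU count. A minimal sketch (the result below simply plugs in the 1-minute load average of 159,47 from the output above):

# number of CPUs and load per CPU as a percentage
$ nproc
6
$ awk -v cpus="$(nproc)" '{ printf "%.0f%%\n", $1 / cpus * 100 }' /proc/loadavg
2658%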
Hard disc operations as a possible reason
Not only a high CPU load, but also a high number of IO operations (read and write accesses to the hard drive) can cause a high LOAD. In this case, the top output above already pointed in that direction: the wa (iowait) value of 95.6 shows that the CPUs spent almost all of their time waiting for IO.
Monitoring tools
There are numerous monitoring tools to analyse the IO of a system. Tools like vmstat, pmstat and iostat can help you, but I’ve also used iotop, a tool very similar to top and htop, for my analysis. The output looks like this:
Total DISK READ : 3.25 M/s | Total DISK WRITE : 97.42 K/s
Actual DISK READ: 3.35 M/s | Actual DISK WRITE: 151.97 K/s
  TID  PRIO  USER       DISK READ   DISK WRITE  SWAPIN      IO>    COMMAND
14457  be/4  example-  1067.69 K/s    0.00 B/s  0.00 %  78.58 %  php-fpm: pool 160096117520684
  958  be/4  example-     2.20 M/s    0.00 B/s  0.00 %  71.42 %  php-fpm: pool 160096117520684
19990  be/4  mysql         3.90 K/s    3.90 K/s  0.00 %   0.50 %  mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
...
You can use the cursor keys to select another column to sort the processes by (in this example the column IO is selected, marked with the > at the end).
This tool also didn’t show any single process causing huge numbers of IO operations to the disc, but there were a lot of php-fpm processes at the top of the list.
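For reference, these are the kinds of invocations I would reach for here. The flags are standard iotop and iostat options, but check the man pages on your system:

# only show processes that are actually doing IO
$ sudo iotop --only

# non-interactive batch mode, e.g. three samples for logging
$ sudo iotop -b -n 3 -o

# extended per-device statistics every 5 seconds (from the sysstat package)
$ iostat -x 5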
Finding blocked processes
The reason for the very high LOAD on the server was the high number of processes waiting to be executed. If such processes want to access the disc but are unable to, they fall into an “uninterruptible sleep”. Processes with this status can be found using the ps command:
$ ps -ax | grep D
PID TTY STAT TIME COMMAND
6292 ? D 1:26 php-fpm: pool 151673661024723
8746 pts/0 S+ 0:00 grep --color=auto D
22261 ? D 0:03 php-fpm: pool 160096117520684
24583 ? D 0:03 php-fpm: pool 16104533493131
24886 ? D 0:02 php-fpm: pool 160096117520684
24894 ? D 0:02 php-fpm: pool 160096117520684
24896 ? D 0:02 php-fpm: pool 160096117520684
25048 ? D 0:02 php-fpm: pool 160096117520684
25050 ? D 0:01 php-fpm: pool 160096117520684
25052 ? D 0:01 php-fpm: pool 160096117520684
26265 ? D 0:01 php-fpm: pool 16104533493131
26315 ? D 0:01 php-fpm: pool 16104533493131
27967 ? D 0:01 php-fpm: pool 16104533493131
28711 ? D 0:01 php-fpm: pool 16104533493131
28966 ? D 0:01 php-fpm: pool 16104533493131
29202 ? D 0:01 php-fpm: pool 160096117520684
...
This simple search found (along with some wrong matches, like the grep process itself) a lot of php-fpm processes that were currently blocked.
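To avoid the wrong matches, you can also filter on the process state column directly instead of grepping the whole line. A small sketch using only standard ps and awk features:

# list only processes whose state starts with D (uninterruptible sleep)
$ ps -eo pid,stat,cmd | awk '$2 ~ /^D/'

# or simply count them to see how many blocked processes contribute to the LOAD
$ ps -eo stat= | grep -c '^D'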
Solution: restart processes
The solution was rather simple. Restarting the various php-fpm services (the server was using multiple PHP versions) eliminated those blocked processes. The LOAD went back to around 0.5 and the server was running fast again.
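On a systemd-based server this typically comes down to restarting each PHP-FPM unit. The exact unit names depend on the distribution and the installed PHP versions, so the version numbers below are only placeholders:

# list the php-fpm units that exist on this machine
$ systemctl list-units 'php*fpm*'

# restart them (adjust the names to your setup)
$ sudo systemctl restart php7.4-fpm php8.1-fpm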
Conclusion
If your server has a high LOAD, the reason is not always a high CPU load. A large number of IO operations to the disc as well as blocked processes can also lead to a high LOAD. With some useful tools, you can find and solve such issues.