Find the cause of a high LOAD when the CPU load is low

This week, one server had a LOAD of more than 2500% (around 150 with 6 CPUs), but the CPU load was only around 5%. This was a sign that the CPU load was not the cause of the high LOAD on the server, as the output of top shows (note the 95,6 wa value, the share of time the CPUs spent waiting for IO):

top - 15:53:42 up 7 days, 10:01,  2 users,  load average: 159,47, 149,89, 160,80
Tasks: 540 total,   1 running, 468 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2,0 us,  2,0 sy,  0,0 ni,  0,0 id, 95,6 wa,  0,0 hi,  0,4 si,  0,0 st
KiB Mem : 12296516 total,   607940 free,  9710388 used,  1978188 buff/cache
KiB Swap: 12578812 total,  7439140 free,  5139672 used.  1752884 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5564 user@xyz  20   0   99712  36384   5308 D   4,6  0,3   0:00.22 spamassassin
 1539 root      20   0 2394080  55984   7412 S   2,0  0,5   1365:51 fail2ban-server
 4561 root      20   0   33784   6168   3700 S   1,3  0,1   0:02.39 htop
    8 root      20   0       0      0      0 I   0,7  0,0  11:17.44 rcu_sched
...
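
The 2500% figure above is simply the load average divided by the number of CPUs. As a minimal sketch of that arithmetic (the function name load_percent is my own, not part of any tool):

```shell
#!/bin/sh
# load_percent LOAD CPUS: print the load relative to the CPU count, in percent.
load_percent() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", load / cpus * 100 }'
}

# On a live Linux system, the 1-minute load average is the first field of
# /proc/loadavg and nproc prints the number of available CPUs:
#   load_percent "$(cut -d " " -f 1 /proc/loadavg)" "$(nproc)"

# Values from this article: a LOAD of around 150 on 6 CPUs.
load_percent 150 6   # prints 2500
```

Anything well above 100% means that more processes are waiting than the CPUs could serve at once.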

Hard disc operations as a possible reason

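A high LOAD is not only caused by a high CPU load. A high number of IO operations, meaning read and write access to the hard drive, can cause it as well, because on Linux processes that are waiting for IO also count towards the LOAD.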
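
If you want to check the IO wait share without an interactive tool, it can be derived from two samples of the first line of /proc/stat. The following is only a sketch, assuming a Linux system with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name iowait_percent is hypothetical:

```shell
#!/bin/sh
# iowait_percent: read two "cpu ..." lines (two samples of the first line of
# /proc/stat) from stdin and print the share of CPU time spent in iowait
# between the two samples, in percent.
iowait_percent() {
    # LC_ALL=C keeps the decimal separator a dot regardless of the locale.
    LC_ALL=C awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= 9; i++) total += $i
            if (NR == 1) { t1 = total; w1 = $6 }
            else         { t2 = total; w2 = $6 }
        }
        END {
            if (NR >= 2 && t2 > t1) printf "%.1f\n", 100 * (w2 - w1) / (t2 - t1)
        }'
}

# Usage on a live system, sampling 5 seconds apart:
#   { head -n 1 /proc/stat; sleep 5; head -n 1 /proc/stat; } | iowait_percent
```

A value close to the wa figure reported by top points to IO, not computation, as the bottleneck.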

Monitoring tools

There are numerous monitoring tools to analyse the IO of a system. Tools like vmstat, pmstat and iostat can help you. For my analysis I used iotop, a tool very similar to top and htop. The output looks like this:

Total DISK READ :       3.25 M/s | Total DISK WRITE :      97.42 K/s
Actual DISK READ:       3.35 M/s | Actual DISK WRITE:     151.97 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
14457 be/4 example- 1067.69 K/s    0.00 B/s  0.00 % 78.58 % php-fpm: pool 160096117520684
  958 be/4 example-    2.20 M/s    0.00 B/s  0.00 % 71.42 % php-fpm: pool 160096117520684
19990 be/4 mysql       3.90 K/s    3.90 K/s  0.00 %  0.50 % mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
...

You can use the left and right cursor keys to select the column you want to sort the processes by (in this example the IO column is selected, marked with the > after its name).

iotop didn’t show any single process causing a huge number of IO operations on the disc either. But there were a lot of php-fpm processes at the top of the list.
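
Of the other tools mentioned above, vmstat gives a quick hint in its "b" column, which counts processes blocked in uninterruptible sleep. A small sketch for pulling out just that column (the helper name blocked_counts is made up for this example, and it assumes the default vmstat layout where "r" and "b" are the first two columns):

```shell
#!/bin/sh
# blocked_counts: read vmstat output from stdin and print the "b" column
# (processes in uninterruptible sleep), one value per sample.
blocked_counts() {
    awk '
        $1 == "r" && $2 == "b" { header_seen = 1; next }   # column header line
        header_seen && $2 ~ /^[0-9]+$/ { print $2 }
    '
}

# Usage on a live system, 5 samples one second apart:
#   vmstat 1 5 | blocked_counts
```

A "b" value that stays high over several samples matches the picture of many processes stuck waiting for the disc.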

Finding blocked processes

The reason for the very high LOAD on the server was the high number of processes waiting to be executed. If such processes want to access the disc but are unable to, they fall into an “uninterruptible sleep”, which ps and top show as state D. Processes with this status can be found using the ps command:

$ ps -ax | grep D
  PID TTY      STAT   TIME COMMAND
 6292 ?        D      1:26 php-fpm: pool 151673661024723
 8746 pts/0    S+     0:00 grep --color=auto D
22261 ?        D      0:03 php-fpm: pool 160096117520684
24583 ?        D      0:03 php-fpm: pool 16104533493131
24886 ?        D      0:02 php-fpm: pool 160096117520684
24894 ?        D      0:02 php-fpm: pool 160096117520684
24896 ?        D      0:02 php-fpm: pool 160096117520684
25048 ?        D      0:02 php-fpm: pool 160096117520684
25050 ?        D      0:01 php-fpm: pool 160096117520684
25052 ?        D      0:01 php-fpm: pool 160096117520684
26265 ?        D      0:01 php-fpm: pool 16104533493131
26315 ?        D      0:01 php-fpm: pool 16104533493131
27967 ?        D      0:01 php-fpm: pool 16104533493131
28711 ?        D      0:01 php-fpm: pool 16104533493131
28966 ?        D      0:01 php-fpm: pool 16104533493131
29202 ?        D      0:01 php-fpm: pool 160096117520684
...

This simple command found (along with some false matches, such as the header line and the grep process itself) a lot of php-fpm processes that were currently blocked.
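
The false matches come from grep looking for a "D" anywhere in the line. A more precise variant only checks the STAT column. This is a sketch and not the command used during the incident; the function name d_state_by_command is hypothetical, and it expects lines in the form "PID STAT COMMAND":

```shell
#!/bin/sh
# d_state_by_command: read "PID STAT COMMAND..." lines from stdin, keep only
# those whose state starts with D, and print how many there are per command,
# most frequent first.
d_state_by_command() {
    awk '
        $2 ~ /^D/ {
            # Rebuild the full command from field 3 onwards.
            cmd = $3
            for (i = 4; i <= NF; i++) cmd = cmd " " $i
            count[cmd]++
        }
        END { for (cmd in count) printf "%d %s\n", count[cmd], cmd }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Usage on a live system (the empty "=" headers suppress the header line):
#   ps -e -o pid= -o stat= -o args= | d_state_by_command
```

Grouping by command makes it easy to see which php-fpm pool has the most blocked workers.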

Solution: restart processes

The solution was rather simple. Restarting the various php-fpm services (the server was using multiple PHP versions) eliminated those blocked processes. The LOAD went back to around 0.5 and the server was running fast again.
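
How the restart looks depends on how php-fpm is installed. As a rough sketch, assuming systemd and one unit per PHP version (the function name restart_units and the unit name pattern in the usage line are assumptions, since the actual service names on this server are not listed here):

```shell
#!/bin/sh
# restart_units: read systemd unit names from stdin, one per line, and restart
# each of them. With DRY_RUN=1 the commands are only printed, not executed.
restart_units() {
    while read -r unit; do
        if [ -z "$unit" ]; then
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'systemctl restart %s\n' "$unit"
        else
            systemctl restart "$unit"
        fi
    done
}

# Usage as root, assuming units named like php<version>-fpm.service
# (hypothetical pattern, adjust it to the names on your system):
#   systemctl list-units --type=service --plain --no-legend 'php*-fpm*' \
#       | awk '{ print $1 }' | DRY_RUN=1 restart_units
```

Running it with DRY_RUN=1 first shows which services would be restarted before anything is touched.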

Conclusion

If a server has a high LOAD, the reason is not always a high CPU load. A large number of IO operations on the disc as well as blocked processes can also lead to a high LOAD. With some useful tools, you can find and solve such issues.
