Overview
One of the first tools I use when performing problem determination is top. The top program is a very powerful little utility that provides a great deal of information about your running system. This includes data about memory usage, CPU loads and a list of running processes including the amount of CPU time and memory being utilized by each process.
Top displays system information in near real-time, updating (by default) every three seconds. It is also interactive, and the columns of data it displays can be changed.
Output of top
Sample output from the top program is shown below. The output is divided into two sections: the “Summary” section, which is the top portion of the output, and the “Task” section, which is the lower portion.
Summary Section
The Summary section of the output from top is an overview of the system status. The first line shows the system uptime and the 1, 5 and 15 minute load averages. The second line shows the number of tasks currently active and the status of each.
The lines containing CPU statistics are shown next. There can be a single line that combines the statistics for all CPUs in the system or, as in the example below, one line for each CPU core; the computer used for the example has a single quad-core CPU. Press the 1 key to toggle between the consolidated display of CPU usage and the display of the individual CPUs. The data in these lines is displayed as percentages of the total CPU time available.
The fields in these CPU lines have changed over time, and I had a difficult time locating information about the last three because they are relatively new. So here is a description of all of these fields.
- us: userspace – Applications and other programs running in user space, i.e., not in the kernel.
- sy: system – Time spent running kernel code, including system calls made on behalf of processes. Interrupt handling is counted separately under hi and si.
- ni: nice – User space processes that are running at a positive nice level, i.e., at a lowered priority.
- id: idle – Idle time, i.e., time not used by any running process.
- wa: wait – Time the CPU sits idle while waiting for outstanding I/O to complete. High values here suggest that processes are stalled on disk or network access.
- hi: hardware interrupts – CPU cycles that are spent dealing with hardware interrupts.
- si: software interrupts – CPU cycles spent dealing with software interrupts, which the kernel uses for deferred work such as processing network packets. System calls are counted under sy, not here.
- st: steal time – The percentage of CPU cycles that a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.
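To make the relationship between these fields concrete, here is a minimal Python sketch of how percentages like these can be derived. It assumes a Linux system, where the kernel exposes cumulative CPU counters on the "cpu" line of /proc/stat in the order user, nice, system, idle, iowait, irq, softirq, steal. The two sample lines are made-up numbers for illustration, and the function names are my own; this is not a copy of how top itself is implemented.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux,
# mapped to the labels top uses in its summary section.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat 'cpu' line as ints."""
    parts = line.split()
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def cpu_percentages(previous, current):
    """Convert two counter samples into percentages of the elapsed CPU time."""
    deltas = [now - before for before, now in zip(previous, current)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: round(100.0 * delta / total, 1)
            for name, delta in zip(FIELDS, deltas)}


# Two hypothetical samples taken a few seconds apart (made-up numbers).
first = parse_cpu_line("cpu 1000 0 500 8000 100 0 50 0 0 0")
second = parse_cpu_line("cpu 1030 0 510 8250 105 0 55 0 0 0")
usage = cpu_percentages(first, second)
print(usage)
```

Because the counters only ever increase, it is the difference between two samples that matters, which is why the figures top shows always describe the most recent update interval rather than the time since boot.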
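The last two lines in the Summary section show memory usage: RAM on the first line and swap space on the second. In the layout shown here, the buffers and cached figures at the ends of those lines both describe RAM that the kernel is using for buffering and caching.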
top - 09:47:38 up 13 days, 24 min, 6 users, load average: 0.13, 0.04, 0.01
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.9%us, 0.9%sy, 0.0%ni, 98.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2056456k total, 797768k used, 1258688k free, 92028k buffers
Swap: 4095992k total, 88k used, 4095904k free, 336252k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6292 root 20 0 99.4m 32m 7808 S 12 1.6 9:07.65 X
519 root 15 -5 0 0 0 S 1 0.0 18:20.23 ata/2
24947 root 20 0 2404 1052 788 R 1 0.1 0:00.19 top
29674 dboth 20 0 36996 16m 13m S 1 0.8 9:46.76 kicker
29793 dboth 20 0 36168 17m 13m S 1 0.9 0:07.17 konsole
1 root 20 0 2112 636 544 S 0 0.0 0:05.01 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.21 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.85 migration/0
4 root 15 -5 0 0 0 S 0 0.0 3:20.01 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:00.04 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:32.28 migration/1
7 root 15 -5 0 0 0 S 0 0.0 3:30.32 ksoftirqd/1
8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
9 root RT -5 0 0 0 S 0 0.0 0:03.70 migration/2
10 root 15 -5 0 0 0 S 0 0.0 5:16.94 ksoftirqd/2
11 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/2
12 root RT -5 0 0 0 S 0 0.0 0:00.56 migration/3
13 root 15 -5 0 0 0 S 0 0.0 2:54.63 ksoftirqd/3
14 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/3
15 root 15 -5 0 0 0 S 0 0.0 0:03.55 events/0
16 root 15 -5 0 0 0 S 0 0.0 0:15.47 events/1
17 root 15 -5 0 0 0 S 0 0.0 0:05.17 events/2
18 root 15 -5 0 0 0 S 0 0.0 0:02.22 events/3
19 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper
78 root 15 -5 0 0 0 S 0 0.0 0:01.32 kblockd/0
79 root 15 -5 0 0 0 S 0 0.0 0:00.39 kblockd/1
80 root 15 -5 0 0 0 S 0 0.0 0:00.41 kblockd/2
81 root 15 -5 0 0 0 S 0 0.0 0:13.20 kblockd/3
83 root 15 -5 0 0 0 S 0 0.0 0:00.00 kacpid
84 root 15 -5 0 0 0 S 0 0.0 0:00.00 kacpi_notify
164 root 15 -5 0 0 0 S 0 0.0 0:00.00 cqueue
166 root 15 -5 0 0 0 S 0 0.0 0:00.00 ksuspend_usbd
171 root 15 -5 0 0 0 S 0 0.0 0:00.11 khubd
174 root 15 -5 0 0 0 S 0 0.0 0:00.00 kseriod
226 root 15 -5 0 0 0 S 0 0.0 8:55.50 kswapd0
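If you want to pull numbers out of the summary lines in a script, something like the following Python sketch may help. It assumes the older layout shown in the sample above, with values such as 2056456k; newer versions of top format these lines differently, so the patterns would need adjusting. The function names are my own.

```python
import re


def parse_load_averages(line):
    """Extract the 1, 5 and 15 minute load averages from top's first line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load averages found in: " + line)
    return tuple(float(value) for value in match.groups())


def parse_kib_fields(line):
    """Map each label on a Mem: or Swap: line to its value in KiB."""
    return {label: int(amount)
            for amount, label in re.findall(r"(\d+)k (\w+)", line)}


# Lines copied from the sample output above.
first_line = "top - 09:47:38 up 13 days, 24 min, 6 users, load average: 0.13, 0.04, 0.01"
mem_line = "Mem: 2056456k total, 797768k used, 1258688k free, 92028k buffers"

loads = parse_load_averages(first_line)
mem = parse_kib_fields(mem_line)
used_percent = round(100.0 * mem["used"] / mem["total"], 1)
print(loads, mem, used_percent)
```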
Task Section
The Task section of the output from top is a listing of the running processes in the system, or at least as many processes, or tasks, as there is room for on the terminal display. The default columns displayed by top are described below. Several other columns are available and each can usually be added with a single keystroke; refer to the top man page for details.
- PID – The Process ID.
- USER – The username of the process owner.
- PR – The priority of the process.
- NI – The nice number of the process.
- VIRT – The total amount of virtual memory allocated to the process.
- RES – Resident size (in KiB unless a unit suffix such as m is shown) of non-swapped physical memory consumed by a process.
- SHR – The amount of shared memory, in KiB, used by the task.
- S – The status of the task. This can be R for running, S for sleeping, and Z for zombie. Less frequently seen statuses can be T for traced or stopped, and D for uninterruptible sleep.
- %CPU – The percentage of CPU time used by this task during the last update interval. The default interval is three seconds but can be changed. Fractional seconds are allowed, but very small values can overload the system.
- %MEM – The percentage of physical system memory used by the task.
- TIME+ – Total CPU time to 100ths of a second consumed by the process since the task was started.
- COMMAND – The name of the program that was launched as the task. Press the c key to toggle between the program name and the full command line.
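As a rough illustration of how these columns fit together, the Python sketch below splits one task line from the sample output into named fields. It assumes the default twelve columns listed above, that an m or g suffix on a memory value means multiples of 1024 KiB, and that TIME+ is in minutes:seconds.hundredths form; the helper names are hypothetical.

```python
COLUMNS = ("PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
           "S", "%CPU", "%MEM", "TIME+", "COMMAND")


def to_kib(text):
    """Convert a memory value such as '7808', '32m' or '1.2g' to KiB."""
    factors = {"m": 1024, "g": 1024 * 1024}
    suffix = text[-1].lower()
    if suffix in factors:
        return int(float(text[:-1]) * factors[suffix])
    return int(text)


def cpu_time_seconds(text):
    """Convert a TIME+ value (minutes:seconds.hundredths) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_task_row(row):
    """Split one task line into a dict keyed by the default column names."""
    # COMMAND may contain spaces, so split at most len(COLUMNS) - 1 times.
    task = dict(zip(COLUMNS, row.split(None, len(COLUMNS) - 1)))
    for column in ("VIRT", "RES", "SHR"):
        task[column] = to_kib(task[column])
    task["TIME+"] = cpu_time_seconds(task["TIME+"])
    return task


# The first task line from the sample output.
task = parse_task_row("6292 root 20 0 99.4m 32m 7808 S 12 1.6 9:07.65 X")
print(task["COMMAND"], task["RES"], task["TIME+"])
```

If you have added or removed columns in your own top configuration, COLUMNS would have to be changed to match.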
Be sure to read the man page for top because there is a large amount of information about configuring and interacting with top. Also use the h key for help in interactive mode. This help can provide you with information about selecting and sorting the columns of data, setting the update interval and much more.
What top can tell you
The top program can tell you a great deal when you are looking for the cause of a problem. It can tell you when a process, and which one, is sucking up all the CPU time, whether there is enough free memory, whether processes are stalled while waiting for I/O such as disk or network access to complete, and much more.
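The same question, which process is using the most CPU time, can also be answered from a script. The Python sketch below picks the busiest tasks out of captured top output. It assumes the default column layout, and the run_top_once helper assumes the procps version of top, whose -b (batch mode) and -n (number of iterations) options produce plain text; neither function name comes from top itself.

```python
import subprocess


def busiest_tasks(top_output, count=3):
    """Return (pid, command, cpu_percent) for the highest %CPU rows."""
    lines = top_output.splitlines()
    # The task table starts at the header row, whose first column is PID.
    header_index = next(i for i, line in enumerate(lines)
                        if line.split()[:1] == ["PID"])
    header = lines[header_index].split()
    cpu_column = header.index("%CPU")
    rows = []
    for line in lines[header_index + 1:]:
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip blank or truncated lines
        rows.append((int(fields[0]), fields[-1], float(fields[cpu_column])))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:count]


def run_top_once():
    """Capture one batch-mode iteration of top (assumes procps top)."""
    result = subprocess.run(["top", "-b", "-n", "1"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# A few lines taken from the sample output earlier in this article.
sample = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
519 root 15 -5 0 0 0 S 1 0.0 18:20.23 ata/2
6292 root 20 0 99.4m 32m 7808 S 12 1.6 9:07.65 X
1 root 20 0 2112 636 544 S 0 0.0 0:05.01 init"""
print(busiest_tasks(sample, count=2))
```

On a live system you could pass run_top_once() to busiest_tasks instead of the sample text.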
Using the interactive “k” command, top also allows you to kill processes that may be hogging CPU time.
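Under the hood, killing a process means sending it a signal. As I understand it, the k command prompts for a PID and a signal and offers SIGTERM (15) as the default. The short Python sketch below does the equivalent from a script on a Unix-like system, using a throwaway child process as a stand-in for the runaway task so that nothing important gets killed.

```python
import os
import signal
import subprocess
import sys

# Start a throwaway child process to stand in for the "runaway" task.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# SIGTERM asks the process to exit; SIGKILL would force it if that fails.
os.kill(child.pid, signal.SIGTERM)

# On POSIX systems, a negative return code is the number of the signal
# that ended the process.
exit_code = child.wait(timeout=10)
print(exit_code)
```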
I strongly recommend that you spend time watching top running on a system while it is functioning normally so you will be able to differentiate those things that may be abnormal while you are looking for the cause of a problem.
Remember, top is your friend!