An app is running “slow” on Linux, how do you go about figuring out what could be wrong? Typical bottle-necks for a slow application are cpu, disk i/o, network, memory, or database. For instance if Apache is running slow it could be memory starved. Some of the tools we can use to debug application are listed below. Keep in mind that when using these tools, you generally want to take multiple snapshots in time, and not necessarily base your information on a single moment in time.
1) Vmstat shows you a number of interesting statistics. Some things to look for in vmstat include the proc ‘r’ column, which shows the number of processes waiting for run time. If this number if “high” it indicates CPU starvation. CPU starved apps run slow. To fix, increase the number of CPUs in the system. Another field to look at in the output of vmstat is the ‘cs’ field under the system heading. ‘cs’ stands for context switching. A “high” number means that the CPU is having to switch between a lot of tasks. These tasks might be network, disk, or cpu related. One way to solve this problem is to see if the server is doing “too many” things, i.e., running too many applications. Reduce the number of applications running to solve this problem.
Linux uses available RAM to store inode information or filesystem meta data in ‘buffers’ and it also uses cache to store pages of actual file contents. This is to speed up disk access.
As you can see from the below, this system has 377GB of physical memory, out of which 356GB is available, even though the free column says that we have 16GB “free”. The reason for this is that Linux is using 352GB for buffers and cache combined, i.e, for inode and file data. By doing this disk access speed has increased. This will be given back to the system if there is pressure on memory.
$ free -m total used free shared buff/cache available Mem: 386445 16613 8443 4110 361388 364610 Swap: 8191 317 7874 $ free -g total used free shared buff/cache available Mem: 377 16 8 4 352 356 Swap: 7 0 7
$ vmstat -s 16301872 total memory 5718716 used memory 3179672 active memory 2228164 inactive memory 10583156 free memory 166264 buffer memory 2913548 swap cache 8191992 total swap 572 used swap 8191420 free swap 26013290 non-nice user cpu ticks 881 nice user cpu ticks 17260265 system cpu ticks 7326487040 idle cpu ticks 13013070 IO-wait cpu ticks 72 IRQ cpu ticks 8054 softirq cpu ticks 0 stolen cpu ticks 11212676 pages paged in 180145305 pages paged out 44 pages swapped in 145 pages swapped out 4176361125 interrupts 1524882692 CPU context switches 1367818576 boot time 9070041 forks2) mpstat: To view processor utilization based on per core.
$ mpstat -P ALL Linux 2.6.32-358.6.1.el6.x86_64 (hostname) 08/20/2013 _x86_64_ (8 CPU) 10:48:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 10:48:05 PM all 0.03 0.00 0.23 0.18 0.00 0.00 0.00 0.32 99.24 10:48:05 PM 0 0.05 0.00 0.42 0.72 0.00 0.00 0.00 0.69 98.13 10:48:05 PM 1 0.05 0.00 1.01 0.34 0.00 0.00 0.00 1.78 96.82 10:48:05 PM 2 0.04 0.00 0.11 0.04 0.00 0.00 0.00 0.03 99.78 10:48:05 PM 3 0.03 0.00 0.11 0.18 0.00 0.00 0.00 0.03 99.66 10:48:05 PM 4 0.02 0.00 0.07 0.01 0.00 0.00 0.00 0.02 99.87 10:48:05 PM 5 0.02 0.00 0.08 0.12 0.00 0.00 0.00 0.02 99.77 10:48:05 PM 6 0.01 0.00 0.04 0.00 0.00 0.00 0.00 0.01 99.93 10:48:05 PM 7 0.01 0.00 0.04 0.01 0.00 0.00 0.00 0.01 99.933) iostat : If you need help in figuring out how your disk subsystem is performing, use iostat. “The iostat command generates three types of reports, the CPU Utilization report, the Device Utilization report and the Network Filesystem report.” (From the man page of iostat) Items of concern here would be iowait%, which if high indicates that the CPU is waiting for disk I/O to complete. If %util hits 100% then device saturation occurs.
4) netstat: This is an extremely useful command to diagnose numerous network issues.
To view routing table use:$ netstat -r Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 216.218.223.240 * 255.255.255.240 U 0 0 0 br0 216.218.223.240 * 255.255.255.240 U 0 0 0 em1 link-local * 255.255.0.0 U 0 0 0 br0 default uplink241.gigo. 0.0.0.0 UG 0 0 0 br0If you are having network errors and you want to see interface statistics, use netstat -i:
$ netstat -i Kernel Interface table Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg br0 1500 0 44804569 0 0 0 3708480 0 0 0 BMRU em1 1500 0 49533635 0 0 0 7760855 0 0 0 BMRU em1:1 1500 0 - no statistics available - BMRU lo 16436 0 32131 0 0 0 32131 0 0 0 LRU vnet0 1500 0 4 0 0 0 225 0 0 1 BMRU vnet1 1500 0 3761563 0 0 0 45308606 0 0 0 BMRUTo view TCP/UDP statistics run:
$netstat -s (snip) Tcp: 510 active connections openings 170202 passive connection openings 1233 failed connection attempts 114 connection resets received 1 connections established 3038753 segments received 3310750 segments send out 11236 segments retransmited 0 bad segments received. 469 resets sent (snip) Udp: 304134 packets received 37 packets to unknown port received. 0 packet receive errors 304186 packets sent (snip)5) top : All time favorite for most engineers. Display a number of statistics, relating to CPU, memory and processes. Extremely useful to see which processes are taking up most resources on the system.
6) traceroute: Traceroute sends 3 UDP packets at a time to an invalid port address on the destination host. The TTL is set to 1 initially, as soon as the packets reach the first hop or router, the router will responds with an ICMP Time Exceeded message. Traceroute then sets the TTL to 2 and sends the packet again. Hop 2 will then respond with TTL exceeded, and traceroute will then sedn 3 packets with TTL of 3. Traceroute keeps doing this until it reaches the destination. Using traceroute we can figure out the network path to a destination. Also, if ICMP is blocked by some routers we can use TCP with the -T flag. In the output of traceroute if you see ‘*’ it means that router is not returning an ICMP message.
$ traceroute www.google.com traceroute to www.google.com (74.125.239.147), 30 hops max, 60 byte packets 1 66.220.4.225 (66.220.4.225) 8.269 ms 8.142 ms 8.581 ms 2 10gigabitethernet1-1.core1.pao1.he.net (184.105.213.66) 8.382 ms 8.410 ms 2.025 ms 3 184.105.224.254 (184.105.224.254) 12.300 ms 12.341 ms 12.355 ms 4 209.85.240.114 (209.85.240.114) 1.242 ms 1.256 ms 1.257 ms 5 66.249.95.31 (66.249.95.31) 12.384 ms 12.442 ms 12.402 ms 6 nuq05s02-in-f19.1e100.net (74.125.239.147) 12.339 ms 11.187 ms 11.206 ms7) ping: Ping uses ICMP ECHO_REQUEST to measure the time it takes to reach a destination. ICMP stands for internet control messaging protocol. Ping is very useful to see if a host is alive. You can ping the broadcast address of a network to see how many network devices are on the network. Ping is also used to measure network latency. Round-trip times and packet loss statistics are computed. If you see
$ ping www.google.com PING www.google.com (74.125.239.144) 56(84) bytes of data. 64 bytes from nuq05s02-in-f16.1e100.net (74.125.239.144): icmp_seq=1 ttl=59 time=1.18 ms 64 bytes from nuq05s02-in-f16.1e100.net (74.125.239.144): icmp_seq=2 ttl=59 time=1.23 ms 64 bytes from nuq05s02-in-f16.1e100.net (74.125.239.144): icmp_seq=3 ttl=59 time=1.20 ms 64 bytes from nuq05s02-in-f16.1e100.net (74.125.239.144): icmp_seq=4 ttl=59 time=1.19 ms8) Valgrind: This is normally not installed on CentOS box, you can do the installation using ‘sudo yum install valgrind -y’. Valgrind is good at figuring out memory leaks. I often use this program when I find a system is running out of memory on a reoccurring basis. To use valgrind, run ‘valgrind —leak-check=yes yourprogram’. Under leak summary you will see definitely lost with ‘X’ number of bytes if there is a memory leak. You will also see line numbers of code if you had compiled the code with -g option to gcc. Once you figure out if the program has memory leaks based on the output of valgrind, next step is to fix the memory leak with the help of a developer. If that is not possible, as a temporary stop gap measure you can restart the application on a period basis if the application is able to tolerate restarts without interrupting end users. For instance if you are using ‘Jetty’ and the Java app has a memory leak, restart Jetty will give you the memory back temporarily. Of course you have to ensure that there are other severs running Jetty which can handle the load while this particular server is restarted. (See http://valgrind.org for additional information)
9) oprofile: Oprofile is a system application profiler, very useful to figure out what an application is doing.
10) tcpdump: The most reliable way for snooping network traffic. In the below example I am using tcpdump to filter traffic with destination http://www.google.com. I then open a telnet session to http://www.google.com on port 80. As you can see from the tcpdump output below a 3 way TCP handshake is visible.
$ telnet www.google.com 80 Trying 74.125.239.116... Connected to www.google.com. Escape character is '^]'. ^]quit telnet> quit Connection closed. # tcpdump host www.google.com reading from file file, link-type EN10MB (Ethernet) 23:29:36.745130 IP myhost.com.54378 > nuq05s01-in-f16.1e100.net.http: Flags [S], seq 814284659, win 14600, options [mss 1460,sackOK,TS val 657765386 ecr 0,nop,wscale 7], length 0 23:29:36.746269 IP nuq05s01-in-f16.1e100.net.http > myhost.com.54378: Flags [S.], seq 217057752, ack 814284660, win 62392, options [mss 1430,sackOK,TS val 3024867195 ecr 657765386,nop,wscale 6], length 0 23:29:36.746293 IP myhost.com.54378 > nuq05s01-in-f16.1e100.net.http: Flags [.], ack 1, win 115, options [nop,nop,TS val 657765387 ecr 3024867195], length 0 23:29:39.736476 IP myhost.com.54378 > nuq05s01-in-f16.1e100.net.http: Flags [F.], seq 1, ack 1, win 115, options [nop,nop,TS val 657768378 ecr 3024867195], length 0 23:29:39.737655 IP nuq05s01-in-f16.1e100.net.http > myhost.com.54378: Flags [F.], seq 1, ack 2, win 975, options [nop,nop,TS val 3024870187 ecr 657768378], length 0 23:29:39.737673 IP myhost.com.54378 > nuq05s01-in-f16.1e100.net.http: Flags [.], ack 2, win 115, options [nop,nop,TS val 657768379 ecr 3024870187], length 011) perf: Perf collects performance counters for Linux.
12) tuned : RHEL has many profiles available that can be applied to a system. These profile include certain modifications to kernel/disk/network parameter which are suitable for the workload on a system. Tuned allows you to set preset profiles, such as virtual-host, virtual-guest, enterprise-storage, throughput-performance.
13) sar: sar collects and prints system activity report. This includes network, cpu, disk and memory.