TCP/IP Tuning

Improving Network Performance

When it comes to network performance, unless there is an explicit problem, leave the network optimization to the kernel.
If you see a network problem, then study the problem in detail and come up with a solution for that problem, and leave the rest alone.

Linux handles incoming network packets in the following way:

1. Hardware reception, frame arrives into the network
2. NIC sends a hard IRQ to the CPU about the frame
3. Soft IRQ state, wherein the kernel removes the frame from the NIC, passing it to the application
4. The application recevies the frame and handles it via standard POSIX calls such as read, recv, recvfrom

First, do the obvious, check for errors:

– Check duplex, full/half/auto (ethtool)
– Check speed, 10/100/1000/auto
– Check for errors using ‘netstat -i’

Next, try some NIC optimizations:

* Input Traffic
* Queue Depth
* Application Call Frequency
* Change socket queues
* RSS: Receive Side Scaling
* RFS: Receive Flow Steering
* RPS: Receive Packet Steering
* XPS: Transmit Packet Steering

One option is to slow down incoming traffic. You can do this by lowering the NICs device weight.Input traffic can be slowed down by reducing the value of /proc/sys/net/core/dev_weight.

Also try to increase the physical NIC queue depth to the maximum supported as shown below.

# ethtool --show-ring em1
Ring parameters for em1:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		200
RX Mini:	0
RX Jumbo:	0
TX:		511

# ethtool --set-ring em1 rx 511

# ethtool --show-ring em1
Ring parameters for em1:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511

Next, increase the frequency at which the application calls recv or read.

Socket Queue can be veiewed using the below command. Pruned packets or collapsed packets indicate queue issues.

netstat -s | grep socket
    3339 resets received for embryonic SYN_RECV sockets
    2 packets pruned from receive queue because of socket buffer overrun
    99617 TCP sockets finished time wait in fast timer
    6 delayed acks further delayed because of locked socket
    39 packets collapsed in receive queue due to low socket buffe

You can increase the socket queue by changing the values of receive and send window as shown below. The values are min, default and max.

# cat /proc/sys/net/ipv4/tcp_wmem
4096	65536	16777216
# cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	16777216

Receive-Side Scaling (RSS), also known as multi-queue receive, distributes network receive processing across several hardware-based receive queues, allowing inbound network traffic to be processed by multiple CPUs. RSS can be used to relieve bottlenecks in receive interrupt processing caused by overloading a single CPU, and to reduce network latency.
RSS should be enabled when latency is a concern or whenever receive interrupt processing forms a bottleneck. RSS may be enabled by default, depending upon your NIC driver.
To check if it is enabled, look in /proc/interrupts. For instance: ‘# egrep ‘CPU|eth0′ /proc/interrupts’. If only one entry is shown, then your NIC does not support RSS.

Receive Flow Steering (RFS) extends RPS behavior to increase the CPU cache hit rate and thereby reduce network latency. Where RPS forwards packets based solely on queue length, RFS uses the RPS backend to calculate the most appropriate CPU, then forwards packets based on the location of the application consuming the packet. This increases CPU cache efficiency.
RFS is disabled by default. To enable RFS, you must edit two files: /proc/sys/net/core/rps_sock_flow_entries and /sys/class/net//queues/rx-queue/rps_flow_cnt.
For /proc/sys/net/core/rps_sock_flow_entries, a value of 32768 is recommended for moderate server loads.
For /sys/class/net//queues/rx-queue/rps_flow_cnt value of rps_sock_flow_entries/N where N is number of receive queues on device. Number of receive queues is defined in the file
/sys/class/net//queues/rx-0/rps_cpus.

Receive Packet Steering is prefferred over RSS, since it is the software implementation of RSS and does not require NIC card support.
Check /sys/class/net//queues/rx-0/rps_cpus to see how many queues are configured. The rps_cpus files use comma-delimited CPU bitmaps. Therefore, to allow a CPU to handle interrupts for the receive queue on an interface, set the value of their positions in the bitmap to 1. For example, to handle interrupts with CPUs 0, 1, 2, and 3, set the value of rps_cpus to 00001111 (1+2+4+8), or f (the hexadecimal value for 15). To monitor which CPU is receiving network interrupts, looks for NET_RX in ‘watch -n1 cat /proc/softirqs’

Transmit Packet Steering is a mechanism for intelligently selecting which transmit queue to use when transmitting a packet on a multi-queue
device. To accomplish this, a mapping from CPU to hardware queue(s) is recorded. XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
default for SMP). The functionality remains disabled until explicitly configured. For a network device with a single transmission queue, XPS configuration
has no effect, since there is no choice in this case. To enable XPS, the bitmap of CPUs that may use a transmit
queue is configured using the sysfs file entry: /sys/class/net//queues/tx-/xps_cpus

Some of the above material came from:
https://www.kernel.org/doc/Documentation/networking/scaling.txt and also RedHat Tuning Guide.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s