IP Header Explained

Mininum 20 bytes, maximum 60 bytes.
Maximum size is 65,535 bytes for IP header + data.

– Version field (4 bits) for IPv4 or IPv6
– IHL or Internet Header Length (4 bits), it is the number of 32-bit words in the header
– DS field, (6 bits) or Differentiated Service
– ECN (2 bits) explicit congestion notification
– Total length (16-bit) indicates total length of IP datagram
– Identification (16 bits) – Helps identify each datagram, counter based, very imp for fragmentation.
– Flags (3 bits) – Used to indiciate fragmentation.
– Fragment offset (13 bits) – Used to indicate offset of fragmentation.
– TTL (8 bits) upper limit of routers the datagram can pass through. Normally set at 64, although 128 or 255 is also common.
– Protocol (8 bits) – Specifies protocol encapsulated. 6 is TCP and 17 is UDP.
– Header Checksum (16 bits) – Checksum of header only and not payload.
– Source IP (32 bit)
– Destination IP (32 bit)
– Options (Up to 320 bits/40 bytes)
– IP Data if any, up to 65,515 bytes

TCP Flags Explained

TCP Flags

TCP has six flags that can help you troubleshoot a connection. The flags are:

U – URG
A – ACK
P – PSH
R – RST
S – SYN
F – FIN

When using tcpdump command to troubleshoot network connections, you can view TCP conversations with these flags as follows:

# tcpdump 'tcp[13] & 32 != 0' #URG
# tcpdump 'tcp[13] & 16 != 0' #ACK
# tcpdump 'tcp[13] & 8  != 0' #PSH
# tcpdump 'tcp[13] & 4  != 0' #RST
# tcpdump 'tcp[13] & 2  != 0' #SYN
# tcpdump 'tcp[13] & 0  != 0' #FIN

Another way of expressing the values is ‘tcp-fin, tcp-syn, tcp-rst, tcp-push, tcp-ack, tcp-urg.’.

# tcpdump 'tcp[tcpflags] & tcp-urg != 0' #URG
# tcpdump 'tcp[tcpflags] & tcp-ack != 0' #ACK
# tcpdump 'tcp[tcpflags] & tcp-push != 0' #PSH
# tcpdump 'tcp[tcpflags] & tcp-rst != 0' #RST
# tcpdump 'tcp[tcpflags] & tcp-sync != 0' #SYN
# tcpdump 'tcp[tcpflags] & tcp-fin != 0' #FIN

A mnemonic to remember the above is ‘Unskilled Attackers Pester Real Security Folks’.

URG flag is used to indicate that the packet should be prioritized over other packets for processing.
This flag is not used often. I can only think of telnet that uses it.

#  tcpdump 'tcp[13] & 32 != 0'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:29:47.741814 IP destination > source.39697: Flags [P.U], seq 2342494158:2342494159, ack 1232430662, win 31, urg 1, options [nop,nop,TS val 877959588 ecr 703327131], length 1
18:29:51.293145 IP destination.telnet > source.39697: Flags [P.U], seq 12:13, ack 5, win 31, urg 1, options [nop,nop,TS val 877963140 ecr 703330673], length 1

SYN is used for starting a connection.
ACK is used to acknowledge packets received.
PSH is used to ask the receiving end not to buffer packets, but to process them as soon as they are received.
RST is used to denote that no service is listening on the given port.

# tcpdump -n -v 'tcp[tcpflags] & (tcp-rst) != 0'
tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:44:11.087487 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    destination.ms-sql-s > source.x11: Flags [R.], cksum 0xb12d (correct), seq 0, ack 1435238401, win 0, length 0

More on TLS

TLS uses asymmetric and symmetric encryption. Asymmetric encryption is used for the initial communication, followed by faster symmetric key encryption.

Symmetric ciphers are stream based or block based. Stream based encrypt one message at a time. Block based take a number of bits, and encrypt them together as one. A few symmetric key encryption algorithms are:

– AES
– Blowfish
– RC4
– DES
– 3DES

A few asymmetric key encryption algorithms are:

– DH
– RSA
– Elliptic Curve
– DSS/DSA

A couple of message digest (MD) algorithms are:

– MD5
– SHA

If you want to see which algorithms an SSL server supports, use the tool ‘sslscan’ which can be installed using ‘yum install sslscan -y’.
You might have to enable EPEL repository to install using yum. After installation, if you run ‘sslscan http://www.google.com:443’ you will see a lot of very useful output, as show below. First you wil see the algorithms that sslscan supports, followed by the ones that http://www.google.com accepts. The most important item section is the one below:


Preferred Server Cipher(s):
SSLv2 0 bits (NONE)
SSLv3 128 bits ECDHE-RSA-RC4-SHA
TLSv1 128 bits ECDHE-RSA-RC4-SHA
TLS11 128 bits ECDHE-RSA-AES128-SHA
TLS12 128 bits ECDHE-RSA-AES128-GCM-SHA256

This is showing that http://www.google.com prefers SSLv3, TLSv1,1.1 and 1.2. The cipher suites preferred are ECDE-RSA-RC4-SHA.
EDCE is Elliptic Curve Ephemeral Diffie Hellman which supports PFS or Perfect Forward Secrecy.
Normally with RSA, a symmetric key is picked once as part of the SSL HELLO protocol. After that the key does not change.
This means that if the servers private key is compromised, then an attacker can get the symmetric key.
With EDCE and PFS, the symmetric key is changed every session, so even if one key is compromised, the other key will not be impacted.

You can configure Apache to prefer cipher suites, see https://httpd.apache.org/docs/current/ssl/ssl_intro.html and https://httpd.apache.org/docs/2.2/mod/mod_ssl.html#sslciphersuite.

Extending VM LVM disk

I have a CentOS VMs running in VMware, one of the VMs I was running was out of disk space. The disk was originally 85GB, I tried to increase it to 345GB. In an effort to increase the disk size, I tried:

  1. Shutdown the VM
  2. Increase the size of the disk
  3. Power on the VM
  4. Use parted to resize the partition

Unfortunately, step 4 did not work, parted complained that it could not detect the filesystem:

# parted
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 344GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
1 1049kB 538MB 537MB primary ext4 boot
2 538MB 85.9GB 85.4GB primary lvm

(parted) resize
WARNING: you are attempting to use parted to operate on (resize) a file system.
parted's file system manipulation code is not as robust as what you'll find in
dedicated, file-system-specific packages like e2fsprogs.  We recommend
you use parted only to manipulate partition tables, whenever possible.
Support for performing most operations on most types of file systems
will be removed in an upcoming release.
Partition number? 2
Start?  [538MB]?
End?  [85.9GB]? 344GB
Error: Could not detect file system.

It looks like parted does not like LVM. The next thing that I did is to add a new partition, then use ‘pvcreate’ to create a new physical extent, and then add that to the volume group, as seen below:

# parted
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 344GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  538MB   537MB   primary  ext4         boot
 2      538MB   85.9GB  85.4GB  primary               lvm

(parted) mkpart
Partition type?  primary/extended? primary
File system type?  [ext2]?
Start? 85.9
End? 344GB
Warning: You requested a partition from 85.9MB to 344GB.
The closest location we can manage is 85.9GB to 344GB.
Is this still acceptable to you?
Yes/No? yes
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As a result, it may not reflect all of your changes until after reboot.
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 344GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  538MB   537MB   primary  ext4         boot
 2      538MB   85.9GB  85.4GB  primary               lvm
 3      85.9GB  344GB   258GB   primary

(parted) quit

#ls -l /dev/sda
sda   sda1  sda2  sda3

# pvcreate /dev/sda3
  dev_is_mpath: failed to get device for 8:3
  Physical volume "/dev/sda3" successfully created
  
# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  vg0  lvm2 a--   79.50g      0
  /dev/sda3       lvm2 a--  240.00g 240.00g
  
# vgextend  vg0 /dev/sda3
  Volume group "vg0" successfully extended

# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  vg0    2   4   0 wz--n- 319.49g 240.00g

How to hire talent

Introduction

Have been on both sides of a site reliability engineer interview, I have come to realize that interviewing is as much art as it is science. Although there are companies which have tried to make interviewing a numbers game, and to abstract out the human element, interviewing continues to be an imperfect process. Almost anything can go wrong in an interview, from both sides. From a companies perspective, you should not consider a candidate confirmed until he/she shows up for work on the first day. In this article, I list a general format that if adhered to, may prove to be useful for interviewing of software engineers, or site reliability engineers.

Format of Interview

Once a candidate has been identified for a role, or has been “sourced” through different means such as LinkedIn, referrals, or otherwise, a recruiter should try to arrange to arrange a phone conversation with the candidate. Once a phone conversation has been arranged, start with the below steps.

– Over the phone, recruiter asks some basic technical questions, passing of which is mandatory. The recruiter also judges if the candidate is a good fit for the position based on experience and job description. In terms of what kind of technical questions a recruiter should ask, they should be limited to less than half a dozen, and should be straight forward, that do not require any technical expertise on the recruiter’s behalf. A few examples for a SRE (site reliability engineer) position can be “Which command do you use to view network traffic on a Linux host?”, “Describe the process of opening a TCP port on a Linux host”, “What is the difference between a hard link and a soft link on a filesystem?”.

– There may be a 45 minute or so technical phone interview with an engineer if a candidate passes the initial phone screen with the recruiter. In this technical phone interview, ask additional questions on various topics, such as networking, filesystems, Linux internals, processes, etc, to get a better idea of the candidates strenghts.
After a successfull technical phone interview, next step should be to schedule an onsite interview.

– Onsite interview is generally held by 4 or 5 engineers, and may be broken up in the following format:
* Unix/Linux system internals
* General troubleshooting questions
* Scripting/programming
* Project management
* Scalable architecture

You should try to avoid having different folks ask the same question. You can also split the 5 interviews into a couple of days if you like. Try to avoid non-technical questions, you can guage other aspects of the person

What to look for in a candidate

From a companies perspective, below are some qualities you should look for in a candidate:
– Smart
– Able to get things done
– Self-directed
– Diverse
– Takes Initiative
– Solves Problems on their own

The question you should be asking yourself is “Is this an awesome person who I want to work with?”. The idea is to hire across the board with consistent standards. Your job is not to break the candidate, it is to try to figure out what he/she does best and where they are lacking in strength.

During the Interview

Before the interview starts, try to make the candidate comfortable. This may sound obvious, but it is not. I remember being in 5 hour interviews without a single break, feeling tired and thirsty and unable to concentrate. Give the candidate frequent breaks for hydration if needed. You want the candidate to bring out the best in a candidate, to do this:
– Come with some hints
– Give candidates with opportunities to success
– Your job is not to hand a test
– Don’t get stuck on a problem
– Make an interview enjoyable
– Don’t make a candidate feel misreable
– If they do not feel good, they won’t be a good referral
– Try to limit general questions, such as definitions, which can be easy to memorize and do not give an idea of depth of a topic.

Good interview questions

– Start with easy questions, work you way to harder problems
– Don’t look for specific answers, as a question may have different valid answers
– Ask 2 or 3 questions
– One question is ok if it has multiple areas to dive into

Comparatively, not so good interview feedback is:
– Is generally incomplete
– Don’t include pictures of the candidate

What not to ask

Don’t ask about:
– Religion
– Age
– Disability
– Genetic info
– National origin
– Pregnancy
– Race and color
– Gender

Asking questions about the above can result in legal complications. If candidate does not get hired, he may turn around and sue your company if you asking him/her about the above questions. He/she may claim that they did not get hired because they belong to a particular race or religion.

Limit questions on:
– Textbook knowledge questions
– Culture questions
– Communications questions

You can pick up on culture fit and communication questions based on the technical questions you ask.

Providing interview feedback

Good Interview Feedback is factual. It should in the least:
– Lists your questions
– Lists the answers as given by the candidate
– Additionally, feedback should include analysis of the interviewer.
– Transcibe code
– Include a high level summary
– Evaluate culture fit
– Role level knowledge
– General cognitive ability

About recordings:
– Avoid audio, video recording

How do you conduct your interviews?

How to setup a BIND DNS Server

The Internet runs mostly on ISC’s BIND DNS server. In this artcile I will explain how to setup a simple BIND server on a Linux box.
The basic steps involved in setting up BIND are:

– Download BIND from ISC’s website
When downloading BIND, you might be better off if you pick an ESV, or an extended support version. As of the writing of this article, 9.9.5 is the latest ESV.
ESV versions are supported for at least 3 years by ISC. You can check the latest ESV version here https://www.isc.org/downloads/. If you are wondering as to why not to use the vendor provided BIND version, it’s because the vendor may not be using the ESV version, and may also lag behind in patches. Using the latest ESV from BIND will get you the most stable version for production use.

– Compile BIND
You will need gcc, make to compile bind. You can run ‘sudo yum groupinstall “Development Tools” -y’ on your CentOS or RedHat box to install GCC. In order to compile BIND, after downloading it, untar it, and then try the following from the directory you have untarred the distriution in.

$sudo yum groupinstall "Development Tools" -y
$./configure --prefix=/opt/bind-9.9.5 --enable-newstats --with-libxml2
$make
#make install

The above will install BIND in /opt/bind-9.9.5, after that you can link /opt/bind to /opt/bind-9.9.5 with “sudo ln -s /opt/bind-9.9.5 /opt/bind”.
I have enabled newstats and with libxml because this allows me to view BIND query stats in a format which I find easy to use.

– Configure a master server
Now that you have downloaded BIND, and compiled it, as well as installed it, we will move onto configuring it. The basic steps in configuration are as follows:

I have placed extensive comments in the conf file and in the zone files, please review them in order to understand what information to place in the files

A. Create a named.conf zone file in etc dire of chroot. See https://github.com/syedaali/configs/blob/master/bind-master-named.conf for a sample with comments.
B. Install a root hints file, example is here https://github.com/syedaali/configs/blob/master/bind-root.hints.
C. Create a forward lookup zone file, sample is here https://github.com/syedaali/configs/blob/master/bind-example.com.zone.
D. Create a reverse lookup zone file, sample is here https://github.com/syedaali/configs/blob/master/bind-10.1.1.rev.
E. Create a loopback forward and reverse zone. Forward sample is here https://github.com/syedaali/configs/blob/master/bind-master.localhost.
Sample of reverse lookup for localhost is here https://github.com/syedaali/configs/blob/master/bind-127.0.0.rev.

– Install one slave server
A. Compile BIND as seen in above steps for master
B. Install BIND in /opt/bind-9.9.5
C. Link to /opt/bin
D. Repeat steps B through E from master, with the exception that in the zone file, specify slave instead of master.
Sample slave zone file is here https://github.com/syedaali/configs/blob/master/bind-slave-named.conf.

– Configure your clients to point to the slave server
It is generally a good idea to avoid pointing your BIND clients to the BIND master. Instead create two slaves, and point the clients to the slaves. For instance if your master has IP address 10.1.1.10, and you have two BIND slaves, with IP 10.1.1.11 and 10.1.1.12, then in your clients /etc/resolv.conf do the following:

$cat /etc/resolv.conf
domain example.com
nameserver 10.1.1.11
nameserver 10.1.1.2

In future blogs I will specify how to monitor BIND. Do share your experience with BIND setup here.

High Availability using Keepalived

HAProxy (http://haproxy.1wt.eu/) is a popular solution for load balancing servers. In a typical load balanced configuration, there may be a number of web servers behind a pair of HAProxy load balancers. The question arises, how do you load balance the HAProxy servers themselves? One way is to use ‘keepalived’ (http://www.keepalived.org/). In order to use this solution, you need at least two HAProxy servers. On both of them install keepalived as explained below. Both servers will have a floating IP which you can create a DNS record for and give that name to your clients. For instance http://www.example.com may have IP 10.1.1.30 which is the floating IP between the two HAProxy servers. Clients will attempt to connect to 10.1.1.30. Depending on which HAProxy server is the master, the IP will be owned by that server. If that server fails, then the backup server will start to issue gratuitous ARP responses for the same IP of 10.1.1.30 and the requests to the web servers will then go through the backup HAProxy server which has now become the primary.

– Install keepalived

$sudo yum install keepalived -y

– Setup two hosts with the following IP address. The floating IP address will be assigned to the virtual router instance in the config later.

10.1.1.10 is h1.example.com (HAProxy server 1)
10.1.1.20 is h2.example.com (HAProxy server 2)
10.1.1.30 is floating IP (shared between the two server)
10.1.1.100 is SMTP server

– Sample basic config file for master. Note the use of ‘state MASTER’ and also ‘priority 101’

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
}

– Setup backup server. Sample basic config file for backup is below. Note the use of ‘state BACKUP’ and also ‘priority 100’. Priority should be lower on backup.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
}

– Start keepalived on both master and backup.

$sudo service keepalived start

– Verify that keepalived is running. You can do this by checking the IP address on the MASTER.

On 10.1.1.10, MASTER, floating IP is assigned to eth0 when MASTER is up.

ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:4c:c7 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.10/24 brd 10.1.1.255 scope global eth0
    inet 10.1.1.30/32 scope global eth0

As you can see the 10.1.1.30 IP is with the master. Check the IP of the BACKUP HAProxy host.
On BACKUP, we have only the BACKUP IP, and not the floating IP.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:7a:5f brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.20/24 brd 10.1.1.255 scope global eth0

– You can also use tcpdump and verify that the master is sending VRRP advertisments.
VRRP uses Multicast to keep track of state, you can view multicast traffic using tcpdump as shown below.

#tcpdump net 224.0.0.0/4

Host h1.example.com is the master.

15:49:38.342468 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:39.342767 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:40.343062 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:41.343371 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

– To test the failover, turn off keepalived on the master using ‘service keepalived stop’.
Once I stop keepalived on the MASTER, I see the floating IP has now been assigned to the BACKUP.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:7a:5f brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.20/24 brd 10.1.1.255 scope global eth0
    inet 10.1.1.30/32 scope global eth0

– If you keep tcpdump running you should see that the BACKUP host is now sending out the VRRP advertisments. Host h2.example.com is now the master.

15:49:45.953584 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:46.953889 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:47.954202 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:48.954519 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

– In /var/log/messages on the BACKUP HAPRoxy host, you should see it taking over the floating IP. Host h2.example.com is the master now.

Mar  8 15:49:44 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.10.30
Mar  8 15:49:45 h2.example.com Keepalived_healthcheckers[4651]: Netlink reflector reports IP 10.1.10.30 added
Mar  8 15:49:46 h2.example.com ntpd[2974]: Listen normally on 4 eth0 10.1.10.30 UDP 123
Mar  8 15:49:46 h2.example.com ntpd[2974]: peers refreshed
Mar  8 15:49:50 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.10.30
Mar  8 15:49:52 h2.example.com kernel: device eth0 left promiscuous mode

– Since the test has worked, Re-enable keepalived on the MASTER host and watch in tcpdump as host h1.example.com is back to being the master.
You can re-enable keepalived using ‘service keepalived start’.

16:16:41.841635 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:42.842722 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:43.843847 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:44.844982 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

– You can also check in /var/log/messages when host h1.example.com is the master.

Mar  8 15:50:18 h1.example.com Keepalived_vrrp[4324]: Kernel is reporting: interface eth0 UP
Mar  8 15:50:18 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar  8 15:50:19 h1.example.com Keepalived_healthcheckers[4323]: Netlink reflector reports IP 10.1.10.30 added
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.10.30
Mar  8 15:50:21 h1.example.com Keepalived_vrrp[4324]: Netlink reflector reports IP 10.1.10.10 added
    

– Once you return H1 host to being the MASTER by starting keepalived, in /var/log/messages on H2 host you will see it giving up the floating IP.
After I bring h1.example.com back online, h2.example.com becomes the backup.

Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Received higher prio advert
Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) removing protocol VIPs.
Mar  8 15:50:18 h2.example.com Keepalived_healthcheckers[4651]: Netlink reflector reports IP 10.1.10.30 removed

– You can further verify your configuration by running a ping during the above exercise. Ping the floating IP from another host.
There is minimal packet loss as both MASTER, and the BACKUP take over each other’s services as appropriate.

$ ping 10.1.10.30
PING 10.1.10.30 (10.1.10.30): 56 data bytes
64 bytes from 10.1.10.30: icmp_seq=0 ttl=61 time=90.486 ms
64 bytes from 10.1.10.30: icmp_seq=1 ttl=61 time=89.514 ms
64 bytes from 10.1.10.30: icmp_seq=2 ttl=61 time=87.989 ms
64 bytes from 10.1.10.30: icmp_seq=3 ttl=61 time=98.162 ms
64 bytes from 10.1.10.30: icmp_seq=4 ttl=61 time=87.107 ms
64 bytes from 10.1.10.30: icmp_seq=5 ttl=61 time=89.163 ms
64 bytes from 10.1.10.30: icmp_seq=6 ttl=61 time=88.792 ms
64 bytes from 10.1.10.30: icmp_seq=7 ttl=61 time=89.156 ms
Request timeout for icmp_seq 8                                <------ At this point I stopped the keepalived on MASTER
Request timeout for icmp_seq 9
64 bytes from 10.1.10.30: icmp_seq=10 ttl=61 time=88.386 ms   <------ BACKUP has now started to respond to ping for floating IP
64 bytes from 10.1.10.30: icmp_seq=11 ttl=61 time=91.164 ms
64 bytes from 10.1.10.30: icmp_seq=12 ttl=61 time=88.215 ms
64 bytes from 10.1.10.30: icmp_seq=13 ttl=61 time=88.457 ms
64 bytes from 10.1.10.30: icmp_seq=14 ttl=61 time=87.170 ms
64 bytes from 10.1.10.30: icmp_seq=15 ttl=61 time=120.544 ms
64 bytes from 10.1.10.30: icmp_seq=16 ttl=61 time=91.861 ms
Request timeout for icmp_seq 17                               <------ I restarted keepalived on MASTER
64 bytes from 10.1.10.30: icmp_seq=18 ttl=61 time=89.658 ms
64 bytes from 10.1.10.30: icmp_seq=19 ttl=61 time=90.201 ms
64 bytes from 10.1.10.30: icmp_seq=20 ttl=61 time=88.008 ms
64 bytes from 10.1.10.30: icmp_seq=21 ttl=61 time=88.369 ms
^C
--- 10.1.10.30 ping statistics ---
22 packets transmitted, 19 packets received, 13.6% packet loss
round-trip min/avg/max/stddev = 87.107/91.179/120.544/7.315 ms

The above is a simple example of using IP based monitoring. You can also do application based monitoring. In order to do this, we will modify our config file on both master and slave to include a check which will check the status of the HAProxy service. If it is running, then it will continue to serve the floating IP through the MASTER. If the service stops, then the BACKUP will resume ownership. This level of monitoring is in addition to monitoring the network interface being up.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script    "/sbin/service haproxy status"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
    track_script {
        check_haproxy
    }
}

Notice in the above we added two additinal sections. One is ‘vrrp_script check_haproxy’. This code will run haproxy status on the MASTER. If the return code is ‘0’ then the service is considered to be up. If the return code is other than ‘0’ then the service is considered to be down and the BACKUP host will then take-over the floating IP.

– Backup server config for application monitoring. Similar to the MASTER.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script    "/sbin/service haproxy status"
    interval 2
    fall 2
    rise 2
}


vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
    track_script {
        check_haproxy
    }
    
}

How do you use keepalived and HAPRoxy in your network? Share your comments.

%d bloggers like this: