High Availability using Keepalived

HAProxy (http://haproxy.1wt.eu/) is a popular solution for load balancing servers. In a typical load-balanced configuration, a number of web servers sit behind a pair of HAProxy load balancers. The question then arises: how do you make the HAProxy servers themselves highly available? One way is to use ‘keepalived’ (http://www.keepalived.org/), which implements VRRP.

To use this solution, you need at least two HAProxy servers, with keepalived installed on both as explained below. The two servers share a floating IP; create a DNS record for it and give that name to your clients. For instance, http://www.example.com may have IP 10.1.1.30, which is the floating IP shared by the two HAProxy servers, so clients connect to 10.1.1.30. At any given time the floating IP is owned by whichever HAProxy server is the master. If that server fails, the backup server takes over the floating IP and sends gratuitous ARPs for 10.1.1.30. Requests to the web servers then go through the backup HAProxy server, which has now become the master.

– Install keepalived

$ sudo yum install keepalived -y

– Set up two hosts with the following IP addresses. The floating IP address will be assigned to the virtual router instance in the config later.

10.1.1.10 is h1.example.com (HAProxy server 1)
10.1.1.20 is h2.example.com (HAProxy server 2)
10.1.1.30 is floating IP (shared between the two servers)
10.1.1.100 is SMTP server

– Sample basic config file for the master. Note the use of ‘state MASTER’ and ‘priority 101’.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
}

– Set up the backup server. A sample basic config file for the backup is below. Note the use of ‘state BACKUP’ and ‘priority 100’. The priority must be lower on the backup than on the master.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
}
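Because the two files differ only in ‘state’ and ‘priority’, it is easy to get one of them wrong when copying. The following is a minimal sketch of a sanity check, assuming each config contains a single vrrp_instance; the function names conf_value and check_pair and the file names in the usage line are made up for this example and are not part of keepalived.

```sh
# conf_value FILE KEY: print the first value found for KEY (e.g. priority).
# Assumes one vrrp_instance per file and "key value" on a single line.
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_pair MASTER_CONF BACKUP_CONF
# Prints "ok" and returns 0 when both files use the same virtual_router_id
# and the master's priority is strictly higher than the backup's.
check_pair() {
    m_vrid=$(conf_value "$1" virtual_router_id)
    b_vrid=$(conf_value "$2" virtual_router_id)
    m_prio=$(conf_value "$1" priority)
    b_prio=$(conf_value "$2" priority)

    if [ -z "$m_vrid" ] || [ -z "$b_vrid" ] || [ -z "$m_prio" ] || [ -z "$b_prio" ]; then
        echo "missing virtual_router_id or priority"
        return 1
    fi
    if [ "$m_vrid" != "$b_vrid" ]; then
        echo "virtual_router_id mismatch: $m_vrid vs $b_vrid"
        return 1
    fi
    if [ "$m_prio" -le "$b_prio" ]; then
        echo "master priority $m_prio is not higher than backup priority $b_prio"
        return 1
    fi
    echo "ok"
}
```

With copies of both files on one machine, it could be called as ‘check_pair h1-keepalived.conf h2-keepalived.conf’.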

– Start keepalived on both master and backup.

$ sudo service keepalived start

– Verify that keepalived is running. You can do this by checking the IP addresses on the MASTER.

On the MASTER (10.1.1.10), the floating IP is assigned to eth0 while the MASTER is up.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:4c:c7 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.10/24 brd 10.1.1.255 scope global eth0
    inet 10.1.1.30/32 scope global eth0

As you can see, the floating IP 10.1.1.30 is held by the MASTER. Now check the IP addresses of the BACKUP HAProxy host.
On the BACKUP, we see only its own IP and not the floating IP.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:7a:5f brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.20/24 brd 10.1.1.255 scope global eth0
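The same check can be scripted instead of read by eye. This is a small sketch that assumes the ‘ip addr show’ output format shown above; has_vip is a name chosen for this example.

```sh
# has_vip VIP: read `ip addr show` output on stdin and succeed (exit 0)
# only if VIP appears as an inet address. The comparison is on the exact
# address before the "/", so 10.1.1.3 does not match 10.1.1.30.
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Possible use on either HAProxy host:
#   ip addr show eth0 | has_vip 10.1.1.30 && echo "this host owns the floating IP"
```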

– You can also use tcpdump to verify that the master is sending VRRP advertisements.
VRRP uses multicast to keep track of state; you can view the multicast traffic using tcpdump as shown below.

# tcpdump net 224.0.0.0/4

Host h1.example.com is the master.

15:49:38.342468 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:39.342767 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:40.343062 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:41.343371 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
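When the capture runs for a while, it can be easier to reduce it to "who is advertising". The sketch below assumes the tcpdump line format shown above (sender in the third field, followed by ‘vrid N,’ and ‘prio N,’); vrrp_senders is a name made up for this example.

```sh
# vrrp_senders: read tcpdump output on stdin and print each distinct
# "sender vrid=N prio=N" combination once, in order of first appearance.
vrrp_senders() {
    awk '/VRRPv[23], Advertisement/ {
        vrid = ""; prio = ""
        for (i = 1; i < NF; i++) {
            if ($i == "vrid") vrid = $(i + 1)
            if ($i == "prio") prio = $(i + 1)
        }
        gsub(/,/, "", vrid); gsub(/,/, "", prio)   # drop trailing commas
        key = $3 " vrid=" vrid " prio=" prio
        if (!(key in seen)) { seen[key] = 1; print key }
    }'
}

# Possible use (-l makes tcpdump line-buffer its output):
#   tcpdump -l net 224.0.0.0/4 | vrrp_senders
```

In normal operation only one sender should appear per vrid. A second sender showing up means the other host has started advertising, for example after a failover.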

– To test the failover, turn off keepalived on the master using ‘service keepalived stop’.
Once I stop keepalived on the MASTER, I see the floating IP has now been assigned to the BACKUP.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:bd:7a:5f brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.20/24 brd 10.1.1.255 scope global eth0
    inet 10.1.1.30/32 scope global eth0

– If you keep tcpdump running, you should see that the BACKUP host is now sending the VRRP advertisements. Host h2.example.com is now the master.

15:49:45.953584 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:46.953889 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:47.954202 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:49:48.954519 IP h2.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

– In /var/log/messages on the BACKUP HAProxy host, you should see it taking over the floating IP. Host h2.example.com is the master now.

Mar  8 15:49:44 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar  8 15:49:45 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.1.30
Mar  8 15:49:45 h2.example.com Keepalived_healthcheckers[4651]: Netlink reflector reports IP 10.1.1.30 added
Mar  8 15:49:46 h2.example.com ntpd[2974]: Listen normally on 4 eth0 10.1.1.30 UDP 123
Mar  8 15:49:46 h2.example.com ntpd[2974]: peers refreshed
Mar  8 15:49:50 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.1.30
Mar  8 15:49:52 h2.example.com kernel: device eth0 left promiscuous mode
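To pull just the state changes out of a busy log, a filter like the one below may help. It is a sketch that assumes the syslog line layout shown above (time in the third field, host in the fourth, and an "Entering ... STATE" message); other keepalived versions may word these messages differently. The name vrrp_transitions is made up for this example.

```sh
# vrrp_transitions: read syslog lines on stdin and print "TIME HOST STATE"
# for every Keepalived_vrrp "Entering <STATE> STATE" message.
vrrp_transitions() {
    awk '/Keepalived_vrrp/ && /Entering [A-Z]+ STATE/ {
        for (i = 1; i < NF; i++)
            if ($i == "Entering") print $3, $4, $(i + 1)
    }'
}

# Possible use:
#   vrrp_transitions < /var/log/messages
```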

– Since the test has worked, re-enable keepalived on the MASTER host using ‘service keepalived start’ and watch in tcpdump as host h1.example.com goes back to being the master.

16:16:41.841635 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:42.842722 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:43.843847 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
16:16:44.844982 IP h1.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

– You can also confirm in /var/log/messages on h1.example.com that it has become the master again.

Mar  8 15:50:18 h1.example.com Keepalived_vrrp[4324]: Kernel is reporting: interface eth0 UP
Mar  8 15:50:18 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar  8 15:50:19 h1.example.com Keepalived_healthcheckers[4323]: Netlink reflector reports IP 10.1.1.30 added
Mar  8 15:50:19 h1.example.com Keepalived_vrrp[4324]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.1.1.30
Mar  8 15:50:21 h1.example.com Keepalived_vrrp[4324]: Netlink reflector reports IP 10.1.1.10 added

– Once you return h1.example.com to being the MASTER by starting keepalived, /var/log/messages on h2.example.com shows it giving up the floating IP.
After I bring h1.example.com back online, h2.example.com becomes the backup.

Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Received higher prio advert
Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar  8 15:50:18 h2.example.com Keepalived_vrrp[4652]: VRRP_Instance(VI_1) removing protocol VIPs.
Mar  8 15:50:18 h2.example.com Keepalived_healthcheckers[4651]: Netlink reflector reports IP 10.1.1.30 removed

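The "Received higher prio advert" message reflects a simple comparison between the priority in the received advertisement and the host's own priority. Below is a deliberately simplified sketch of that decision, not keepalived's actual code. It assumes preemption is enabled (keepalived's default), it ignores timers, and it leaves ties unchanged, whereas real VRRP breaks ties by IP address. The name next_state is made up for this example.

```sh
# next_state CURRENT OWN_PRIO ADVERT_PRIO
# Print the state this host ends up in after seeing an advertisement.
next_state() {
    if [ "$3" -gt "$2" ]; then
        echo BACKUP      # a higher-priority router is advertising: yield
    elif [ "$3" -lt "$2" ]; then
        echo MASTER      # this host outranks the sender and takes over
    else
        echo "$1"        # equal priority: not modelled in this sketch
    fi
}

# With the priorities from the configs (h1 = 101, h2 = 100):
#   next_state MASTER 100 101   # h2 hears h1's advert and prints BACKUP
#   next_state BACKUP 101 100   # h1 hears h2's advert and prints MASTER
```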
– You can further verify your configuration by pinging the floating IP from another host during the above exercise.
Only a few packets are lost while the MASTER and the BACKUP take over from each other.

$ ping 10.1.1.30
PING 10.1.1.30 (10.1.1.30): 56 data bytes
64 bytes from 10.1.1.30: icmp_seq=0 ttl=61 time=90.486 ms
64 bytes from 10.1.1.30: icmp_seq=1 ttl=61 time=89.514 ms
64 bytes from 10.1.1.30: icmp_seq=2 ttl=61 time=87.989 ms
64 bytes from 10.1.1.30: icmp_seq=3 ttl=61 time=98.162 ms
64 bytes from 10.1.1.30: icmp_seq=4 ttl=61 time=87.107 ms
64 bytes from 10.1.1.30: icmp_seq=5 ttl=61 time=89.163 ms
64 bytes from 10.1.1.30: icmp_seq=6 ttl=61 time=88.792 ms
64 bytes from 10.1.1.30: icmp_seq=7 ttl=61 time=89.156 ms
Request timeout for icmp_seq 8                               <------ At this point I stopped keepalived on MASTER
Request timeout for icmp_seq 9
64 bytes from 10.1.1.30: icmp_seq=10 ttl=61 time=88.386 ms   <------ BACKUP has now started to respond to ping for floating IP
64 bytes from 10.1.1.30: icmp_seq=11 ttl=61 time=91.164 ms
64 bytes from 10.1.1.30: icmp_seq=12 ttl=61 time=88.215 ms
64 bytes from 10.1.1.30: icmp_seq=13 ttl=61 time=88.457 ms
64 bytes from 10.1.1.30: icmp_seq=14 ttl=61 time=87.170 ms
64 bytes from 10.1.1.30: icmp_seq=15 ttl=61 time=120.544 ms
64 bytes from 10.1.1.30: icmp_seq=16 ttl=61 time=91.861 ms
Request timeout for icmp_seq 17                              <------ I restarted keepalived on MASTER
64 bytes from 10.1.1.30: icmp_seq=18 ttl=61 time=89.658 ms
64 bytes from 10.1.1.30: icmp_seq=19 ttl=61 time=90.201 ms
64 bytes from 10.1.1.30: icmp_seq=20 ttl=61 time=88.008 ms
64 bytes from 10.1.1.30: icmp_seq=21 ttl=61 time=88.369 ms
^C
--- 10.1.1.30 ping statistics ---
22 packets transmitted, 19 packets received, 13.6% packet loss
round-trip min/avg/max/stddev = 87.107/91.179/120.544/7.315 ms
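The overall loss percentage says less about failover than the length of each outage does. The following sketch summarises a saved ping transcript. It assumes a ping that prints "Request timeout for icmp_seq N" lines, as in the output above (many Linux ping versions do not print these); ping_gaps is a name made up for this example.

```sh
# ping_gaps: read ping output on stdin and print the number of replies,
# the number of timeouts, and the longest run of consecutive timeouts.
ping_gaps() {
    awk '
        /bytes from/      { ok++; run = 0 }
        /Request timeout/ { lost++; run++; if (run > longest) longest = run }
        END { printf "replies=%d timeouts=%d longest_gap=%d\n", ok, lost, longest }
    '
}
```

Fed the transcript above, it reports 19 replies, 3 timeouts and a longest gap of 2. At one ping per second, that means the floating IP was unreachable for about two seconds during the failover and about one second during the failback.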

The above is a simple example of IP-based monitoring. You can also do application-based monitoring. To do this, modify the config file on both master and backup to include a check of the status of the HAProxy service. As long as the service is running, the MASTER continues to serve the floating IP. If the service stops, the BACKUP takes over ownership. This level of monitoring is in addition to monitoring the network interface being up.

– Master server config for application monitoring.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script    "/sbin/service haproxy status"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
    track_script {
        check_haproxy
    }
}

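Notice that in the above we added two additional sections. The first, ‘vrrp_script check_haproxy’, runs ‘service haproxy status’ on the MASTER every 2 seconds (‘interval 2’). If the return code is ‘0’, the service is considered to be up; any other return code means it is down. ‘fall 2’ and ‘rise 2’ require two consecutive failures or successes before the state changes. The second, ‘track_script’ inside the vrrp_instance, ties the instance to that check: when the check fails on the MASTER, the BACKUP host takes over the floating IP.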
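The effect of ‘fall 2’ and ‘rise 2’ in the vrrp_script block is easiest to see on a sequence of check results. The following is a small simulation, not keepalived code. It assumes the check starts in the "up" state and that a single opposite result resets the counter; track_state is a name made up for this example.

```sh
# track_state FALL RISE: read one script exit code per line on stdin and
# print the resulting state ("up" or "down") after each check.
track_state() {
    awk -v fall="$1" -v rise="$2" '
        BEGIN { state = "up" }                     # assumed starting state
        {
            if ($1 == 0) { good++; bad = 0 } else { bad++; good = 0 }
            if (state == "up" && bad >= fall) state = "down"
            if (state == "down" && good >= rise) state = "up"
            print state
        }
    '
}

# Example: a single failure is tolerated, two in a row are not.
#   printf '%s\n' 0 1 0 1 1 0 0 | track_state 2 2
# prints: up up up up down down up (one state per line)
```

With ‘interval 2’, this means the MASTER gives up the floating IP roughly four seconds after HAProxy stops, and a brief hiccup in a single check does not cause a failover.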

– Backup server config for application monitoring. It is the same as the MASTER config except for ‘state BACKUP’ and ‘priority 100’.

! Configuration File for keepalived

global_defs {
   notification_email {
     admin@example.com
   }
   notification_email_from keepalived@example.com
   smtp_server 10.1.1.100
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script    "/sbin/service haproxy status"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.30
    }
    track_script {
        check_haproxy
    }
}
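The ‘script’ line in both files can point at any command that exits with 0 when HAProxy is healthy, so ‘service haproxy status’ is only one option. As an illustration, here is a sketch of a PID-file based check. The function name pid_alive and the path /var/run/haproxy.pid in the usage comment are assumptions for this example; use whatever pidfile your HAProxy is actually configured to write.

```sh
# pid_alive PIDFILE: succeed (exit 0) if PIDFILE is readable and the first
# PID listed in it belongs to a running process.
# kill -0 sends no signal; it only tests whether the process exists. It
# also fails if the caller lacks permission to signal that process.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Possible use in a small wrapper script referenced from vrrp_script:
#   pid_alive /var/run/haproxy.pid || exit 1
```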

How do you use keepalived and HAProxy in your network? Share your comments.